<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
  <generator uri="http://jekyllrb.com" version="4.4.1">Jekyll</generator>
  
  
  <link href="https://arpankapoor.com/feed.xml" rel="self" type="application/atom+xml" />
  <link href="https://arpankapoor.com/" rel="alternate" type="text/html" hreflang="en" />
  <updated>2026-03-07T04:28:06+00:00</updated>
  <id>https://arpankapoor.com/</id>

  
    <title type="html">arpan kapoor</title>
  

  
    <subtitle>arpan kapoor&apos;s online home</subtitle>
  

  
    <author>
        <name>Arpan Kapoor</name>
      
      
    </author>
  

  
  
    <entry>
      
      <title type="html">sqlite caching with vmtouch</title>
      
      
      <link href="https://arpankapoor.com/sqlite" rel="alternate" type="text/html" title="sqlite caching with vmtouch" />
      
      <published>2022-11-26T00:00:00+00:00</published>
      <updated>2022-11-26T00:00:00+00:00</updated>
      <id>https://arpankapoor.com/sqlite</id>
      <content type="html" xml:base="https://arpankapoor.com/sqlite">&lt;p&gt;I have a script that queries a sqlite database file with ~8M rows in the main table and measures ~1.3GiB on disk.
With a cold start, the script takes around 6 minutes to execute:&lt;/p&gt;
&lt;div class=&quot;language-console highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;gp&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;c&quot;&gt;# clear the cache: https://www.kernel.org/doc/Documentation/sysctl/vm.txt&lt;/span&gt;
&lt;span class=&quot;gp&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;sudo sync&lt;/span&gt;
&lt;span class=&quot;gp&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;echo &lt;/span&gt;3 | &lt;span class=&quot;nb&quot;&gt;sudo tee&lt;/span&gt; /proc/sys/vm/drop_caches
&lt;span class=&quot;go&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;gp&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;time&lt;/span&gt; ./target/release/app
&lt;span class=&quot;c&quot;&gt;...
...
&lt;/span&gt;&lt;span class=&quot;go&quot;&gt;
real    6m7.636s
user    0m14.322s
sys     0m15.043s
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;From &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;time&lt;/code&gt;’s output, we can infer that the script is spending most of its time on I/O.&lt;/p&gt;

&lt;p&gt;Since the db file is not large, bringing the entire file into the disk/page cache should speed things up.
To do this, I found a small utility called &lt;a href=&quot;https://hoytech.com/vmtouch/&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;vmtouch&lt;/code&gt;&lt;/a&gt;.
The &lt;a href=&quot;https://github.com/hoytech/vmtouch/blob/8f6898e3c027f445962e223ca7a7b33d40395fc6/vmtouch.c#L636-L650&quot;&gt;code&lt;/a&gt;
it uses to preload is straightforward: memory map the file and then read it with a stride of &lt;a href=&quot;https://man7.org/linux/man-pages/man3/sysconf.3.html&quot;&gt;PAGESIZE&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The total runtime (including time to cache the file) reduces to ~24 seconds giving a speed up of ~15x:&lt;/p&gt;
&lt;div class=&quot;language-console highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;gp&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;c&quot;&gt;# clear the cache&lt;/span&gt;
&lt;span class=&quot;gp&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;sudo sync&lt;/span&gt;
&lt;span class=&quot;gp&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;echo &lt;/span&gt;3 | &lt;span class=&quot;nb&quot;&gt;sudo tee&lt;/span&gt; /proc/sys/vm/drop_caches
&lt;span class=&quot;go&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;gp&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;vmtouch &lt;span class=&quot;nt&quot;&gt;-vt&lt;/span&gt; db.sqlite
&lt;span class=&quot;go&quot;&gt;[OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO] 319313/319313

           Files: 1
     Directories: 0
   Touched Pages: 319313 (1G)
         Elapsed: 9.6319 seconds

real    0m9.674s
user    0m0.084s
sys     0m0.569s
&lt;/span&gt;&lt;span class=&quot;gp&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;time&lt;/span&gt; ./target/release/app
&lt;span class=&quot;c&quot;&gt;...
...
&lt;/span&gt;&lt;span class=&quot;go&quot;&gt;
real    0m14.051s
user    0m7.056s
sys     0m6.796s
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;</content>

      
      
      
      
      

      
        <author>
            <name>Arpan Kapoor</name>
          
          
        </author>
      

      

      

      
        <summary type="html">I have a script that queries a sqlite database file with ~8M rows in the main table and measures ~1.3GiB on disk. With a cold start, the script takes around 6 minutes to execute: $ # clear the cache: https://www.kernel.org/doc/Documentation/sysctl/vm.txt $ sudo sync $ echo 3 | sudo tee /proc/sys/vm/drop_caches $ time ./target/release/app ... ... real 6m7.636s user 0m14.322s sys 0m15.043s From time’s output, we can infer that the script is spending most of its time on I/O. Since the db file is not large, bringing the entire file into the disk/page cache should speed things up. To do this, I found a small utility called vmtouch. The code it uses to preload is straightforward: memory map the file and then read it with a stride of PAGESIZE. The total runtime (including time to cache the file) reduces to ~24 seconds giving a speed up of ~15x: $ # clear the cache $ sudo sync $ echo 3 | sudo tee /proc/sys/vm/drop_caches $ vmtouch -vt db.sqlite [OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO] 319313/319313 Files: 1 Directories: 0 Touched Pages: 319313 (1G) Elapsed: 9.6319 seconds real 0m9.674s user 0m0.084s sys 0m0.569s $ time ./target/release/app ... ... real 0m14.051s user 0m7.056s sys 0m6.796s</summary>
      

      
      
    </entry>
  
  
  
    <entry>
      
      <title type="html">exploring GPU programming with pycuda &amp;amp; VGG16</title>
      
      
      <link href="https://arpankapoor.com/cuda" rel="alternate" type="text/html" title="exploring GPU programming with pycuda &amp; VGG16" />
      
      <published>2022-05-29T00:00:00+00:00</published>
      <updated>2022-05-29T00:00:00+00:00</updated>
      <id>https://arpankapoor.com/cuda</id>
      <content type="html" xml:base="https://arpankapoor.com/cuda">&lt;p&gt;In this post, I try to implement an inference module for a simple
CNN model to help me get familiarized with GPU programming concepts.&lt;/p&gt;

&lt;h2 id=&quot;history&quot;&gt;history&lt;/h2&gt;

&lt;p&gt;Graphics Processing Units, as the name suggests were originally designed to render 2D and 3D graphics.
For this, they exposed APIs such as &lt;a href=&quot;https://en.wikipedia.org/wiki/OpenGL&quot;&gt;OpenGL&lt;/a&gt;/&lt;a href=&quot;https://en.wikipedia.org/wiki/Direct3D&quot;&gt;Direct3D&lt;/a&gt;
that were aimed at doing pixel/graphic math.
See &lt;a href=&quot;https://cs.lmu.edu/~ray/notes/openglexamples/&quot;&gt;this&lt;/a&gt; for examples.&lt;/p&gt;

&lt;h2 id=&quot;architecture&quot;&gt;architecture&lt;/h2&gt;

&lt;p&gt;Compared to a CPU, a GPU consists of many more cores (&lt;a href=&quot;https://www.nvidia.com/en-us/geforce/graphics-cards/30-series/rtx-3080-3080ti/&quot;&gt;NVIDIA 3080 Ti&lt;/a&gt; has 10240 cores),
but each of them is slower than a typical CPU core (3080 runs at 1.37 GHz).&lt;/p&gt;

&lt;p&gt;This makes them suitable for &lt;a href=&quot;https://en.wikipedia.org/wiki/Data_parallelism&quot;&gt;data parallel&lt;/a&gt;
tasks where you want to execute the same operation on large amounts of data in parallel (&lt;a href=&quot;https://en.wikipedia.org/wiki/Single_instruction,_multiple_threads&quot;&gt;SIMT&lt;/a&gt; instead of &lt;a href=&quot;https://en.wikipedia.org/wiki/Single_instruction,_multiple_data&quot;&gt;SIMD&lt;/a&gt;).
In contrast, CPUs excel at fast sequential execution of instructions, which is suitable for
applications that need low latency rather than high throughput.&lt;/p&gt;

&lt;p&gt;The below diagram depicts the hardware architecture of an older NVIDIA GPU:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;the GPU has a separate DRAM from the CPU’s DRAM&lt;/li&gt;
  &lt;li&gt;the GPU is divided into Streaming Multiprocessors (3080 has 80 SMs)&lt;/li&gt;
  &lt;li&gt;each SM contains a number of cores that perform the actual computation (each SM in 3080 has 128 cores)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;/assets/gpu.jpg&quot; alt=&quot;&quot; /&gt;
&lt;em&gt;NVIDIA Fermi micro-architecture (&lt;a href=&quot;https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0061892&quot;&gt;src&lt;/a&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;h2 id=&quot;cuda&quot;&gt;CUDA&lt;/h2&gt;

&lt;p&gt;Scientists/tinkerers started using GPUs for non-graphics problems that
fit the SIMT paradigm, such as linear algebra (eventually ML/deep learning).
A major issue with this was that they had to shoehorn their solutions to the graphics-oriented APIs exposed by GPUs.&lt;/p&gt;

&lt;p&gt;In ~2006, NVIDIA comes up with CUDA, opening the doors to &lt;a href=&quot;https://en.wikipedia.org/wiki/General-purpose_computing_on_graphics_processing_units&quot;&gt;general purpose computing on GPUs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;CUDA mainly provides the ability to program the GPU using a set of extensions to C/C++, which are compiled
down to &lt;a href=&quot;https://docs.nvidia.com/cuda/parallel-thread-execution/index.html&quot;&gt;PTX&lt;/a&gt; (NVIDIA GPU’s instruction set)
using their proprietary compiler (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nvcc&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;It exposes the GPU hardware to the programmer as a grid of threads.
Each grid is further divided into independently scheduled blocks of threads, each of which executes on a single SM.&lt;/p&gt;

&lt;p&gt;The number of thread blocks executing a kernel function (a C/C++ function that executes on the GPU)
can be configured using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;&amp;lt;&amp;lt;NoOfBlocks, NoOfThreads&amp;gt;&amp;gt;&amp;gt;&lt;/code&gt; syntax.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/gpugrid.png&quot; alt=&quot;&quot; /&gt;
&lt;em&gt;GPU grids (&lt;a href=&quot;https://cvw.cac.cornell.edu/gpu/thread&quot;&gt;original src&lt;/a&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: CUDA is only available on NVIDIA graphics cards. &lt;a href=&quot;https://en.wikipedia.org/wiki/OpenCL&quot;&gt;OpenCL&lt;/a&gt;
is the vendor independent equivalent, although CUDA has a larger amount of support/documentation/libraries.&lt;/p&gt;

&lt;h3 id=&quot;simple-example&quot;&gt;simple example&lt;/h3&gt;

&lt;div class=&quot;language-cuda highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;// compile using `nvcc example.cu`&lt;/span&gt;
&lt;span class=&quot;cp&quot;&gt;#include&lt;/span&gt; &lt;span class=&quot;cpf&quot;&gt;&amp;lt;cuda.h&amp;gt;&lt;/span&gt;&lt;span class=&quot;cp&quot;&gt;
#include&lt;/span&gt; &lt;span class=&quot;cpf&quot;&gt;&amp;lt;stdio.h&amp;gt;&lt;/span&gt;&lt;span class=&quot;cp&quot;&gt;
&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;__global__&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;vecAddKernel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;A&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;B&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;C&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;blockDim&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;blockIdx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;threadIdx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;C&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;A&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;B&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;];&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;vecAdd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;h_A&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;h_B&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;h_C&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;size&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;sizeof&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;d_A&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;d_B&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;d_C&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

  &lt;span class=&quot;n&quot;&gt;cudaMalloc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;**&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;d_A&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;cudaMemcpy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;d_A&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;h_A&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cudaMemcpyHostToDevice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;cudaMalloc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;**&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;d_B&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;cudaMemcpy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;d_B&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;h_B&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cudaMemcpyHostToDevice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;cudaMalloc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;**&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;d_C&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

  &lt;span class=&quot;n&quot;&gt;vecAddKernel&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ceil&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;256.0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;256&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;d_A&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d_B&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d_C&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

  &lt;span class=&quot;n&quot;&gt;cudaMemcpy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;h_C&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d_C&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cudaMemcpyDeviceToHost&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

  &lt;span class=&quot;n&quot;&gt;cudaFree&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;d_A&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;cudaFree&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;d_B&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;cudaFree&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;d_C&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;
  &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;
  &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;

  &lt;span class=&quot;n&quot;&gt;vecAdd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;%f &quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]);&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;em&gt;example from &lt;a href=&quot;https://www.sciencedirect.com/book/9780128119860/programming-massively-parallel-processors&quot;&gt;Programming Massively Parallel Processors&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;vecAdd&lt;/code&gt; above is a normal C/C++ function that:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;allocates memory on the GPU (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cudaMalloc&lt;/code&gt;)&lt;/li&gt;
  &lt;li&gt;copies the required data from the CPU DRAM (called host memory in CUDA terminology) to the GPU DRAM (device memory) using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cudaMemcpy&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;calls the kernel function with a set number of thread blocks of equal size&lt;/li&gt;
  &lt;li&gt;copies the computed results back to the CPU DRAM (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cudaMemcpy&lt;/code&gt;)&lt;/li&gt;
  &lt;li&gt;frees GPU DRAM (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cudaFree&lt;/code&gt;)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;vecAddKernel&lt;/code&gt; is a kernel function (indicated by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;__global__&lt;/code&gt;) that adds vectors &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;B&lt;/code&gt; and
stores them in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C&lt;/code&gt;.
This kernel gets executed by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ceil(n / 256.0)&lt;/code&gt; blocks, each consisting of 256 threads.
Since each thread is called with the same arguments, it uses some of the CUDA built-in variables
(that provide the position of the thread in its grid) to choose the index that it should operate on.
Finally, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;if&lt;/code&gt; condition prevents out of bounds access in case the data size is not evenly
divisible by the threads in each block (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;blockDim&lt;/code&gt;).&lt;/p&gt;

&lt;h2 id=&quot;vgg16&quot;&gt;VGG16&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://arxiv.org/pdf/1409.1556v6.pdf&quot;&gt;VGGNet&lt;/a&gt; is a simple CNN (Computational Neural Network) architecture,
put forward by the Visual Geometry Group at Oxford in 2014.
The version with 16 weight layers (13 convolution + 3 fully connected layers), called VGG16 is available
pretrained (on &lt;a href=&quot;https://www.image-net.org/&quot;&gt;ImageNet&lt;/a&gt;) from &lt;a href=&quot;https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels.h5&quot;&gt;Tensorflow&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I first wrote a simple python version of the VGG16 inference to make sure that my output matches with that of tensorflow.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;numpy&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tensorflow&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tf&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PIL&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Image&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;# from https://github.com/keras-team/keras/blob/07e13740fd181fc3ddec7d9a594d8a08666645f6/keras/applications/imagenet_utils.py#L168-L238
&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;preprocess_img&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;img&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;img&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;astype&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;float32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;# &apos;RGB&apos;-&amp;gt;&apos;BGR&apos; (because of opencv?)
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[...,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;mean&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;103.939&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;116.779&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;123.68&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[...,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mean&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[...,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mean&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[...,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mean&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;# read sample image using PIL and resize to the size required by VGG16
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;img&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;asarray&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Image&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;open&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;images/n02504458_12101.jpeg&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;resize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;224&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;224&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)))&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;# preprocess input image as done by tensorflow
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;img&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;preprocess_img&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;img&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;# create tensorflow model
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;model&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;keras&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;applications&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;vgg16&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;VGG16&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;# get class probabilities
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tfoutput&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;predict&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;img&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;newaxis&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;...])[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To get the class labels corresponding to the highest probability outputs from the model, I saved
the class number to label mapping for ImageNet into a dictionary called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;IMAGENET_LABELS&lt;/code&gt; in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;labels.py&lt;/code&gt; and
imported it.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;# from https://gist.github.com/yrevar/942d3a0ac09ec9e5eb3a#file-imagenet1000_clsidx_to_labels-txt
&lt;/span&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;labels&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;IMAGENET_LABELS&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;# return top predictions with probabilities from output of last layer
&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;get_top_predictions&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;preds&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;top&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;IMAGENET_LABELS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;preds&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;preds&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;argsort&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()[:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;top&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;get_top_predictions&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tfoutput&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;fur coat&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.98191327&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
 &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;Labrador retriever&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.006128772&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
 &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;Eskimo dog, husky&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.0019237188&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
 &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;golden retriever&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.0015210236&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
 &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;pug, pug-dog&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.0012324217&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;python-implementation&quot;&gt;python implementation&lt;/h3&gt;

&lt;p&gt;The first step for the numpy implementation is to get the pretrained weights.
The tensorflow model weights are available as a &lt;a href=&quot;https://www.hdfgroup.org/solutions/hdf5/&quot;&gt;hdf5&lt;/a&gt; file,
which is a dictionary-like object with the layer names as the key and the weights and biases
as the value for that layer.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;h5py&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;# from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels.h5
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;VGG_WTS&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;h5py&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;File&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;vgg16_weights_tf_dim_ordering_tf_kernels.h5&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;# layer names from https://github.com/keras-team/keras/blob/v2.9.0/keras/applications/vgg16.py#L43-L227
# can also be obtained by executing `model.summary()` on the `Model` object
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;VGG_LAYERS&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;block1_conv1&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;block1_conv2&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;block1_pool&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
              &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;block2_conv1&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;block2_conv2&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;block2_pool&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
              &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;block3_conv1&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;block3_conv2&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;block3_conv3&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;block3_pool&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
              &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;block4_conv1&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;block4_conv2&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;block4_conv3&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;block4_pool&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
              &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;block5_conv1&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;block5_conv2&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;block5_conv3&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;block5_pool&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
              &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;flatten&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
              &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;fc1&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;fc2&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
              &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;predictions&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Next, we write the activation functions: &lt;a href=&quot;https://en.wikipedia.org/wiki/Rectifier_(neural_networks)&quot;&gt;ReLU&lt;/a&gt; for the
hidden layers and &lt;a href=&quot;https://en.wikipedia.org/wiki/Softmax_function&quot;&gt;softmax&lt;/a&gt; for the output layer:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;relu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;maximum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;softmax&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;exp&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;exp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;exp&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;exp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now, we can proceed to implement the 2D convolution operation.
The inputs required for this will be the layer weights, biases and the output from the previous layer.
Before proceeding, lets check the shapes of these inputs for the first layer.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;# for the first layer, input is the image
&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;img&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shape&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;224&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;224&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;# weights and biases for first CONV layer
&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;w&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;VGG_WTS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;VGG_LAYERS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]].&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;values&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;# the first layer of VGG16 is:
# x = layers.Conv2D(64, (3, 3), activation=&apos;relu&apos;, padding=&apos;same&apos;, name=&apos;block1_conv1&apos;)
# 64 kernels of size 3x3 with padding to make sure output has same size as input
&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;# 64 3x3 kernels (1 per input channel: corresponds to the 3rd 3 below)
&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;w&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shape&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;64&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;# 1 bias per kernel
&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shape&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;64&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;img src=&quot;/assets/conv.png&quot; alt=&quot;&quot; /&gt;
&lt;em&gt;Convolution with 1 kernel (&lt;a href=&quot;https://towardsdatascience.com/types-of-convolution-kernels-simplified-f040cb307c37&quot;&gt;src&lt;/a&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In the above image, the blue cuboid corresponds to an output from one of the layers (feature map), which
is being convolved with a single kernel (orange cuboid) to form 1 channel of the output.&lt;/p&gt;

&lt;p&gt;The implementation follows:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;applyConv2d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;w&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;inp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;sh&quot;&gt;&quot;&quot;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;assuming odd-sized square kernel&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&quot;&quot;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;ksize&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;w&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shape&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;out_channels&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;w&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shape&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;# pad the input, not output: https://stackoverflow.com/a/69544897
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;padded_inp&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;pad&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ksize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;//&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ksize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;//&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
                              &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ksize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;//&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ksize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;//&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
                              &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)))&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;# output is same shape as input with more channels
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;zeros&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shape&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out_channels&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dtype&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;float32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;# convolve the surrounding of each pixel (3x3x3) with the kernel
&lt;/span&gt;    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shape&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]):&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shape&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]):&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out_channels&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;][&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;][&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;tensordot&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;w&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[...,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
                                            &lt;span class=&quot;n&quot;&gt;padded_inp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ksize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ksize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
                                            &lt;span class=&quot;n&quot;&gt;ksize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;# add bias to each output channel
&lt;/span&gt;    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out_channels&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[...,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;# apply relu activation
&lt;/span&gt;    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;relu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Each convolutional block of VGG16 ends with a max pooling layer, which is used to reduce the dimensions of
the intermediate feature maps.
The VGG16 max pooling is quite simple to implement since it uses a 2x2 kernel with a stride of 2, which reduces both the
height and width of the feature map to half its original value.
For instance, in the first block, max pooling reduces the feature map from (224, 224, 64) to (112, 112, 64).&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;applyMaxPool2d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;sh&quot;&gt;&quot;&quot;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;2x2 kernel with (2,2) stride&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&quot;&quot;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;zeros&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shape&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;//&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;inp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shape&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;//&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;inp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shape&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dtype&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;float32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shape&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]):&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shape&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]):&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shape&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]):&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;][&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;][&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;max&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Lastly, we write a function to apply all layers of VGG16 to the input image:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;applyVgg16&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;curr&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;inp&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;outputs&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;layer&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;VGG_LAYERS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;pool&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;layer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;applyMaxPool2d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;curr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;elif&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;layer&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;flatten&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;curr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;flatten&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;# weight layers
&lt;/span&gt;        &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;w&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;array&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;VGG_WTS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;layer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;values&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;layer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;startswith&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;fc&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;or&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;layer&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;predictions&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
                &lt;span class=&quot;c1&quot;&gt;# fully connected layers are a simple matrix multiplication
&lt;/span&gt;                &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;matmul&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;w&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;curr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;reshape&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)).&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;flatten&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;relu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;layer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;startswith&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;fc&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;softmax&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;applyConv2d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;w&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;curr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;outputs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;nf&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sa&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;processed &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;layer&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;: inshape: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;curr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shape&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;, outshape: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shape&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;curr&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;# return output of all hidden layers along with output layer
&lt;/span&gt;    &lt;span class=&quot;c1&quot;&gt;# helps in inspecting output of hidden layers
&lt;/span&gt;    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;outputs&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now we call the above function on the preprocessed image and print the predictions:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;start&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;datetime&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;datetime&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;now&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;outputs&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;applyVgg16&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;img&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;end&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;datetime&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;datetime&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;now&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;

&lt;span class=&quot;nf&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;predictions: &lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;get_top_predictions&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outputs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]))&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sa&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;in &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;end&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;start&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-console highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;gp&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;./numpy_vgg16.py
&lt;span class=&quot;go&quot;&gt;processed block1_conv1: inshape: (224, 224, 3), outshape: (224, 224, 64)
processed block1_conv2: inshape: (224, 224, 64), outshape: (224, 224, 64)
processed block1_pool: inshape: (224, 224, 64), outshape: (112, 112, 64)
processed block2_conv1: inshape: (112, 112, 64), outshape: (112, 112, 128)
processed block2_conv2: inshape: (112, 112, 128), outshape: (112, 112, 128)
processed block2_pool: inshape: (112, 112, 128), outshape: (56, 56, 128)
processed block3_conv1: inshape: (56, 56, 128), outshape: (56, 56, 256)
processed block3_conv2: inshape: (56, 56, 256), outshape: (56, 56, 256)
processed block3_conv3: inshape: (56, 56, 256), outshape: (56, 56, 256)
processed block3_pool: inshape: (56, 56, 256), outshape: (28, 28, 256)
processed block4_conv1: inshape: (28, 28, 256), outshape: (28, 28, 512)
processed block4_conv2: inshape: (28, 28, 512), outshape: (28, 28, 512)
processed block4_conv3: inshape: (28, 28, 512), outshape: (28, 28, 512)
processed block4_pool: inshape: (28, 28, 512), outshape: (14, 14, 512)
processed block5_conv1: inshape: (14, 14, 512), outshape: (14, 14, 512)
processed block5_conv2: inshape: (14, 14, 512), outshape: (14, 14, 512)
processed block5_conv3: inshape: (14, 14, 512), outshape: (14, 14, 512)
processed block5_pool: inshape: (14, 14, 512), outshape: (7, 7, 512)
processed flatten: inshape: (7, 7, 512), outshape: (25088,)
processed fc1: inshape: (25088,), outshape: (4096,)
processed fc2: inshape: (4096,), outshape: (4096,)
processed predictions: inshape: (4096,), outshape: (1000,)
predictions:  {&apos;African elephant, Loxodonta africana&apos;: 0.9346673, &apos;tusker&apos;: 0.06419284, &apos;Indian elephant, Elephas maximus&apos;: 0.0011378175, &apos;warthog&apos;: 6.19383e-07, &apos;water buffalo, water ox, Asiatic buffalo, Bubalus bubalis&apos;: 3.1221526e-07}
computed in 0:02:46.589266
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;So, our pure numpy VGG16 inference implementation takes around 3 minutes to process 1 image.&lt;/p&gt;

&lt;p&gt;Now lets see how much this could be sped up with a naive CUDA implementation of the slow operations (convolution).&lt;/p&gt;

&lt;h3 id=&quot;speed-up-using-cuda&quot;&gt;speed up using CUDA&lt;/h3&gt;

&lt;p&gt;The first task to use CUDA is to divide the problem into (gridDimension, blockDimension) sized sub-problems.&lt;/p&gt;

&lt;p&gt;Since the number of threads in a block is limited to 1024 and the number of output channels of any layer in
VGG16 does not exceed 512, we can set the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;blockDim&lt;/code&gt; to the output channel count for each layer.&lt;/p&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;gridDim&lt;/code&gt; can be set to the image/feature map (height, width).&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pycuda.gpuarray&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pycuda.driver&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cuda&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pycuda.autoinit&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;applyConv2dCuda&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;w&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;inp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;ksize&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;w&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shape&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;inp_channels&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;inp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shape&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;out_channels&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;w&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shape&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;padded_inp&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;pad&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ksize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;//&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ksize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;//&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
                              &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ksize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;//&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ksize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;//&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
                              &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)))&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;zeros&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shape&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out_channels&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dtype&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;float32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;# convert numpy arrays to pycuda.gpuarray
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;in_gpu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out_gpu&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pycuda&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gpuarray&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;to_gpu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;padded_inp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pycuda&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gpuarray&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;to_gpu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;w_gpu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b_gpu&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pycuda&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gpuarray&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;to_gpu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;w&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pycuda&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gpuarray&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;to_gpu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;# call cuda implementation with the GPU arrays
&lt;/span&gt;    &lt;span class=&quot;nf&quot;&gt;cudaConv2d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;in_gpu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out_gpu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;w_gpu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b_gpu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;int32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inp_channels&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;int32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ksize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
               &lt;span class=&quot;n&quot;&gt;block&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out_channels&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;grid&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shape&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;# copy back the output from GPU memory
&lt;/span&gt;    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out_gpu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The setup code to call the cuda function &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cudaConv2d&lt;/code&gt; is similar to the example we looked at earlier.
In this case, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pycuda&lt;/code&gt; provides us an easy way to copy numpy ndarray-like data from the CPU to the GPU using its
&lt;a href=&quot;https://documen.tician.de/pycuda/array.html#pycuda.gpuarray.GPUArray&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GPUArray&lt;/code&gt;&lt;/a&gt; class.&lt;/p&gt;

&lt;p&gt;Now, we write our CUDA code for convolution.
The below kernel calculates the value of 1 pixel in the output feature map of a layer.&lt;/p&gt;

&lt;p&gt;The most convoluted part here is the calculation of the indices into the various input arrays, since all
of them are multi-dimensional, but represented by a single-dimensional array in C++.&lt;/p&gt;

&lt;p&gt;I encountered an error &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cuMemFree failed: an illegal memory access was encountered&lt;/code&gt;, which was likely due
to out of bounds access into the input array. Fortunately, NVIDIA kernels support &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;printf&lt;/code&gt;, which made debugging
these issues easier.&lt;/p&gt;

&lt;div class=&quot;language-cuda highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;__global__&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;conv2d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;w&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                       &lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;inp_channels&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ksize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;channel_num&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;threadIdx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out_channels&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;blockDim&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

  &lt;span class=&quot;c1&quot;&gt;// pixel (x, y)&lt;/span&gt;
  &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;blockIdx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;blockIdx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

  &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;outidx&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out_channels&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;gridDim&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;
               &lt;span class=&quot;n&quot;&gt;out_channels&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;
               &lt;span class=&quot;n&quot;&gt;channel_num&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

  &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ksize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ksize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;k&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;k&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;inp_channels&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;k&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;

        &lt;span class=&quot;c1&quot;&gt;// w is 4D with dimensions: (ksize, ksize, input_channels, output_channels)&lt;/span&gt;
        &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;widx&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ksize&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;inp_channels&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out_channels&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;
                   &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inp_channels&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out_channels&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;
                   &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out_channels&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;k&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;
                   &lt;span class=&quot;n&quot;&gt;channel_num&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

        &lt;span class=&quot;c1&quot;&gt;// inp is 3D with dimensions: (blockDim.x + padding, blockDim.y + padding, input_channels)&lt;/span&gt;
        &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;inpidx&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gridDim&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ksize&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;inp_channels&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;
                     &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inp_channels&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;j&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;
                     &lt;span class=&quot;n&quot;&gt;k&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

        &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outidx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;inp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inpidx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;w&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;widx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;];&lt;/span&gt;
      &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

  &lt;span class=&quot;c1&quot;&gt;// add bias&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outidx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;channel_num&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;];&lt;/span&gt;

  &lt;span class=&quot;c1&quot;&gt;// relu&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outidx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outidx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To be able to compile and run this CUDA code at runtime, we provide the above code as a python string to
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pycuda&lt;/code&gt;’s &lt;a href=&quot;https://documen.tician.de/pycuda/driver.html?highlight=sourcemodule#pycuda.compiler.SourceModule&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SourceModule&lt;/code&gt;&lt;/a&gt;
class&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pycuda.compiler&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SourceModule&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;src&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&quot;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;...placeholder for CUDA code...&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&quot;&quot;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;mod&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;SourceModule&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;src&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;cudaConv2d&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mod&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;get_function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;conv2d&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Finally, after executing this, we obtain the following output which is the same as the one from tensorflow
and our numpy implementation.
The pycuda implementation however completes in ~5 seconds which is ~30x faster than our CPU based numpy implementation.&lt;/p&gt;

&lt;p&gt;Tensorflow on the other hand is even faster than our implementation at ~1 second.
This would most likely be because it internally uses the &lt;a href=&quot;https://developer.nvidia.com/cudnn&quot;&gt;cuDNN&lt;/a&gt;
library, which is highly optimized for deep neural network operations.&lt;/p&gt;

&lt;div class=&quot;language-console highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;gp&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;./pycuda_vgg16.py
&lt;span class=&quot;go&quot;&gt;processed block1_conv1: inshape: (224, 224, 3), outshape: (224, 224, 64)
processed block1_conv2: inshape: (224, 224, 64), outshape: (224, 224, 64)
processed block1_pool: inshape: (224, 224, 64), outshape: (112, 112, 64)
processed block2_conv1: inshape: (112, 112, 64), outshape: (112, 112, 128)
processed block2_conv2: inshape: (112, 112, 128), outshape: (112, 112, 128)
processed block2_pool: inshape: (112, 112, 128), outshape: (56, 56, 128)
processed block3_conv1: inshape: (56, 56, 128), outshape: (56, 56, 256)
processed block3_conv2: inshape: (56, 56, 256), outshape: (56, 56, 256)
processed block3_conv3: inshape: (56, 56, 256), outshape: (56, 56, 256)
processed block3_pool: inshape: (56, 56, 256), outshape: (28, 28, 256)
processed block4_conv1: inshape: (28, 28, 256), outshape: (28, 28, 512)
processed block4_conv2: inshape: (28, 28, 512), outshape: (28, 28, 512)
processed block4_conv3: inshape: (28, 28, 512), outshape: (28, 28, 512)
processed block4_pool: inshape: (28, 28, 512), outshape: (14, 14, 512)
processed block5_conv1: inshape: (14, 14, 512), outshape: (14, 14, 512)
processed block5_conv2: inshape: (14, 14, 512), outshape: (14, 14, 512)
processed block5_conv3: inshape: (14, 14, 512), outshape: (14, 14, 512)
processed block5_pool: inshape: (14, 14, 512), outshape: (7, 7, 512)
processed flatten: inshape: (7, 7, 512), outshape: (25088,)
processed fc1: inshape: (25088,), outshape: (4096,)
processed fc2: inshape: (4096,), outshape: (4096,)
processed predictions: inshape: (4096,), outshape: (1000,)
predictions:  {&apos;African elephant, Loxodonta africana&apos;: 0.9346673, &apos;tusker&apos;: 0.06419284, &apos;Indian elephant, Elephas maximus&apos;: 0.0011378153, &apos;warthog&apos;: 6.193812e-07, &apos;water buffalo, water ox, Asiatic buffalo, Bubalus bubalis&apos;: 3.1221467e-07}
computed in 0:00:04.784639
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To optimize further, we could also write the max pooling algorithm using CUDA, but the speed up won’t be as pronounced
as the conv2d case.&lt;/p&gt;

&lt;p&gt;Hopefully this simple implementation helps to demystify how deep learning/ML libraries get such incredible
speedups when using GPUs (or TPUs on Google Colab)!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: execution was done on a Intel i9-9900X processor with a NVIDIA RTX 2060 GPU.
code is available on &lt;a href=&quot;https://github.com/arpankapoor/pycuda-vgg16&quot;&gt;github&lt;/a&gt;.&lt;/p&gt;</content>

      
      
      
      
      

      
        <author>
            <name>Arpan Kapoor</name>
          
          
        </author>
      

      

      

      
        <summary type="html">In this post, I try to implement an inference module for a simple CNN model to help me get familiarized with GPU programming concepts. history Graphics Processing Units, as the name suggests were originally designed to render 2D and 3D graphics. For this, they exposed APIs such as OpenGL/Direct3D that were aimed at doing pixel/graphic math. See this for examples. architecture Compared to a CPU, a GPU consists of many more cores (NVIDIA 3080 Ti has 10240 cores), but each of them is slower than a typical CPU core (3080 runs at 1.37 GHz). This makes them suitable for data parallel tasks where you want to execute the same operation on large amounts of data in parallel (SIMT instead of SIMD). In contrast, CPUs excel at fast sequential execution of instructions, which is suitable for applications that need low latency rather than high throughput. The below diagram depicts the hardware architecture of an older NVIDIA GPU: the GPU has a separate DRAM from the CPU’s DRAM the GPU is divided into Streaming Multiprocessors (3080 has 80 SMs) each SM contains a number of cores that perform the actual computation (each SM in 3080 has 128 cores) NVIDIA Fermi micro-architecture (src) CUDA Scientists/tinkerers started using GPUs for non-graphics problems that fit the SIMT paradigm, such as linear algebra (eventually ML/deep learning). A major issue with this was that they had to shoehorn their solutions to the graphics-oriented APIs exposed by GPUs. In ~2006, NVIDIA comes up with CUDA, opening the doors to general purpose computing on GPUs. CUDA mainly provides the ability to program the GPU using a set of extensions to C/C++, which are compiled down to PTX (NVIDIA GPU’s instruction set) using their proprietary compiler (nvcc). It exposes the GPU hardware to the programmer as a grid of threads. Each grid is further divided into independently scheduled blocks of threads, each of which executes on a single SM. The number of thread blocks executing a kernel function (a C/C++ function that executes on the GPU) can be configured using the &amp;lt;&amp;lt;&amp;lt;NoOfBlocks, NoOfThreads&amp;gt;&amp;gt;&amp;gt; syntax. GPU grids (original src) Note: CUDA is only available on NVIDIA graphics cards. OpenCL is the vendor independent equivalent, although CUDA has a larger amount of support/documentation/libraries. simple example // compile using `nvcc example.cu` #include &amp;lt;cuda.h&amp;gt; #include &amp;lt;stdio.h&amp;gt; __global__ void vecAddKernel(float *A, float *B, float *C, int n) { int i = blockDim.x * blockIdx.x + threadIdx.x; if (i &amp;lt; n) C[i] = A[i] + B[i]; } void vecAdd(float *h_A, float *h_B, float *h_C, int n) { int size = n * sizeof(float); float *d_A, *d_B, *d_C; cudaMalloc((void **)&amp;amp;d_A, size); cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice); cudaMalloc((void **)&amp;amp;d_B, size); cudaMemcpy(d_B, h_B, size, cudaMemcpyHostToDevice); cudaMalloc((void **)&amp;amp;d_C, size); vecAddKernel&amp;lt;&amp;lt;&amp;lt;ceil(n / 256.0), 256&amp;gt;&amp;gt;&amp;gt;(d_A, d_B, d_C, n); cudaMemcpy(h_C, d_C, size, cudaMemcpyDeviceToHost); cudaFree(d_A); cudaFree(d_B); cudaFree(d_C); } int main() { float a[] = {1, 2, 3, 4}; float b[] = {1, 2, 3, 4}; float c[] = {0, 0, 0, 0}; vecAdd(a, b, c, 4); for (int i = 0; i &amp;lt; 4; i++) { printf(&quot;%f &quot;, c[i]); } printf(&quot;\n&quot;); } example from Programming Massively Parallel Processors vecAdd above is a normal C/C++ function that: allocates memory on the GPU (cudaMalloc) copies the required data from the CPU DRAM (called host memory in CUDA terminology) to the GPU DRAM (device memory) using cudaMemcpy calls the kernel function with a set number of thread blocks of equal size copies the computed results back to the CPU DRAM (cudaMemcpy) frees GPU DRAM (cudaFree) vecAddKernel is a kernel function (indicated by __global__) that adds vectors A, B and stores them in C. This kernel gets executed by ceil(n / 256.0) blocks, each consisting of 256 threads. Since each thread is called with the same arguments, it uses some of the CUDA built-in variables (that provide the position of the thread in its grid) to choose the index that it should operate on. Finally, the if condition prevents out of bounds access in case the data size is not evenly divisible by the threads in each block (blockDim). VGG16 VGGNet is a simple CNN (Computational Neural Network) architecture, put forward by the Visual Geometry Group at Oxford in 2014. The version with 16 weight layers (13 convolution + 3 fully connected layers), called VGG16 is available pretrained (on ImageNet) from Tensorflow. I first wrote a simple python version of the VGG16 inference to make sure that my output matches with that of tensorflow. import numpy as np import tensorflow as tf from PIL import Image # from https://github.com/keras-team/keras/blob/07e13740fd181fc3ddec7d9a594d8a08666645f6/keras/applications/imagenet_utils.py#L168-L238 def preprocess_img(img): x = img.astype(np.float32) # &apos;RGB&apos;-&amp;gt;&apos;BGR&apos; (because of opencv?) x = x[..., ::-1] mean = [103.939, 116.779, 123.68] x[..., 0] -= mean[0] x[..., 1] -= mean[1] x[..., 2] -= mean[2] return x # read sample image using PIL and resize to the size required by VGG16 img = np.asarray(Image.open(&apos;images/n02504458_12101.jpeg&apos;).resize((224, 224))) # preprocess input image as done by tensorflow img = preprocess_img(img) # create tensorflow model model = tf.keras.applications.vgg16.VGG16() # get class probabilities tfoutput = model.predict(img[np.newaxis, ...])[0] To get the class labels corresponding to the highest probability outputs from the model, I saved the class number to label mapping for ImageNet into a dictionary called IMAGENET_LABELS in labels.py and imported it. # from https://gist.github.com/yrevar/942d3a0ac09ec9e5eb3a#file-imagenet1000_clsidx_to_labels-txt from labels import IMAGENET_LABELS # return top predictions with probabilities from output of last layer def get_top_predictions(preds, top=5): return {IMAGENET_LABELS[x]: preds[x] for x in (-preds).argsort()[:top]} &amp;gt;&amp;gt;&amp;gt; get_top_predictions(tfoutput) {&apos;fur coat&apos;: 0.98191327, &apos;Labrador retriever&apos;: 0.006128772, &apos;Eskimo dog, husky&apos;: 0.0019237188, &apos;golden retriever&apos;: 0.0015210236, &apos;pug, pug-dog&apos;: 0.0012324217} python implementation The first step for the numpy implementation is to get the pretrained weights. The tensorflow model weights are available as a hdf5 file, which is a dictionary-like object with the layer names as the key and the weights and biases as the value for that layer. import h5py # from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels.h5 VGG_WTS = h5py.File(&quot;vgg16_weights_tf_dim_ordering_tf_kernels.h5&quot;, &quot;r&quot;) # layer names from https://github.com/keras-team/keras/blob/v2.9.0/keras/applications/vgg16.py#L43-L227 # can also be obtained by executing `model.summary()` on the `Model` object VGG_LAYERS = [&apos;block1_conv1&apos;, &apos;block1_conv2&apos;, &apos;block1_pool&apos;, &apos;block2_conv1&apos;, &apos;block2_conv2&apos;, &apos;block2_pool&apos;, &apos;block3_conv1&apos;, &apos;block3_conv2&apos;, &apos;block3_conv3&apos;, &apos;block3_pool&apos;, &apos;block4_conv1&apos;, &apos;block4_conv2&apos;, &apos;block4_conv3&apos;, &apos;block4_pool&apos;, &apos;block5_conv1&apos;, &apos;block5_conv2&apos;, &apos;block5_conv3&apos;, &apos;block5_pool&apos;, &apos;flatten&apos;, &apos;fc1&apos;, &apos;fc2&apos;, &apos;predictions&apos;] Next, we write the activation functions: ReLU for the hidden layers and softmax for the output layer: def relu(x): return np.maximum(x, 0) def softmax(x): exp = np.exp(x) return exp / np.sum(exp) Now, we can proceed to implement the 2D convolution operation. The inputs required for this will be the layer weights, biases and the output from the previous layer. Before proceeding, lets check the shapes of these inputs for the first layer. # for the first layer, input is the image &amp;gt;&amp;gt;&amp;gt; img.shape (224, 224, 3) # weights and biases for first CONV layer &amp;gt;&amp;gt;&amp;gt; w, b = VGG_WTS[VGG_LAYERS[0]].values() # the first layer of VGG16 is: # x = layers.Conv2D(64, (3, 3), activation=&apos;relu&apos;, padding=&apos;same&apos;, name=&apos;block1_conv1&apos;) # 64 kernels of size 3x3 with padding to make sure output has same size as input # 64 3x3 kernels (1 per input channel: corresponds to the 3rd 3 below) &amp;gt;&amp;gt;&amp;gt; w.shape (3, 3, 3, 64) # 1 bias per kernel &amp;gt;&amp;gt;&amp;gt; b.shape (64,) Convolution with 1 kernel (src) In the above image, the blue cuboid corresponds to an output from one of the layers (feature map), which is being convolved with a single kernel (orange cuboid) to form 1 channel of the output. The implementation follows: def applyConv2d(w, b, inp): &quot;&quot;&quot;assuming odd-sized square kernel&quot;&quot;&quot; ksize = w.shape[0] out_channels = w.shape[-1] # pad the input, not output: https://stackoverflow.com/a/69544897 padded_inp = np.pad(inp, ((ksize//2, ksize//2), (ksize//2, ksize//2), (0, 0))) # output is same shape as input with more channels out = np.zeros(inp.shape[:2] + (out_channels,), dtype=np.float32) # convolve the surrounding of each pixel (3x3x3) with the kernel for x in range(inp.shape[0]): for y in range(inp.shape[1]): for c in range(out_channels): out[x][y][c] = np.tensordot(w[..., c], padded_inp[x:x+ksize, y:y+ksize], ksize) # add bias to each output channel for c in range(out_channels): out[..., c] += b[c] # apply relu activation return relu(out) Each convolutional block of VGG16 ends with a max pooling layer, which is used to reduce the dimensions of the intermediate feature maps. The VGG16 max pooling is quite simple to implement since it uses a 2x2 kernel with a stride of 2, which reduces both the height and width of the feature map to half its original value. For instance, in the first block, max pooling reduces the feature map from (224, 224, 64) to (112, 112, 64). def applyMaxPool2d(inp): &quot;&quot;&quot;2x2 kernel with (2,2) stride&quot;&quot;&quot; out = np.zeros((inp.shape[0]//2, inp.shape[1]//2, inp.shape[2]), dtype=np.float32) for x in range(out.shape[0]): for y in range(out.shape[1]): for c in range(inp.shape[2]): out[x][y][c] = np.max(inp[2*x:2*(x+1), 2*y:2*(y+1), c]) return out Lastly, we write a function to apply all layers of VGG16 to the input image: def applyVgg16(inp): curr = inp outputs = [] for layer in VGG_LAYERS: if &quot;pool&quot; in layer: out = applyMaxPool2d(curr) elif layer == &quot;flatten&quot;: out = curr.flatten() # weight layers else: w, b = (np.array(x) for x in VGG_WTS[layer].values()) if layer.startswith(&quot;fc&quot;) or layer == &quot;predictions&quot;: # fully connected layers are a simple matrix multiplication out = np.matmul(w.T, curr.reshape(-1, 1)).flatten() + b out = relu(out) if layer.startswith(&quot;fc&quot;) else softmax(out) else: out = applyConv2d(w, b, curr) outputs.append(out) print(f&quot;processed {layer}: inshape: {curr.shape}, outshape: {out.shape}&quot;) curr = out # return output of all hidden layers along with output layer # helps in inspecting output of hidden layers return outputs Now we call the above function on the preprocessed image and print the predictions: start = datetime.datetime.now() outputs = applyVgg16(img) end = datetime.datetime.now() print(&quot;predictions: &quot;, get_top_predictions(outputs[-1])) print(f&quot;in {end-start}&quot;) $ ./numpy_vgg16.py processed block1_conv1: inshape: (224, 224, 3), outshape: (224, 224, 64) processed block1_conv2: inshape: (224, 224, 64), outshape: (224, 224, 64) processed block1_pool: inshape: (224, 224, 64), outshape: (112, 112, 64) processed block2_conv1: inshape: (112, 112, 64), outshape: (112, 112, 128) processed block2_conv2: inshape: (112, 112, 128), outshape: (112, 112, 128) processed block2_pool: inshape: (112, 112, 128), outshape: (56, 56, 128) processed block3_conv1: inshape: (56, 56, 128), outshape: (56, 56, 256) processed block3_conv2: inshape: (56, 56, 256), outshape: (56, 56, 256) processed block3_conv3: inshape: (56, 56, 256), outshape: (56, 56, 256) processed block3_pool: inshape: (56, 56, 256), outshape: (28, 28, 256) processed block4_conv1: inshape: (28, 28, 256), outshape: (28, 28, 512) processed block4_conv2: inshape: (28, 28, 512), outshape: (28, 28, 512) processed block4_conv3: inshape: (28, 28, 512), outshape: (28, 28, 512) processed block4_pool: inshape: (28, 28, 512), outshape: (14, 14, 512) processed block5_conv1: inshape: (14, 14, 512), outshape: (14, 14, 512) processed block5_conv2: inshape: (14, 14, 512), outshape: (14, 14, 512) processed block5_conv3: inshape: (14, 14, 512), outshape: (14, 14, 512) processed block5_pool: inshape: (14, 14, 512), outshape: (7, 7, 512) processed flatten: inshape: (7, 7, 512), outshape: (25088,) processed fc1: inshape: (25088,), outshape: (4096,) processed fc2: inshape: (4096,), outshape: (4096,) processed predictions: inshape: (4096,), outshape: (1000,) predictions: {&apos;African elephant, Loxodonta africana&apos;: 0.9346673, &apos;tusker&apos;: 0.06419284, &apos;Indian elephant, Elephas maximus&apos;: 0.0011378175, &apos;warthog&apos;: 6.19383e-07, &apos;water buffalo, water ox, Asiatic buffalo, Bubalus bubalis&apos;: 3.1221526e-07} computed in 0:02:46.589266 So, our pure numpy VGG16 inference implementation takes around 3 minutes to process 1 image. Now lets see how much this could be sped up with a naive CUDA implementation of the slow operations (convolution). speed up using CUDA The first task to use CUDA is to divide the problem into (gridDimension, blockDimension) sized sub-problems. Since the number of threads in a block is limited to 1024 and the number of output channels of any layer in VGG16 does not exceed 512, we can set the blockDim to the output channel count for each layer. The gridDim can be set to the image/feature map (height, width). import pycuda.gpuarray import pycuda.driver as cuda import pycuda.autoinit def applyConv2dCuda(w, b, inp): ksize = w.shape[0] inp_channels = inp.shape[-1] out_channels = w.shape[-1] padded_inp = np.pad(inp, ((ksize//2, ksize//2), (ksize//2, ksize//2), (0, 0))) out = np.zeros(inp.shape[:2] + (out_channels,), dtype=np.float32) # convert numpy arrays to pycuda.gpuarray in_gpu, out_gpu = pycuda.gpuarray.to_gpu(padded_inp), pycuda.gpuarray.to_gpu(out) w_gpu, b_gpu = pycuda.gpuarray.to_gpu(w), pycuda.gpuarray.to_gpu(b) # call cuda implementation with the GPU arrays cudaConv2d(in_gpu, out_gpu, w_gpu, b_gpu, np.int32(inp_channels), np.int32(ksize), block=(out_channels, 1, 1), grid=inp.shape[:2]) # copy back the output from GPU memory return out_gpu.get() The setup code to call the cuda function cudaConv2d is similar to the example we looked at earlier. In this case, pycuda provides us an easy way to copy numpy ndarray-like data from the CPU to the GPU using its GPUArray class. Now, we write our CUDA code for convolution. The below kernel calculates the value of 1 pixel in the output feature map of a layer. The most convoluted part here is the calculation of the indices into the various input arrays, since all of them are multi-dimensional, but represented by a single-dimensional array in C++. I encountered an error cuMemFree failed: an illegal memory access was encountered, which was likely due to out of bounds access into the input array. Fortunately, NVIDIA kernels support printf, which made debugging these issues easier. __global__ void conv2d(const float *inp, float *out, const float *w, const float *b, int inp_channels, int ksize) { int channel_num = threadIdx.x; int out_channels = blockDim.x; // pixel (x, y) int x = blockIdx.x; int y = blockIdx.y; int outidx = out_channels * gridDim.y * x + out_channels * y + channel_num; for (int i = 0; i &amp;lt; ksize; i++) { for (int j = 0; j &amp;lt; ksize; j++) { for (int k = 0; k &amp;lt; inp_channels; k++) { // w is 4D with dimensions: (ksize, ksize, input_channels, output_channels) int widx = (ksize * inp_channels * out_channels * i) + (inp_channels * out_channels * j) + (out_channels * k) + channel_num; // inp is 3D with dimensions: (blockDim.x + padding, blockDim.y + padding, input_channels) int inpidx = ((gridDim.y + ksize - 1) * inp_channels * (i + x)) + (inp_channels * (j + y)) + k; out[outidx] += inp[inpidx] * w[widx]; } } } // add bias out[outidx] += b[channel_num]; // relu if (out[outidx] &amp;lt; 0) { out[outidx] = 0.0; } } To be able to compile and run this CUDA code at runtime, we provide the above code as a python string to pycuda’s SourceModule class from pycuda.compiler import SourceModule src = &quot;&quot;&quot;...placeholder for CUDA code...&quot;&quot;&quot; mod = SourceModule(src) cudaConv2d = mod.get_function(&quot;conv2d&quot;) Finally, after executing this, we obtain the following output which is the same as the one from tensorflow and our numpy implementation. The pycuda implementation however completes in ~5 seconds which is ~30x faster than our CPU based numpy implementation. Tensorflow on the other hand is even faster than our implementation at ~1 second. This would most likely be because it internally uses the cuDNN library, which is highly optimized for deep neural network operations. $ ./pycuda_vgg16.py processed block1_conv1: inshape: (224, 224, 3), outshape: (224, 224, 64) processed block1_conv2: inshape: (224, 224, 64), outshape: (224, 224, 64) processed block1_pool: inshape: (224, 224, 64), outshape: (112, 112, 64) processed block2_conv1: inshape: (112, 112, 64), outshape: (112, 112, 128) processed block2_conv2: inshape: (112, 112, 128), outshape: (112, 112, 128) processed block2_pool: inshape: (112, 112, 128), outshape: (56, 56, 128) processed block3_conv1: inshape: (56, 56, 128), outshape: (56, 56, 256) processed block3_conv2: inshape: (56, 56, 256), outshape: (56, 56, 256) processed block3_conv3: inshape: (56, 56, 256), outshape: (56, 56, 256) processed block3_pool: inshape: (56, 56, 256), outshape: (28, 28, 256) processed block4_conv1: inshape: (28, 28, 256), outshape: (28, 28, 512) processed block4_conv2: inshape: (28, 28, 512), outshape: (28, 28, 512) processed block4_conv3: inshape: (28, 28, 512), outshape: (28, 28, 512) processed block4_pool: inshape: (28, 28, 512), outshape: (14, 14, 512) processed block5_conv1: inshape: (14, 14, 512), outshape: (14, 14, 512) processed block5_conv2: inshape: (14, 14, 512), outshape: (14, 14, 512) processed block5_conv3: inshape: (14, 14, 512), outshape: (14, 14, 512) processed block5_pool: inshape: (14, 14, 512), outshape: (7, 7, 512) processed flatten: inshape: (7, 7, 512), outshape: (25088,) processed fc1: inshape: (25088,), outshape: (4096,) processed fc2: inshape: (4096,), outshape: (4096,) processed predictions: inshape: (4096,), outshape: (1000,) predictions: {&apos;African elephant, Loxodonta africana&apos;: 0.9346673, &apos;tusker&apos;: 0.06419284, &apos;Indian elephant, Elephas maximus&apos;: 0.0011378153, &apos;warthog&apos;: 6.193812e-07, &apos;water buffalo, water ox, Asiatic buffalo, Bubalus bubalis&apos;: 3.1221467e-07} computed in 0:00:04.784639 To optimize further, we could also write the max pooling algorithm using CUDA, but the speed up won’t be as pronounced as the conv2d case. Hopefully this simple implementation helps to demystify how deep learning/ML libraries get such incredible speedups when using GPUs (or TPUs on Google Colab)! Note: execution was done on a Intel i9-9900X processor with a NVIDIA RTX 2060 GPU. code is available on github.</summary>
      

      
      
    </entry>
  
  
  
    <entry>
      
      <title type="html">understanding the k struct</title>
      
      
      <link href="https://arpankapoor.com/k" rel="alternate" type="text/html" title="understanding the k struct" />
      
      <published>2020-05-25T00:00:00+00:00</published>
      <updated>2020-05-25T00:00:00+00:00</updated>
      <id>https://arpankapoor.com/k</id>
      <content type="html" xml:base="https://arpankapoor.com/k">&lt;!--
&lt;style&gt;
table{border-collapse:collapse;border-spacing:0;}
table,th,td{border:1px solid #e8e8e8;}
&lt;/style&gt;
--&gt;

&lt;style&gt;
table,th,td{border:1px solid #e8e8e8;}
&lt;/style&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;typedef&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;S&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;C&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;typedef&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;unsigned&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;G&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;typedef&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;short&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;H&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;typedef&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;I&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;typedef&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;long&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;long&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;J&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;typedef&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;E&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;typedef&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;double&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;typedef&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;V&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;typedef&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;k0&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;kt&quot;&gt;signed&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;m&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;u&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;union&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;G&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;g&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;H&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;h&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;I&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;J&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;E&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;F&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;S&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;k0&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;k&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;J&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;G&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;G0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;];&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;K&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The above code snippet in my opinion provides a glimpse into the simplicity in design of the APL-like languages,
from &lt;a href=&quot;https://en.wikipedia.org/wiki/Kenneth_E._Iverson&quot;&gt;Ken Iverson&lt;/a&gt;’s APL in the 1960s to
&lt;a href=&quot;https://en.wikipedia.org/wiki/Arthur_Whitney_(computer_scientist)&quot;&gt;Arthur Whitney&lt;/a&gt;’s
&lt;a href=&quot;http://www.aplusdev.org/&quot;&gt;a+&lt;/a&gt;/&lt;a href=&quot;https://en.wikipedia.org/wiki/K_(programming_language)&quot;&gt;k&lt;/a&gt;/q/&lt;a href=&quot;https://en.wikipedia.org/wiki/Kdb%2B&quot;&gt;kdb+&lt;/a&gt;/&lt;a href=&quot;https://web.archive.org/web/20191202012157/https://shakti.com/database-software-history/&quot;&gt;shakti&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;k0&lt;/code&gt; as defined above (from &lt;a href=&quot;https://github.com/KxSystems/kdb/blob/master/c/c/k.h&quot;&gt;k.h&lt;/a&gt;) is the struct which underpins
all data types exposed by
&lt;a href=&quot;https://en.wikipedia.org/wiki/K_(programming_language)&quot;&gt;k&lt;/a&gt;/&lt;a href=&quot;https://en.wikipedia.org/wiki/Q_(programming_language_from_Kx_Systems)&quot;&gt;q&lt;/a&gt;.
It is a &lt;a href=&quot;https://en.wikipedia.org/wiki/Tagged_union&quot;&gt;tagged union&lt;/a&gt; - a struct that can hold different data types
and has a tag (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t&lt;/code&gt; in this case) to indicate the &lt;em&gt;type&lt;/em&gt; a particular instance holds.&lt;/p&gt;

&lt;p&gt;The elements &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;e&lt;/code&gt;-&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;j&lt;/code&gt; inside the union are used to represent the basic/atomic types:&lt;/p&gt;

&lt;table class=&quot;bt&quot;&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th style=&quot;text-align:center&quot;&gt;e&lt;/th&gt;
      &lt;th style=&quot;text-align:center&quot;&gt;sz&lt;/th&gt;
      &lt;th style=&quot;text-align:center&quot;&gt;n&lt;/th&gt;
      &lt;th style=&quot;text-align:center&quot;&gt;c&lt;/th&gt;
      &lt;th&gt;k type(s)&lt;/th&gt;
      &lt;th&gt;e.g.&lt;/th&gt;
      &lt;th&gt;value&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td rowspan=&quot;4&quot; style=&quot;text-align:center&quot;&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;g&lt;/code&gt;&lt;/td&gt;
      &lt;td rowspan=&quot;4&quot; style=&quot;text-align:center&quot;&gt;1&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align:center&quot;&gt;-1&lt;/td&gt;
      &lt;td style=&quot;text-align:center&quot;&gt;b&lt;/td&gt;
      &lt;td&gt;bool&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;1b&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align:center&quot;&gt;-4&lt;/td&gt;
      &lt;td style=&quot;text-align:center&quot;&gt;x&lt;/td&gt;
      &lt;td&gt;byte&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;0x2a&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align:center&quot;&gt;-10&lt;/td&gt;
      &lt;td style=&quot;text-align:center&quot;&gt;c&lt;/td&gt;
      &lt;td&gt;char&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;&quot;a&quot;&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align:center&quot;&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;h&lt;/code&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align:center&quot;&gt;2&lt;/td&gt;
      &lt;td style=&quot;text-align:center&quot;&gt;-5&lt;/td&gt;
      &lt;td style=&quot;text-align:center&quot;&gt;h&lt;/td&gt;
      &lt;td&gt;short&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;98h&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td rowspan=&quot;7&quot; style=&quot;text-align:center&quot;&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;i&lt;/code&gt;&lt;/td&gt;
      &lt;td rowspan=&quot;7&quot; style=&quot;text-align:center&quot;&gt;4&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align:center&quot;&gt;-6&lt;/td&gt;
      &lt;td style=&quot;text-align:center&quot;&gt;i&lt;/td&gt;
      &lt;td&gt;int&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;42i&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align:center&quot;&gt;-13&lt;/td&gt;
      &lt;td style=&quot;text-align:center&quot;&gt;m&lt;/td&gt;
      &lt;td&gt;month&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;2020.05m&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;months from &lt;code class=&quot;highlighter-rouge&quot;&gt;2020.01m&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align:center&quot;&gt;-14&lt;/td&gt;
      &lt;td style=&quot;text-align:center&quot;&gt;d&lt;/td&gt;
      &lt;td&gt;date&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;2020.05.23&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;days from &lt;code class=&quot;highlighter-rouge&quot;&gt;2020.01.01&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align:center&quot;&gt;-17&lt;/td&gt;
      &lt;td style=&quot;text-align:center&quot;&gt;u&lt;/td&gt;
      &lt;td&gt;minute&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;12:00&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;mins from &lt;code class=&quot;highlighter-rouge&quot;&gt;00:00&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align:center&quot;&gt;-18&lt;/td&gt;
      &lt;td style=&quot;text-align:center&quot;&gt;v&lt;/td&gt;
      &lt;td&gt;second&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;12:00:00&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;secs from &lt;code class=&quot;highlighter-rouge&quot;&gt;00:00:00&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align:center&quot;&gt;-19&lt;/td&gt;
      &lt;td style=&quot;text-align:center&quot;&gt;t&lt;/td&gt;
      &lt;td&gt;time&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;12:00:00.500&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;msecs from &lt;code class=&quot;highlighter-rouge&quot;&gt;00:00:00.000&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td rowspan=&quot;4&quot; style=&quot;text-align:center&quot;&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;j&lt;/code&gt;&lt;/td&gt;
      &lt;td rowspan=&quot;4&quot; style=&quot;text-align:center&quot;&gt;8&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align:center&quot;&gt;-7&lt;/td&gt;
      &lt;td style=&quot;text-align:center&quot;&gt;j&lt;/td&gt;
      &lt;td&gt;long&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;42j&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align:center&quot;&gt;-16&lt;/td&gt;
      &lt;td style=&quot;text-align:center&quot;&gt;n&lt;/td&gt;
      &lt;td&gt;timespan&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;0D18:58:00.729433000&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;nsecs from &lt;code class=&quot;highlighter-rouge&quot;&gt;00:00:00.000000000&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align:center&quot;&gt;-12&lt;/td&gt;
      &lt;td style=&quot;text-align:center&quot;&gt;p&lt;/td&gt;
      &lt;td&gt;timestamp&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;2020.05.23D18:59:29.409153000&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;nsecs from &lt;code class=&quot;highlighter-rouge&quot;&gt;2000.01.01&lt;/code&gt; midnight&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align:center&quot;&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;e&lt;/code&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align:center&quot;&gt;4&lt;/td&gt;
      &lt;td style=&quot;text-align:center&quot;&gt;-8&lt;/td&gt;
      &lt;td style=&quot;text-align:center&quot;&gt;e&lt;/td&gt;
      &lt;td&gt;real&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;42.42e&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td rowspan=&quot;3&quot; style=&quot;text-align:center&quot;&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;f&lt;/code&gt;&lt;/td&gt;
      &lt;td rowspan=&quot;3&quot; style=&quot;text-align:center&quot;&gt;8&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align:center&quot;&gt;-9&lt;/td&gt;
      &lt;td style=&quot;text-align:center&quot;&gt;f&lt;/td&gt;
      &lt;td&gt;float&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;42.42f&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align:center&quot;&gt;-15&lt;/td&gt;
      &lt;td style=&quot;text-align:center&quot;&gt;z&lt;/td&gt;
      &lt;td&gt;datetime&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;2020.05.23T18:59:29.409&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;fractional days from &lt;code class=&quot;highlighter-rouge&quot;&gt;2020.01.01&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td rowspan=&quot;3&quot; style=&quot;text-align:center&quot;&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;s&lt;/code&gt;&lt;/td&gt;
      &lt;td rowspan=&quot;3&quot; style=&quot;text-align:center&quot;&gt;.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align:center&quot;&gt;-11&lt;/td&gt;
      &lt;td style=&quot;text-align:center&quot;&gt;s&lt;/td&gt;
      &lt;td&gt;symbol&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;`hi&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;pointer to an &lt;a href=&quot;https://en.wikipedia.org/wiki/String_interning&quot;&gt;interned string&lt;/a&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align:center&quot;&gt;-128&lt;/td&gt;
      &lt;td style=&quot;text-align:center&quot;&gt;&lt;/td&gt;
      &lt;td&gt;error&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;&apos;&quot;error&quot;&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;small&gt;
&lt;em&gt;e&lt;/em&gt;: element from anonymous union inside &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;k0&lt;/code&gt; struct&lt;br /&gt;
&lt;em&gt;sz&lt;/em&gt;: size of element in bytes&lt;br /&gt;
&lt;em&gt;n&lt;/em&gt;: type (value of &lt;code class=&quot;highlight language-c&quot; data-lang=&quot;c&quot;&gt;&lt;span class=&quot;n&quot;&gt;K&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;/code&gt;)&lt;br /&gt;
&lt;em&gt;c&lt;/em&gt;: char code for type
&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;Creating an atomic K object of type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t&lt;/code&gt; is straightforward:&lt;/p&gt;
&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;K&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;createAtom&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;I&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;){&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;K&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;malloc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;sizeof&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;// type&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;// ref count&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;}&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;K&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;createInt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;I&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;){&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;K&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;createAtom&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;}&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;K&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;createLong&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;J&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;){&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;K&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;createAtom&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;7&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;}&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;// ...similarly for other types&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;As you can see above, members of the unnamed union inside &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;k0&lt;/code&gt; can be accessed as if they were members of the outer
struct. Moreover, this works recursively, so even members of the last struct of the union can be accessed directly.
For instance, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;n&lt;/code&gt; can be accessed directly as &lt;code class=&quot;highlight language-c&quot; data-lang=&quot;c&quot;&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;/code&gt;, where &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; is of type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;K&lt;/code&gt; - a pointer to &lt;code class=&quot;highlight language-c&quot; data-lang=&quot;c&quot;&gt;&lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;k0&lt;/span&gt;&lt;/code&gt;.
These are called &lt;em&gt;anonymous structs/unions&lt;/em&gt; and were 
&lt;a href=&quot;http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf#page=133&quot;&gt;standardized in C11&lt;/a&gt; although
&lt;a href=&quot;https://gcc.gnu.org/onlinedocs/gcc/Unnamed-Fields.html&quot;&gt;gcc&lt;/a&gt; supported them as a language extension before that.&lt;/p&gt;

&lt;p&gt;To store &lt;em&gt;simple lists&lt;/em&gt; (i.e. lists of atoms of homogeneous type), the last member of the union is used - the struct
with members &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;n&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;G0&lt;/code&gt;. While allocating memory for a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;K&lt;/code&gt; object, sizeof(list) bytes are added in for the
dynamically sized array. This is known as the &lt;a href=&quot;http://c-faq.com/struct/structhack.html&quot;&gt;struct hack&lt;/a&gt;&lt;sup id=&quot;fnref:1&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot; role=&quot;doc-noteref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.
Code to create a simple list of size &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;n&lt;/code&gt; and type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t&lt;/code&gt; would look something like this:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;size_t&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;listMallocSize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;I&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;J&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;){&lt;/span&gt;
  &lt;span class=&quot;kt&quot;&gt;size_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;sizeof&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;K&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;switch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;){&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;// type of a simple list of atoms of type t is negative t&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;case&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;7&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;sizeof&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;J&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;break&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;// simple list of atoms of int (t=-7) having t=7&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;case&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;sizeof&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;I&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;break&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
      &lt;span class=&quot;c1&quot;&gt;// ... and so on for other types&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;// -1 to account for G0[1]&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;K&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;createSimpleList&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;I&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;J&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;){&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;K&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;malloc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;listMallocSize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;// vector size n&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;}&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// K intVector = createSimpleList(6,100);&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// access yth element of intVector as ((I*)(intVector-&amp;gt;G0))[y] - k.h has macros for this&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The alternative instead of this &lt;em&gt;hack&lt;/em&gt; would be to make &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;G0&lt;/code&gt; a pointer, which would have some drawbacks:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;it would require 2 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;malloc&lt;/code&gt;/&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;free&lt;/code&gt; calls for allocation/de-allocation of the struct and list.&lt;/li&gt;
  &lt;li&gt;accessing the list would require a dereference each time.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;After simple lists, we have the &lt;em&gt;general list&lt;/em&gt; - a list of any heterogeneous types.
In this case, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;G0&lt;/code&gt; would be a list of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;K&lt;/code&gt; objects, i.e. pointers to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;k0&lt;/code&gt; structs each of which could be of a different
type. As we’d expect, operations on this type of list would be slower due to the multiple pointer dereferences and
non-contiguous layout of the data. This would also be the rationale behind why k/q tries to convert any general
list to a simple list automatically - they are much faster and able to utilize
&lt;a href=&quot;https://en.wikipedia.org/wiki/SSE2&quot;&gt;SSE&lt;/a&gt;/&lt;a href=&quot;https://en.wikipedia.org/wiki/Advanced_Vector_Extensions&quot;&gt;AVX&lt;/a&gt; instructions.&lt;/p&gt;

&lt;p&gt;Further along, there are &lt;em&gt;dictionaries&lt;/em&gt;, which are simply a pair of conforming lists, i.e. a pair of lists of equal size.
The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;G0&lt;/code&gt; array, similar to a general list would now be a list of 2 &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;K&lt;/code&gt; objects each of which is a general/simple list -
the &lt;em&gt;key&lt;/em&gt; list (access using &lt;code class=&quot;highlight language-c&quot; data-lang=&quot;c&quot;&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;K&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;G0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;/code&gt;) and the &lt;em&gt;value&lt;/em&gt; list (&lt;code class=&quot;highlight language-c&quot; data-lang=&quot;c&quot;&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;K&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;G0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;This ultimately leads us to the &lt;em&gt;table&lt;/em&gt; - the most important data structure in k/q.
A table is a &lt;em&gt;flipped column dictionary&lt;/em&gt;. A column dictionary is a dictionary with atomic symbol values as
keys and simple/general lists of equal sizes as the values. The flipped part means that this dictionary is
transposed (figuratively).&lt;/p&gt;

&lt;div class=&quot;language-q highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;cm&quot;&gt;/ a column dictionary with list of symbols as the key list and&lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt;/ list of conforming list of longs as the value list&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;q)&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;show&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;ss&quot;&gt;`a`b`c&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;q)&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;show&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;flip&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;    / flipped column dictionary - a table&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;-----&lt;/span&gt;
&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;
&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This fact is also reflected in it the table’s internal representation.
The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;K&lt;/code&gt; object for a table is a pointer to its column dictionary (using the self-referencing pointer &lt;code class=&quot;highlight language-c&quot; data-lang=&quot;c&quot;&gt;&lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;k0&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;k&lt;/span&gt;&lt;/code&gt;).
This also shows us why taking the transpose (&lt;code class=&quot;highlight language-q&quot; data-lang=&quot;q&quot;&gt;&lt;span class=&quot;nb&quot;&gt;flip&lt;/span&gt;&lt;/code&gt;) of a list of conforming lists or a column dictionary is very cheap.&lt;/p&gt;

&lt;p&gt;In a similar vein, some of the other data structures are also constructed using the same &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;k0&lt;/code&gt; struct:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;keyed tables - dictionary with key and value as simple tables&lt;/li&gt;
  &lt;li&gt;splayed table - dictionary with key as list of columns and value as its file path symbol&lt;/li&gt;
  &lt;li&gt;partitioned table - dictionary with key as list of columns and value as a table name symbol&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All these diverse data structures supported by a single clever struct - pretty cool IMO.&lt;/p&gt;

&lt;hr /&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot;&gt;
      &lt;p&gt;the size of the last array member of a struct can be kept unspecified in C99, which is known as &lt;em&gt;flexible array members&lt;/em&gt; &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;</content>

      
      
      
      
      

      
        <author>
            <name>Arpan Kapoor</name>
          
          
        </author>
      

      

      

      
        <summary type="html">typedef char *S, C; typedef unsigned char G; typedef short H; typedef int I; typedef long long J; typedef float E; typedef double F; typedef void V; typedef struct k0 { signed char m, a, t; char u; int r; union { G g; H h; I i; J j; E e; F f; S s; struct k0 *k; struct { J n; G G0[1]; }; }; } * K; The above code snippet in my opinion provides a glimpse into the simplicity in design of the APL-like languages, from Ken Iverson’s APL in the 1960s to Arthur Whitney’s a+/k/q/kdb+/shakti. k0 as defined above (from k.h) is the struct which underpins all data types exposed by k/q. It is a tagged union - a struct that can hold different data types and has a tag (t in this case) to indicate the type a particular instance holds. The elements e-j inside the union are used to represent the basic/atomic types: e sz n c k type(s) e.g. value g 1 -1 b bool 1b -4 x byte 0x2a -10 c char &quot;a&quot; h 2 -5 h short 98h i 4 -6 i int 42i -13 m month 2020.05m months from 2020.01m -14 d date 2020.05.23 days from 2020.01.01 -17 u minute 12:00 mins from 00:00 -18 v second 12:00:00 secs from 00:00:00 -19 t time 12:00:00.500 msecs from 00:00:00.000 j 8 -7 j long 42j -16 n timespan 0D18:58:00.729433000 nsecs from 00:00:00.000000000 -12 p timestamp 2020.05.23D18:59:29.409153000 nsecs from 2000.01.01 midnight e 4 -8 e real 42.42e f 8 -9 f float 42.42f -15 z datetime 2020.05.23T18:59:29.409 fractional days from 2020.01.01 s . -11 s symbol `hi pointer to an interned string -128 error &apos;&quot;error&quot; e: element from anonymous union inside k0 struct sz: size of element in bytes n: type (value of K-&amp;gt;t) c: char code for type Creating an atomic K object of type t is straightforward: K createAtom(I t){ K x=malloc(sizeof*x); x-&amp;gt;t=t; // type x-&amp;gt;r=0; // ref count return x;} K createInt(I x){K y=createAtom(-6);y-&amp;gt;i=x;return y;} K createLong(J x){K y=createAtom(-7);y-&amp;gt;j=x;return y;} // ...similarly for other types As you can see above, members of the unnamed union inside k0 can be accessed as if they were members of the outer struct. Moreover, this works recursively, so even members of the last struct of the union can be accessed directly. For instance, n can be accessed directly as x-&amp;gt;n, where x is of type K - a pointer to struct k0. These are called anonymous structs/unions and were standardized in C11 although gcc supported them as a language extension before that. To store simple lists (i.e. lists of atoms of homogeneous type), the last member of the union is used - the struct with members n and G0. While allocating memory for a K object, sizeof(list) bytes are added in for the dynamically sized array. This is known as the struct hack1. Code to create a simple list of size n and type t would look something like this: size_t listMallocSize(I t,J n){ size_t x=sizeof*(K)0; if(n&amp;gt;0) switch(t){ // type of a simple list of atoms of type t is negative t case 7:x+=n*sizeof(J);break; // simple list of atoms of int (t=-7) having t=7 case 6:x+=n*sizeof(I);break; // ... and so on for other types } return x-1; // -1 to account for G0[1] } K createSimpleList(I t,J n){ K x=malloc(listMallocSize(t,n)); x-&amp;gt;t=t; x-&amp;gt;r=0; x-&amp;gt;n=n; // vector size n return x;} // K intVector = createSimpleList(6,100); // access yth element of intVector as ((I*)(intVector-&amp;gt;G0))[y] - k.h has macros for this The alternative instead of this hack would be to make G0 a pointer, which would have some drawbacks: it would require 2 malloc/free calls for allocation/de-allocation of the struct and list. accessing the list would require a dereference each time. After simple lists, we have the general list - a list of any heterogeneous types. In this case, G0 would be a list of K objects, i.e. pointers to k0 structs each of which could be of a different type. As we’d expect, operations on this type of list would be slower due to the multiple pointer dereferences and non-contiguous layout of the data. This would also be the rationale behind why k/q tries to convert any general list to a simple list automatically - they are much faster and able to utilize SSE/AVX instructions. Further along, there are dictionaries, which are simply a pair of conforming lists, i.e. a pair of lists of equal size. The G0 array, similar to a general list would now be a list of 2 K objects each of which is a general/simple list - the key list (access using ((K*)(x-&amp;gt;G0))[0]) and the value list (((K*)(x-&amp;gt;G0))[1]). This ultimately leads us to the table - the most important data structure in k/q. A table is a flipped column dictionary. A column dictionary is a dictionary with atomic symbol values as keys and simple/general lists of equal sizes as the values. The flipped part means that this dictionary is transposed (figuratively). / a column dictionary with list of symbols as the key list and / list of conforming list of longs as the value list q)show d:`a`b`c!(1 2;3 4;5 6) a| 1 2 b| 3 4 c| 5 6 q)show t:flip d / flipped column dictionary - a table a b c ----- 1 3 5 2 4 6 This fact is also reflected in it the table’s internal representation. The K object for a table is a pointer to its column dictionary (using the self-referencing pointer struct k0 *k). This also shows us why taking the transpose (flip) of a list of conforming lists or a column dictionary is very cheap. In a similar vein, some of the other data structures are also constructed using the same k0 struct: keyed tables - dictionary with key and value as simple tables splayed table - dictionary with key as list of columns and value as its file path symbol partitioned table - dictionary with key as list of columns and value as a table name symbol All these diverse data structures supported by a single clever struct - pretty cool IMO. the size of the last array member of a struct can be kept unspecified in C99, which is known as flexible array members &amp;#8617;</summary>
      

      
      
    </entry>
  
  
  
    <entry>
      
      <title type="html">ipcalc in q</title>
      
      
      <link href="https://arpankapoor.com/ipcalc" rel="alternate" type="text/html" title="ipcalc in q" />
      
      <published>2018-09-07T00:00:00+00:00</published>
      <updated>2018-09-07T00:00:00+00:00</updated>
      <id>https://arpankapoor.com/ipcalc</id>
      <content type="html" xml:base="https://arpankapoor.com/ipcalc">&lt;p&gt;I recently learnt about &lt;a href=&quot;http://jodies.de/ipcalc&quot;&gt;ipcalc&lt;/a&gt;, a nifty tool that helps visualize basic IPv4 concepts.
I think writing a very basic clone can help ingrain these concepts in one’s brain.&lt;/p&gt;

&lt;p&gt;Here’s my attempt in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;q&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&quot;language-q highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;pow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*/&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;#&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;rp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;#&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot; &quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt; / right pad&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;is&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;](&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;ow&quot;&gt;sublist&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot; &quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt; / insert space&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;bv2i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;ow&quot;&gt;sv&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mb&quot;&gt;0b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;]&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt; / boolean vector to integer&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;i2bv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;ow&quot;&gt;vs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mb&quot;&gt;0b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;]&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt; / integer to boolean vector&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;ip2dn&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;.&quot;&lt;/span&gt;&lt;span class=&quot;ow&quot;&gt;sv&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;h&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;mh&quot;&gt;0x0&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;ow&quot;&gt;vs&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt; / ip to dot notation&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;ip2class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;128&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;first&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mh&quot;&gt;0x0&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;ow&quot;&gt;vs&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;A&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;192&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;B&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;224&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;C&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;240&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;D&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;E&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]}&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;ip2bdn&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ip&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;is&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nm&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;floor&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nm&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;.&quot;&lt;/span&gt;&lt;span class=&quot;ow&quot;&gt;sv&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;raze&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;ow&quot;&gt;each&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i2bv&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;ow&quot;&gt;each&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mh&quot;&gt;0x0&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;ow&quot;&gt;vs&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ip&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]}&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt; / ip to binary dot notation&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;ipcLine&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;11&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;color&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;ss&quot;&gt;`blue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;21&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]]&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;z&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;color&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\033&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;[0;&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;ss&quot;&gt;`red`orange`blue`purple&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;!&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;31&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;33&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;34&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;35&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;m&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\033&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;[0m&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ip&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mh&quot;&gt;0x0&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;ow&quot;&gt;sv&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;x&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;H&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;.&quot;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;ow&quot;&gt;vs&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;first&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;/&quot;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;ow&quot;&gt;vs&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;no&quot;&gt;.z.x&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nm&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;H&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;last&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;/&quot;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;ow&quot;&gt;vs&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;no&quot;&gt;.z.x&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nmIp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bv2i&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nm&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;#&lt;/span&gt;&lt;span class=&quot;mb&quot;&gt;1b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;#&lt;/span&gt;&lt;span class=&quot;mb&quot;&gt;0b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;wc&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bv2i&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nm&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;#&lt;/span&gt;&lt;span class=&quot;mb&quot;&gt;0b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;#&lt;/span&gt;&lt;span class=&quot;mb&quot;&gt;1b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nIp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bv2i&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i2bv&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nmIp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;ow&quot;&gt;and&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i2bv&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ip&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bIp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bv2i&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i2bv&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nIp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;ow&quot;&gt;or&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i2bv&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;wc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hmin&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bv2i&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i2bv&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nIp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;ow&quot;&gt;or&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;31&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;#&lt;/span&gt;&lt;span class=&quot;mb&quot;&gt;0b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mb&quot;&gt;1b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hmax&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bv2i&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i2bv&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bIp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;ow&quot;&gt;and&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;31&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;#&lt;/span&gt;&lt;span class=&quot;mb&quot;&gt;1b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mb&quot;&gt;0b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;  &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ipclass&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ip2class&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ip&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;  &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ipcLine&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Address:&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ip2dn&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ip&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;color&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;ss&quot;&gt;`orange&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ip2bdn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ip&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]]];&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;  &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ipcLine&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Netmask:&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ip2dn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nmIp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot; = &quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;color&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;ss&quot;&gt;`red&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ip2bdn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nmIp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]]];&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;  &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ipcLine&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Wildcard:&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ip2dn&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;wc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;color&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;ss&quot;&gt;`orange&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ip2bdn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;wc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]]];&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;  &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;=&amp;gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;  &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ipcLine&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Network:&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ip2dn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nIp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;/&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;color&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;ss&quot;&gt;`purple&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cb&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;#&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nmBIp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;color&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;ss&quot;&gt;`orange&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cb&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;ABCDE&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;!&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ipclass&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nmBIp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ip2bdn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nIp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]]];&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;  &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ipcLine&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;HostMin:&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ip2dn&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hmin&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;color&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;ss&quot;&gt;`orange&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ip2bdn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hmin&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]]];&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;  &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ipcLine&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;HostMax:&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ip2dn&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hmax&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;color&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;ss&quot;&gt;`orange&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ip2bdn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hmax&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]]];&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;  &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ipcLine&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Broadcast:&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ip2dn&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bIp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;color&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;ss&quot;&gt;`orange&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ip2bdn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bIp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]]];&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;  &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ipcLine&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Hosts/Net:&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pow&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;color&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;ss&quot;&gt;`purple&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot; Class &quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ipclass&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]];&lt;/span&gt;
&lt;span class=&quot;w&quot;&gt;  &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;}&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;\\
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;some sample outputs:&lt;/p&gt;
&lt;pre&gt;$ q ipcalc.q 12.7.149.12/26 -q
Address:   &lt;font color=&quot;#3465A4&quot;&gt;12.7.149.12          &lt;/font&gt;&lt;font color=&quot;#C4A000&quot;&gt;00001100.00000111.10010101.00 001100&lt;/font&gt;
Netmask:   &lt;font color=&quot;#3465A4&quot;&gt;255.255.255.192 = 26 &lt;/font&gt;&lt;font color=&quot;#CC0000&quot;&gt;11111111.11111111.11111111.11 000000&lt;/font&gt;
Wildcard:  &lt;font color=&quot;#3465A4&quot;&gt;0.0.0.63             &lt;/font&gt;&lt;font color=&quot;#C4A000&quot;&gt;00000000.00000000.00000000.00 111111&lt;/font&gt;
=&amp;gt;
Network:   &lt;font color=&quot;#3465A4&quot;&gt;12.7.149.0/26        &lt;/font&gt;&lt;font color=&quot;#75507B&quot;&gt;0&lt;/font&gt;&lt;font color=&quot;#C4A000&quot;&gt;0001100.00000111.10010101.00 000000&lt;/font&gt;
HostMin:   &lt;font color=&quot;#3465A4&quot;&gt;12.7.149.1           &lt;/font&gt;&lt;font color=&quot;#C4A000&quot;&gt;00001100.00000111.10010101.00 000001&lt;/font&gt;
HostMax:   &lt;font color=&quot;#3465A4&quot;&gt;12.7.149.62          &lt;/font&gt;&lt;font color=&quot;#C4A000&quot;&gt;00001100.00000111.10010101.00 111110&lt;/font&gt;
Broadcast: &lt;font color=&quot;#3465A4&quot;&gt;12.7.149.63          &lt;/font&gt;&lt;font color=&quot;#C4A000&quot;&gt;00001100.00000111.10010101.00 111111&lt;/font&gt;
Hosts/Net: &lt;font color=&quot;#3465A4&quot;&gt;62                   &lt;/font&gt;&lt;font color=&quot;#75507B&quot;&gt; Class A&lt;/font&gt;

$ q ipcalc.q 216.163.54.22/28 -q
Address:   &lt;font color=&quot;#3465A4&quot;&gt;216.163.54.22        &lt;/font&gt;&lt;font color=&quot;#C4A000&quot;&gt;11011000.10100011.00110110.0001 0110&lt;/font&gt;
Netmask:   &lt;font color=&quot;#3465A4&quot;&gt;255.255.255.240 = 28 &lt;/font&gt;&lt;font color=&quot;#CC0000&quot;&gt;11111111.11111111.11111111.1111 0000&lt;/font&gt;
Wildcard:  &lt;font color=&quot;#3465A4&quot;&gt;0.0.0.15             &lt;/font&gt;&lt;font color=&quot;#C4A000&quot;&gt;00000000.00000000.00000000.0000 1111&lt;/font&gt;
=&amp;gt;
Network:   &lt;font color=&quot;#3465A4&quot;&gt;216.163.54.16/28     &lt;/font&gt;&lt;font color=&quot;#75507B&quot;&gt;110&lt;/font&gt;&lt;font color=&quot;#C4A000&quot;&gt;11000.10100011.00110110.0001 0000&lt;/font&gt;
HostMin:   &lt;font color=&quot;#3465A4&quot;&gt;216.163.54.17        &lt;/font&gt;&lt;font color=&quot;#C4A000&quot;&gt;11011000.10100011.00110110.0001 0001&lt;/font&gt;
HostMax:   &lt;font color=&quot;#3465A4&quot;&gt;216.163.54.30        &lt;/font&gt;&lt;font color=&quot;#C4A000&quot;&gt;11011000.10100011.00110110.0001 1110&lt;/font&gt;
Broadcast: &lt;font color=&quot;#3465A4&quot;&gt;216.163.54.31        &lt;/font&gt;&lt;font color=&quot;#C4A000&quot;&gt;11011000.10100011.00110110.0001 1111&lt;/font&gt;
Hosts/Net: &lt;font color=&quot;#3465A4&quot;&gt;14                   &lt;/font&gt;&lt;font color=&quot;#75507B&quot;&gt; Class C&lt;/font&gt;

&lt;/pre&gt;

&lt;p&gt;go write your own now…&lt;/p&gt;

&lt;h1 id=&quot;references&quot;&gt;references&lt;/h1&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;http://linux-ip.net/html/routing-intro.html#ex-routing-intro-ipcalc&quot;&gt;http://linux-ip.net/html/routing-intro.html#ex-routing-intro-ipcalc&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content>

      
      
      
      
      

      
        <author>
            <name>Arpan Kapoor</name>
          
          
        </author>
      

      

      

      
        <summary type="html">I recently learnt about ipcalc, a nifty tool that helps visualize basic IPv4 concepts. I think writing a very basic clone can help ingrain these concepts in one’s brain. Here’s my attempt in q: pow:{(*/)y#x} rp:{x,(y-count x)#&quot; &quot;} / right pad is:{[n;s](n sublist s),&quot; &quot;,n _ s} / insert space bv2i:sv[0b;] / boolean vector to integer i2bv:vs[0b;] / integer to boolean vector ip2dn:{&quot;.&quot;sv string&quot;h&quot;$0x0 vs x} / ip to dot notation ip2class:{$[128&amp;gt;c:first 0x0 vs x;&quot;A&quot;;192&amp;gt;c;&quot;B&quot;;224&amp;gt;c;&quot;C&quot;;240&amp;gt;c;&quot;D&quot;;&quot;E&quot;]} ip2bdn:{[nm;ip]is[nm+floor nm%8;&quot;.&quot;sv raze each string i2bv each 0x0 vs ip]} / ip to binary dot notation ipcLine:{rp[x;11],color[`blue;rp[y;21]],z} color:{[c;s]&quot;\033[0;&quot;,((`red`orange`blue`purple!string 31 33 34 35)c),&quot;m&quot;,s,&quot;\033[0m&quot;} main:{ ip:0x0 sv &quot;x&quot;$&quot;H&quot;$&quot;.&quot; vs first &quot;/&quot; vs .z.x 0; nm:&quot;H&quot;$last &quot;/&quot; vs .z.x 0; nmIp:bv2i (nm#1b),(32-nm)#0b; wc:bv2i (nm#0b),(32-nm)#1b; nIp:bv2i i2bv[nmIp]and i2bv ip; bIp:bv2i i2bv[nIp]or i2bv wc; hmin:bv2i i2bv[nIp]or (31#0b),1b; hmax:bv2i i2bv[bIp]and (31#1b),0b; ipclass:ip2class ip; -1 ipcLine[&quot;Address:&quot;;ip2dn ip;color[`orange;ip2bdn[nm;ip]]]; -1 ipcLine[&quot;Netmask:&quot;;ip2dn[nmIp],&quot; = &quot;,string nm;color[`red;ip2bdn[nm;nmIp]]]; -1 ipcLine[&quot;Wildcard:&quot;;ip2dn wc;color[`orange;ip2bdn[nm;wc]]]; -1 &quot;=&amp;gt;&quot;; -1 ipcLine[&quot;Network:&quot;;ip2dn[nIp],&quot;/&quot;,string nm;color[`purple;cb#nmBIp],color[`orange;(cb:(&quot;ABCDE&quot;!1 2 3 4 4)ipclass)_ nmBIp:ip2bdn[nm;nIp]]]; -1 ipcLine[&quot;HostMin:&quot;;ip2dn hmin;color[`orange;ip2bdn[nm;hmin]]]; -1 ipcLine[&quot;HostMax:&quot;;ip2dn hmax;color[`orange;ip2bdn[nm;hmax]]]; -1 ipcLine[&quot;Broadcast:&quot;;ip2dn bIp;color[`orange;ip2bdn[nm;bIp]]]; -1 ipcLine[&quot;Hosts/Net:&quot;;string pow[2;32-nm]-2;color[`purple;&quot; Class &quot;,string ipclass]]; -1 &quot;&quot;;} main[] \\ some sample outputs: $ q ipcalc.q 12.7.149.12/26 -q Address: 12.7.149.12 00001100.00000111.10010101.00 001100 Netmask: 255.255.255.192 = 26 11111111.11111111.11111111.11 000000 Wildcard: 0.0.0.63 00000000.00000000.00000000.00 111111 =&amp;gt; Network: 12.7.149.0/26 00001100.00000111.10010101.00 000000 HostMin: 12.7.149.1 00001100.00000111.10010101.00 000001 HostMax: 12.7.149.62 00001100.00000111.10010101.00 111110 Broadcast: 12.7.149.63 00001100.00000111.10010101.00 111111 Hosts/Net: 62 Class A $ q ipcalc.q 216.163.54.22/28 -q Address: 216.163.54.22 11011000.10100011.00110110.0001 0110 Netmask: 255.255.255.240 = 28 11111111.11111111.11111111.1111 0000 Wildcard: 0.0.0.15 00000000.00000000.00000000.0000 1111 =&amp;gt; Network: 216.163.54.16/28 11011000.10100011.00110110.0001 0000 HostMin: 216.163.54.17 11011000.10100011.00110110.0001 0001 HostMax: 216.163.54.30 11011000.10100011.00110110.0001 1110 Broadcast: 216.163.54.31 11011000.10100011.00110110.0001 1111 Hosts/Net: 14 Class C go write your own now… references http://linux-ip.net/html/routing-intro.html#ex-routing-intro-ipcalc</summary>
      

      
      
    </entry>
  
  
  
    <entry>
      
      <title type="html">setting up your personal domain and email</title>
      
      
      <link href="https://arpankapoor.com/setup" rel="alternate" type="text/html" title="setting up your personal domain and email" />
      
      <published>2018-07-14T00:00:00+00:00</published>
      <updated>2018-07-14T00:00:00+00:00</updated>
      <id>https://arpankapoor.com/setup</id>
      <content type="html" xml:base="https://arpankapoor.com/setup">&lt;p&gt;Instructions to setup a custom domain and email addresses similar to this one.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;Setup a user page on &lt;a href=&quot;https://about.gitlab.com/features/pages/&quot;&gt;gitlab&lt;/a&gt;.
This is very similar to &lt;a href=&quot;https://pages.github.com/&quot;&gt;github pages&lt;/a&gt; with the
added benefit of being able to use &lt;em&gt;any&lt;/em&gt; &lt;a href=&quot;https://www.staticgen.com/&quot;&gt;static site generator&lt;/a&gt;
instead of being limited to &lt;a href=&quot;https://jekyllrb.com/&quot;&gt;jekyll&lt;/a&gt;.&lt;/p&gt;

    &lt;p&gt;This will create your static website at &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;https://&amp;lt;username&amp;gt;.gitlab.io&lt;/code&gt;.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Buy a domain (I used &lt;a href=&quot;https://www.namecheap.com&quot;&gt;namecheap&lt;/a&gt;). This is the only step that costs &lt;strong&gt;$$&lt;/strong&gt;.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Change your DNS servers to &lt;a href=&quot;https://support.cloudflare.com/hc/en-us/articles/205195708-Step-3-Change-your-domain-name-servers-to-Cloudflare&quot;&gt;cloudflare&lt;/a&gt;.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Create DNS records and setup &lt;a href=&quot;https://about.gitlab.com/2017/02/07/setting-up-gitlab-pages-with-cloudflare-certificates/&quot;&gt;TLS certificates&lt;/a&gt;.
In my case, I created &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A&lt;/code&gt; records for both &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;arpankapoor.com&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;www.arpankapoor.com&lt;/code&gt;.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Setup &lt;a href=&quot;https://support.cloudflare.com/hc/en-us/articles/360006660072-Understanding-and-Configuring-DNSSEC-in-Cloudflare-DNS&quot;&gt;DNSSEC&lt;/a&gt;.
I was intermittently facing DNS resolution issues in certain regions before setting this up.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;To redirect all &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;http&lt;/code&gt; requests to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;https&lt;/code&gt;, enable &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Always use HTTPS&lt;/code&gt; under the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Crypto&lt;/code&gt; tab on cloudflare.
Also consider enabling &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Automatic HTTPS Rewrites&lt;/code&gt;.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;If you’d like to redirect &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;www&lt;/code&gt; requests to the non-&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;www&lt;/code&gt; version, create a &lt;a href=&quot;https://support.cloudflare.com/hc/en-us/articles/200172286-How-do-I-perform-URL-forwarding-or-redirects-with-Cloudflare-&quot;&gt;cloudflare page rule&lt;/a&gt;.&lt;/p&gt;

    &lt;p&gt;I’ve setup &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;*www.arpankapoor.com/*&lt;/code&gt; to permanently redirect (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;HTTP 301&lt;/code&gt;) to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;https://arpankapoor.com/$2&lt;/code&gt;.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Setup your custom domain email addresses using &lt;a href=&quot;https://renzo.lucioni.xyz/mail-forwarding-with-mailgun/&quot;&gt;Mailgun&lt;/a&gt;.
It processes 10k messages per month for free, which is more than sufficient for my needs.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Thanks to gitlab, cloudflare and mailgun, you’ve now got your custom domain up and running with &lt;em&gt;free&lt;/em&gt; hosting, TLS certificates and email!&lt;/p&gt;</content>

      
      
      
      
      

      
        <author>
            <name>Arpan Kapoor</name>
          
          
        </author>
      

      

      

      
        <summary type="html">Instructions to setup a custom domain and email addresses similar to this one. Setup a user page on gitlab. This is very similar to github pages with the added benefit of being able to use any static site generator instead of being limited to jekyll. This will create your static website at https://&amp;lt;username&amp;gt;.gitlab.io. Buy a domain (I used namecheap). This is the only step that costs $$. Change your DNS servers to cloudflare. Create DNS records and setup TLS certificates. In my case, I created A records for both arpankapoor.com and www.arpankapoor.com. Setup DNSSEC. I was intermittently facing DNS resolution issues in certain regions before setting this up. To redirect all http requests to https, enable Always use HTTPS under the Crypto tab on cloudflare. Also consider enabling Automatic HTTPS Rewrites. If you’d like to redirect www requests to the non-www version, create a cloudflare page rule. I’ve setup *www.arpankapoor.com/* to permanently redirect (HTTP 301) to https://arpankapoor.com/$2. Setup your custom domain email addresses using Mailgun. It processes 10k messages per month for free, which is more than sufficient for my needs. Thanks to gitlab, cloudflare and mailgun, you’ve now got your custom domain up and running with free hosting, TLS certificates and email!</summary>
      

      
      
    </entry>
  
  
  
    <entry>
      
      <title type="html">grokking awk</title>
      
      
      <link href="https://arpankapoor.com/awk" rel="alternate" type="text/html" title="grokking awk" />
      
      <published>2018-06-16T00:00:00+00:00</published>
      <updated>2018-06-16T00:00:00+00:00</updated>
      <id>https://arpankapoor.com/awk</id>
      <content type="html" xml:base="https://arpankapoor.com/awk">&lt;p&gt;&lt;a href=&quot;https://news.ycombinator.com/item?id=17322412&quot;&gt;This HN post&lt;/a&gt; got me to brush up my awk beyond the basic &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;{print $2,$3}&lt;/code&gt; to extract fields from csv-like files.
Here are some of my notes from The AWK Programming Language by Alfred &lt;strong&gt;A&lt;/strong&gt;ho, Peter &lt;strong&gt;W&lt;/strong&gt;einberger and Brian &lt;strong&gt;K&lt;/strong&gt;ernighan.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;sequence of pattern-action statements &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pattern { action }&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;2 types of data: numbers and strings.&lt;/li&gt;
  &lt;li&gt;fields in current input line: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$1&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$2&lt;/code&gt;… &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$0&lt;/code&gt; is the entire line.&lt;/li&gt;
  &lt;li&gt;if there is no pattern, perform &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;action&lt;/code&gt; on every line&lt;/li&gt;
  &lt;li&gt;if there is no action, print lines that match the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pattern&lt;/code&gt;
    &lt;div class=&quot;language-console highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;gp&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;echo&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-e&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;alice 20\nbob 30\niyer 24&apos;&lt;/span&gt; | &lt;span class=&quot;nb&quot;&gt;awk&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;$2&amp;gt;20&apos;&lt;/span&gt;  &lt;span class=&quot;c&quot;&gt;# all people aged &amp;gt;20&lt;/span&gt;
&lt;span class=&quot;go&quot;&gt;bob 30
iyer 24
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NF&lt;/code&gt; - number of fields in current line
    &lt;div class=&quot;language-console highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;gp&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;echo&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;a b c d&apos;&lt;/span&gt; | &lt;span class=&quot;nb&quot;&gt;awk&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;{print NF, $1, $NF}&apos;&lt;/span&gt;  &lt;span class=&quot;c&quot;&gt;# print field count, the 1st and last field&lt;/span&gt;
&lt;span class=&quot;go&quot;&gt;4 a d
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NR&lt;/code&gt; - line number
    &lt;div class=&quot;language-console highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;gp&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;echo&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-e&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;alice 20\nbob 30&apos;&lt;/span&gt; | &lt;span class=&quot;nb&quot;&gt;awk&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;{print NR, $0}&apos;&lt;/span&gt;  &lt;span class=&quot;c&quot;&gt;# prefix with line number&lt;/span&gt;
&lt;span class=&quot;go&quot;&gt;1 alice 20
2 bob 30
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;simple arithmetic
    &lt;div class=&quot;language-console highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;gp&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;echo&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;alice 5.50 22&apos;&lt;/span&gt; | &lt;span class=&quot;nb&quot;&gt;awk&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;{print $1, $2 * $3}&apos;&lt;/span&gt;
&lt;span class=&quot;go&quot;&gt;alice 121
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;printf&lt;/code&gt; like in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C&lt;/code&gt;
    &lt;div class=&quot;language-console highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;gp&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;echo&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;alice 5.50 22&apos;&lt;/span&gt; | &lt;span class=&quot;nb&quot;&gt;awk&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;{printf(&quot;%s $%.2f\n&quot;, $1, $2 * $3)}&apos;&lt;/span&gt;
&lt;span class=&quot;gp&quot;&gt;alice $&lt;/span&gt;121.00
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;string/regex matching
    &lt;div class=&quot;language-console highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;gp&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;echo&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-e&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;susan 20\nbob 24\nsusie 12&apos;&lt;/span&gt; | &lt;span class=&quot;nb&quot;&gt;awk&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;$1==&quot;susie&quot;&apos;&lt;/span&gt;
&lt;span class=&quot;go&quot;&gt;susie 12
&lt;/span&gt;&lt;span class=&quot;gp&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;echo&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-e&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;susan 20\nbob 24\nsusie 12&apos;&lt;/span&gt; | &lt;span class=&quot;nb&quot;&gt;awk&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;$1 ~ /^su/&apos;&lt;/span&gt;  &lt;span class=&quot;c&quot;&gt;# all lines where the 1st field starts with &quot;su&quot;&lt;/span&gt;
&lt;span class=&quot;go&quot;&gt;susan 20
susie 12
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;||&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;amp;&amp;amp;&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;!&lt;/code&gt; as in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C&lt;/code&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BEGIN&lt;/code&gt;,&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;END&lt;/code&gt; - special patterns that matches before first line is read and after last line has been processed.
    &lt;div class=&quot;language-console highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;gp&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;echo&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-e&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;susan,20\nbob,24\nsusie,12&apos;&lt;/span&gt; | &lt;span class=&quot;nb&quot;&gt;awk&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;BEGIN {print &quot;name,age&quot;} {print}&apos;&lt;/span&gt;  &lt;span class=&quot;c&quot;&gt;# add header to csv file&lt;/span&gt;
&lt;span class=&quot;go&quot;&gt;name,age
susan,20
bob,24
susie,12
&lt;/span&gt;&lt;span class=&quot;gp&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;echo&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-e&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;susan 20\nbob 24\nsusie 12&apos;&lt;/span&gt; | &lt;span class=&quot;nb&quot;&gt;awk&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;{sum=sum+$2} END{print sum/NR}&apos;&lt;/span&gt; &lt;span class=&quot;c&quot;&gt;# average age&lt;/span&gt;
&lt;span class=&quot;go&quot;&gt;18.6667
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;if&lt;/code&gt;-&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;else&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;while&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;for&lt;/code&gt; loops similar to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C&lt;/code&gt;
    &lt;div class=&quot;language-console highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;gp&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;cat&lt;/span&gt; /tmp/tmp
&lt;span class=&quot;go&quot;&gt;name,age
Alice,20
Bob,30
&lt;/span&gt;&lt;span class=&quot;gp&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;awk&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-F&lt;/span&gt;, &lt;span class=&quot;s1&quot;&gt;&apos;{for(i=1;i&amp;lt;=NF;i+=1)if(NR==1)a[i]=$i;else a[i]=a[i] FS $i}END{for(i=1;i&amp;lt;=NF;i+=1)print a[i]}&apos;&lt;/span&gt; /tmp/tmp
&lt;span class=&quot;go&quot;&gt;name,Alice,Bob
age,20,30
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;

    &lt;p&gt;The one-liner above transposes the input csv file, expanded here for readability:&lt;/p&gt;

    &lt;div class=&quot;language-awk highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# since this action block does NOT have a pattern, it will execute for all lines&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;=&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;NF&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;NR&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# if this is the first line&lt;/span&gt;
      &lt;span class=&quot;nx&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;$i&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt;
      &lt;span class=&quot;nx&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;FS&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;$i&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# concatenate current cell to the ith &quot;column&quot; string separated by the Field Separator&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
  
&lt;span class=&quot;kr&quot;&gt;END&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;# this action block gets executed after all lines have been processed&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;=&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;NF&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;print&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;length&lt;/code&gt; for finding string length
    &lt;div class=&quot;language-console highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;gp&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;echo&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-e&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;name age\nalice 20\nbob 30&apos;&lt;/span&gt; | &lt;span class=&quot;nb&quot;&gt;awk&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;{print length($0)}&apos;&lt;/span&gt;  &lt;span class=&quot;c&quot;&gt;# print length of each line&lt;/span&gt;
&lt;span class=&quot;go&quot;&gt;8
8
6
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
&lt;/ul&gt;</content>

      
      
      
      
      

      
        <author>
            <name>Arpan Kapoor</name>
          
          
        </author>
      

      

      

      
        <summary type="html">This HN post got me to brush up my awk beyond the basic {print $2,$3} to extract fields from csv-like files. Here are some of my notes from The AWK Programming Language by Alfred Aho, Peter Weinberger and Brian Kernighan. sequence of pattern-action statements pattern { action } 2 types of data: numbers and strings. fields in current input line: $1, $2… $0 is the entire line. if there is no pattern, perform action on every line if there is no action, print lines that match the pattern $ echo -e &apos;alice 20\nbob 30\niyer 24&apos; | awk &apos;$2&amp;gt;20&apos; # all people aged &amp;gt;20 bob 30 iyer 24 NF - number of fields in current line $ echo &apos;a b c d&apos; | awk &apos;{print NF, $1, $NF}&apos; # print field count, the 1st and last field 4 a d NR - line number $ echo -e &apos;alice 20\nbob 30&apos; | awk &apos;{print NR, $0}&apos; # prefix with line number 1 alice 20 2 bob 30 simple arithmetic $ echo &apos;alice 5.50 22&apos; | awk &apos;{print $1, $2 * $3}&apos; alice 121 use printf like in C $ echo &apos;alice 5.50 22&apos; | awk &apos;{printf(&quot;%s $%.2f\n&quot;, $1, $2 * $3)}&apos; alice $121.00 string/regex matching $ echo -e &apos;susan 20\nbob 24\nsusie 12&apos; | awk &apos;$1==&quot;susie&quot;&apos; susie 12 $ echo -e &apos;susan 20\nbob 24\nsusie 12&apos; | awk &apos;$1 ~ /^su/&apos; # all lines where the 1st field starts with &quot;su&quot; susan 20 susie 12 use ||, &amp;amp;&amp;amp; and ! as in C BEGIN,END - special patterns that matches before first line is read and after last line has been processed. $ echo -e &apos;susan,20\nbob,24\nsusie,12&apos; | awk &apos;BEGIN {print &quot;name,age&quot;} {print}&apos; # add header to csv file name,age susan,20 bob,24 susie,12 $ echo -e &apos;susan 20\nbob 24\nsusie 12&apos; | awk &apos;{sum=sum+$2} END{print sum/NR}&apos; # average age 18.6667 if-else, while and for loops similar to C $ cat /tmp/tmp name,age Alice,20 Bob,30 $ awk -F, &apos;{for(i=1;i&amp;lt;=NF;i+=1)if(NR==1)a[i]=$i;else a[i]=a[i] FS $i}END{for(i=1;i&amp;lt;=NF;i+=1)print a[i]}&apos; /tmp/tmp name,Alice,Bob age,20,30 The one-liner above transposes the input csv file, expanded here for readability: { # since this action block does NOT have a pattern, it will execute for all lines for (i = 1; i &amp;lt;= NF; i += 1) if (NR == 1) # if this is the first line a[i] = $i else a[i] = a[i] FS $i # concatenate current cell to the ith &quot;column&quot; string separated by the Field Separator } END { # this action block gets executed after all lines have been processed for (i = 1; i &amp;lt;= NF; i += 1) print a[i] } use length for finding string length $ echo -e &apos;name age\nalice 20\nbob 30&apos; | awk &apos;{print length($0)}&apos; # print length of each line 8 8 6</summary>
      

      
      
    </entry>
  
  
</feed>
