understanding the k struct

typedef char *S, C;
typedef unsigned char G;
typedef short H;
typedef int I;
typedef long long J;
typedef float E;
typedef double F;
typedef void V;
typedef struct k0 {
  signed char m, a, t;
  char u;
  int r;
  union {
    G g;
    H h;
    I i;
    J j;
    E e;
    F f;
    S s;
    struct k0 *k;
    struct {
      J n;
      G G0[1];
    };
  };
} * K;

The above code snippet in my opinion provides a glimpse into the simplicity in design of the APL-like languages, from Ken Iverson’s APL in the 1960s to Arthur Whitney’s a+/k/q/kdb+/shakti.

k0 as defined above (from k.h) is the struct which underpins all data types exposed by k/q. It is a tagged union - a struct that can hold different data types and has a tag (t in this case) to indicate the type a particular instance holds.

The elements e-j inside the union are used to represent the basic/atomic types:

e	sz	n	c	k type(s)	e.g.	value
`g`	1
		-1	b	bool	`1b`
		-4	x	byte	`0x2a`
		-10	c	char	`"a"`
`h`	2	-5	h	short	`98h`
`i`	4
		-6	i	int	`42i`
		-13	m	month	`2020.05m`	months from `2020.01m`
		-14	d	date	`2020.05.23`	days from `2020.01.01`
		-17	u	minute	`12:00`	mins from `00:00`
		-18	v	second	`12:00:00`	secs from `00:00:00`
		-19	t	time	`12:00:00.500`	msecs from `00:00:00.000`
`j`	8
		-7	j	long	`42j`
		-16	n	timespan	`0D18:58:00.729433000`	nsecs from `00:00:00.000000000`
		-12	p	timestamp	`2020.05.23D18:59:29.409153000`	nsecs from `2000.01.01` midnight
`e`	4	-8	e	real	`42.42e`
`f`	8
		-9	f	float	`42.42f`
		-15	z	datetime	`2020.05.23T18:59:29.409`	fractional days from `2020.01.01`
`s`	.
		-11	s	symbol	`hi	pointer to an interned string
		-128		error	`'"error"`

e: element from anonymous union inside k0 struct
sz: size of element in bytes
n: type (value of K->t)
c: char code for type

Creating an atomic K object of type t is straightforward:

K createAtom(I t){
  K x=malloc(sizeof*x);
  x->t=t;  // type
  x->r=0;  // ref count
  return x;}

K createInt(I x){K y=createAtom(-6);y->i=x;return y;}
K createLong(J x){K y=createAtom(-7);y->j=x;return y;}  // ...similarly for other types

As you can see above, members of the unnamed union inside k0 can be accessed as if they were members of the outer struct. Moreover, this works recursively, so even members of the last struct of the union can be accessed directly. For instance, n can be accessed directly as x->n, where x is of type K - a pointer to struct k0. These are called anonymous structs/unions and were standardized in C11 although gcc supported them as a language extension before that.

To store simple lists (i.e. lists of atoms of homogeneous type), the last member of the union is used - the struct with members n and G0. While allocating memory for a K object, sizeof(list) bytes are added in for the dynamically sized array. This is known as the struct hack¹. Code to create a simple list of size n and type t would look something like this:

size_t listMallocSize(I t,J n){
  size_t x=sizeof*(K)0;
  if(n>0)
    switch(t){  // type of a simple list of atoms of type t is negative t
      case 7:x+=n*sizeof(J);break;  // simple list of atoms of int (t=-7) having t=7
      case 6:x+=n*sizeof(I);break;
      // ... and so on for other types
    }
  return x-1;  // -1 to account for G0[1]
}
K createSimpleList(I t,J n){
  K x=malloc(listMallocSize(t,n));
  x->t=t;
  x->r=0;
  x->n=n;  // vector size n
  return x;}
// K intVector = createSimpleList(6,100);
// access yth element of intVector as ((I*)(intVector->G0))[y] - k.h has macros for this

The alternative instead of this hack would be to make G0 a pointer, which would have some drawbacks:

it would require 2 malloc/free calls for allocation/de-allocation of the struct and list.
accessing the list would require a dereference each time.

After simple lists, we have the general list - a list of any heterogeneous types. In this case, G0 would be a list of K objects, i.e. pointers to k0 structs each of which could be of a different type. As we’d expect, operations on this type of list would be slower due to the multiple pointer dereferences and non-contiguous layout of the data. This would also be the rationale behind why k/q tries to convert any general list to a simple list automatically - they are much faster and able to utilize SSE/AVX instructions.

Further along, there are dictionaries, which are simply a pair of conforming lists, i.e. a pair of lists of equal size. The G0 array, similar to a general list would now be a list of 2 K objects each of which is a general/simple list - the key list (access using ((K*)(x->G0))[0]) and the value list (((K*)(x->G0))[1]).

This ultimately leads us to the table - the most important data structure in k/q. A table is a flipped column dictionary. A column dictionary is a dictionary with atomic symbol values as keys and simple/general lists of equal sizes as the values. The flipped part means that this dictionary is transposed (figuratively).

/ a column dictionary with list of symbols as the key list and
/ list of conforming list of longs as the value list
q)show d:`a`b`c!(1 2;3 4;5 6)
a| 1 2
b| 3 4
c| 5 6
q)show t:flip d    / flipped column dictionary - a table
a b c
-----
1 3 5
2 4 6

This fact is also reflected in it the table’s internal representation. The K object for a table is a pointer to its column dictionary (using the self-referencing pointer struct k0 *k). This also shows us why taking the transpose (flip) of a list of conforming lists or a column dictionary is very cheap.

In a similar vein, some of the other data structures are also constructed using the same k0 struct:

keyed tables - dictionary with key and value as simple tables
splayed table - dictionary with key as list of columns and value as its file path symbol
partitioned table - dictionary with key as list of columns and value as a table name symbol

All these diverse data structures supported by a single clever struct - pretty cool IMO.

the size of the last array member of a struct can be kept unspecified in C99, which is known as flexible array members ↩