understanding the k struct
typedef char *S, C;
typedef unsigned char G;
typedef short H;
typedef int I;
typedef long long J;
typedef float E;
typedef double F;
typedef void V;
typedef struct k0 {
signed char m, a, t;
char u;
int r;
union {
G g;
H h;
I i;
J j;
E e;
F f;
S s;
struct k0 *k;
struct {
J n;
G G0[1];
};
};
} * K;
The above code snippet in my opinion provides a glimpse into the simplicity in design of the APL-like languages, from Ken Iverson’s APL in the 1960s to Arthur Whitney’s a+/k/q/kdb+/shakti.
k0
as defined above (from k.h) is the struct which underpins
all data types exposed by
k/q.
It is a tagged union - a struct that can hold different data types
and has a tag (t
in this case) to indicate the type a particular instance holds.
The elements e
-j
inside the union are used to represent the basic/atomic types:
e | sz | n | c | k type(s) | e.g. | value |
---|---|---|---|---|---|---|
g |
1 | |||||
-1 | b | bool | 1b |
|||
-4 | x | byte | 0x2a |
|||
-10 | c | char | "a" |
|||
h |
2 | -5 | h | short | 98h |
|
i |
4 | |||||
-6 | i | int | 42i |
|||
-13 | m | month | 2020.05m |
months from 2020.01m |
||
-14 | d | date | 2020.05.23 |
days from 2020.01.01 |
||
-17 | u | minute | 12:00 |
mins from 00:00 |
||
-18 | v | second | 12:00:00 |
secs from 00:00:00 |
||
-19 | t | time | 12:00:00.500 |
msecs from 00:00:00.000 |
||
j |
8 | |||||
-7 | j | long | 42j |
|||
-16 | n | timespan | 0D18:58:00.729433000 |
nsecs from 00:00:00.000000000 |
||
-12 | p | timestamp | 2020.05.23D18:59:29.409153000 |
nsecs from 2000.01.01 midnight |
||
e |
4 | -8 | e | real | 42.42e |
|
f |
8 | |||||
-9 | f | float | 42.42f |
|||
-15 | z | datetime | 2020.05.23T18:59:29.409 |
fractional days from 2020.01.01 |
||
s |
. | |||||
-11 | s | symbol | `hi |
pointer to an interned string | ||
-128 | error | '"error" |
e: element from anonymous union inside k0
struct
sz: size of element in bytes
n: type (value of K->t
)
c: char code for type
Creating an atomic K object of type t
is straightforward:
K createAtom(I t){
K x=malloc(sizeof*x);
x->t=t; // type
x->r=0; // ref count
return x;}
K createInt(I x){K y=createAtom(-6);y->i=x;return y;}
K createLong(J x){K y=createAtom(-7);y->j=x;return y;} // ...similarly for other types
As you can see above, members of the unnamed union inside k0
can be accessed as if they were members of the outer
struct. Moreover, this works recursively, so even members of the last struct of the union can be accessed directly.
For instance, n
can be accessed directly as x->n
, where x
is of type K
- a pointer to struct k0
.
These are called anonymous structs/unions and were
standardized in C11 although
gcc supported them as a language extension before that.
To store simple lists (i.e. lists of atoms of homogeneous type), the last member of the union is used - the struct
with members n
and G0
. While allocating memory for a K
object, sizeof(list) bytes are added in for the
dynamically sized array. This is known as the struct hack1.
Code to create a simple list of size n
and type t
would look something like this:
size_t listMallocSize(I t,J n){
size_t x=sizeof*(K)0;
if(n>0)
switch(t){ // type of a simple list of atoms of type t is negative t
case 7:x+=n*sizeof(J);break; // simple list of atoms of int (t=-7) having t=7
case 6:x+=n*sizeof(I);break;
// ... and so on for other types
}
return x-1; // -1 to account for G0[1]
}
K createSimpleList(I t,J n){
K x=malloc(listMallocSize(t,n));
x->t=t;
x->r=0;
x->n=n; // vector size n
return x;}
// K intVector = createSimpleList(6,100);
// access yth element of intVector as ((I*)(intVector->G0))[y] - k.h has macros for this
The alternative instead of this hack would be to make G0
a pointer, which would have some drawbacks:
- it would require 2
malloc
/free
calls for allocation/de-allocation of the struct and list. - accessing the list would require a dereference each time.
After simple lists, we have the general list - a list of any heterogeneous types.
In this case, G0
would be a list of K
objects, i.e. pointers to k0
structs each of which could be of a different
type. As we’d expect, operations on this type of list would be slower due to the multiple pointer dereferences and
non-contiguous layout of the data. This would also be the rationale behind why k/q tries to convert any general
list to a simple list automatically - they are much faster and able to utilize
SSE/AVX instructions.
Further along, there are dictionaries, which are simply a pair of conforming lists, i.e. a pair of lists of equal size.
The G0
array, similar to a general list would now be a list of 2 K
objects each of which is a general/simple list -
the key list (access using ((K*)(x->G0))[0]
) and the value list (((K*)(x->G0))[1]
).
This ultimately leads us to the table - the most important data structure in k/q. A table is a flipped column dictionary. A column dictionary is a dictionary with atomic symbol values as keys and simple/general lists of equal sizes as the values. The flipped part means that this dictionary is transposed (figuratively).
/ a column dictionary with list of symbols as the key list and
/ list of conforming list of longs as the value list
q)show d:`a`b`c!(1 2;3 4;5 6)
a| 1 2
b| 3 4
c| 5 6
q)show t:flip d / flipped column dictionary - a table
a b c
-----
1 3 5
2 4 6
This fact is also reflected in it the table’s internal representation.
The K
object for a table is a pointer to its column dictionary (using the self-referencing pointer struct k0 *k
).
This also shows us why taking the transpose (flip
) of a list of conforming lists or a column dictionary is very cheap.
In a similar vein, some of the other data structures are also constructed using the same k0
struct:
- keyed tables - dictionary with key and value as simple tables
- splayed table - dictionary with key as list of columns and value as its file path symbol
- partitioned table - dictionary with key as list of columns and value as a table name symbol
All these diverse data structures supported by a single clever struct - pretty cool IMO.
-
the size of the last array member of a struct can be kept unspecified in C99, which is known as flexible array members ↩