grokking awk
This HN post got me to brush up on my awk beyond the basic {print $2,$3}
used to extract fields from CSV-like files.
Here are some of my notes from The AWK Programming Language by Alfred Aho, Peter Weinberger and Brian Kernighan.
- an awk program is a sequence of pattern-action statements of the form pattern { action }
- 2 types of data: numbers and strings.
- fields in current input line: $1, $2, …; $0 is the entire line.
- if there is no pattern, perform action on every line
- if there is no action, print lines that match the pattern

    $ echo -e 'alice 20\nbob 30\niyer 24' | awk '$2>20' # all people aged >20
    bob 30
    iyer 24

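A pattern and an action can also be combined in one statement, which is the general form from the first note. A minimal sketch, where the `adults` function is just a made-up wrapper for illustration and not something from the book:

```shell
# adults is a hypothetical helper name, used only for this sketch
adults() {
  # pattern ($2 > 20) selects the lines, action ({ print $1 }) prints only the name
  awk '$2 > 20 { print $1 }'
}

printf 'alice 20\nbob 30\niyer 24\n' | adults
```

With this input it prints `bob` and `iyer`, one per line. I used printf instead of echo -e here only because it behaves the same across shells.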
- NF: number of fields in current line

    $ echo 'a b c d' | awk '{print NF, $1, $NF}' # print field count, the 1st and last field
    4 a d

- NR: line number

    $ echo -e 'alice 20\nbob 30' | awk '{print NR, $0}' # prefix with line number
    1 alice 20
    2 bob 30

- simple arithmetic

    $ echo 'alice 5.50 22' | awk '{print $1, $2 * $3}'
    alice 121

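Since awk only has numbers and strings, the same field can be used as either, depending on context. A small sketch of that, assuming the common idioms of adding 0 to force a number and concatenating "" to force a string (`coerce_demo` is a made-up name for this example):

```shell
# coerce_demo is a hypothetical helper name, used only for this sketch
coerce_demo() {
  awk '{
    n = $2 + 0      # arithmetic context: the field is treated as a number
    s = $2 ""       # concatenating an empty string forces a string
    print n * 2, s s   # numeric doubling vs. string concatenation
  }'
}

printf 'alice 21\n' | coerce_demo
```

This prints `42 2121`: the first value is 21 doubled, the second is the string "21" joined to itself.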
- use printf like in C

    $ echo 'alice 5.50 22' | awk '{printf("%s $%.2f\n", $1, $2 * $3)}'
    alice $121.00

- string/regex matching

    $ echo -e 'susan 20\nbob 24\nsusie 12' | awk '$1=="susie"'
    susie 12
    $ echo -e 'susan 20\nbob 24\nsusie 12' | awk '$1 ~ /^su/' # all lines where the 1st field starts with "su"
    susan 20
    susie 12

- use ||, && and ! as in C
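The logical operators can combine the comparisons and regex matches from the earlier notes into a single pattern. A rough sketch (the `pick` function and the selection rule are made up for illustration):

```shell
# pick is a hypothetical helper name, used only for this sketch
pick() {
  # keep adults whose name starts with "su", OR anyone whose name does not start with "s"
  awk '($1 ~ /^su/ && $2 >= 18) || !($1 ~ /^s/)'
}

printf 'susan 20\nbob 24\nsusie 12\n' | pick
```

This prints `susan 20` and `bob 24`. susie is dropped because she fails the age test in the first clause and her name starts with "s", so the second clause is false too.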
- BEGIN, END: special patterns that match before the first line is read and after the last line has been processed.

    $ echo -e 'susan,20\nbob,24\nsusie,12' | awk 'BEGIN {print "name,age"} {print}' # add header to csv file
    name,age
    susan,20
    bob,24
    susie,12
    $ echo -e 'susan 20\nbob 24\nsusie 12' | awk '{sum=sum+$2} END{print sum/NR}' # average age
    18.6667

- if-else, while and for loops similar to C

    $ cat /tmp/tmp
    name,age
    Alice,20
    Bob,30
    $ awk -F, '{for(i=1;i<=NF;i+=1)if(NR==1)a[i]=$i;else a[i]=a[i] FS $i}END{for(i=1;i<=NF;i+=1)print a[i]}' /tmp/tmp
    name,Alice,Bob
    age,20,30

The one-liner above transposes the input csv file, expanded here for readability:

    {   # since this action block does NOT have a pattern, it will execute for all lines
        for (i = 1; i <= NF; i += 1)
            if (NR == 1)    # if this is the first line
                a[i] = $i
            else
                a[i] = a[i] FS $i   # concatenate current cell to the ith "column" string separated by the Field Separator
    }
    END {   # this action block gets executed after all lines have been processed
        # NF still holds the field count of the last line here (POSIX awk retains it in END)
        for (i = 1; i <= NF; i += 1)
            print a[i]
    }

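The transpose example covers for and if-else; for completeness, here is a small sketch of a while loop that prints the fields of each line in reverse order (`reverse_fields` is a made-up name for this example):

```shell
# reverse_fields is a hypothetical helper name, used only for this sketch
reverse_fields() {
  awk '{
    i = NF          # start at the last field
    line = ""
    while (i > 0) {
      # append field i, adding a space separator after the first one
      line = (line == "" ? $i : line " " $i)
      i -= 1
    }
    print line
  }'
}

printf 'a b c d\n' | reverse_fields
```

For the input `a b c d` this prints `d c b a`.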
- use length for finding string length

    $ echo -e 'name age\nalice 20\nbob 30' | awk '{print length($0)}' # print length of each line
    8
    8
    6

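length also works inside a pattern, so it combines with END to answer questions like "which line is the longest?". A minimal sketch (`longest_line`, `max` and `longest` are names I made up; on ties the first line seen wins):

```shell
# longest_line is a hypothetical helper name, used only for this sketch
longest_line() {
  # max starts out uninitialized, which compares as 0 against a number
  awk 'length($0) > max { max = length($0); longest = $0 }
       END { print max, longest }'
}

printf 'name age\nalice 20\nbob 30\n' | longest_line
```

With the same input as above this prints `8 name age`.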