Difference between revisions of "Awk"

Latest revision as of 16:50, 13 November 2023

Basics

Standard variables

NF Number of fields on the line
FS The field separator (default is whitespace)
OFS The output field separator. Set like 'BEGIN {OFS=","} {<other code>}'
$0 The entire line
$1 First field in line
NR The number of the current line, counted across all input files
ARGV The arguments (files) passed to awk (ARGV[2] has the name of the 2nd file on the commandline)
ARGIND The index in ARGV of the argument currently being processed (gawk only)
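
As a minimal sketch of how these variables behave, the following prints NR, NF, the first field and the last field of each record. The sample input and the ":" separator are made up for illustration.

```shell
# Hypothetical sample input: two colon-separated records.
result=$(printf 'alice:30:admin\nbob:25\n' |
  awk 'BEGIN { FS = ":"; OFS = "," } { print NR, NF, $1, $NF }')
printf '%s\n' "$result"
```

The commas in the print statement are replaced by OFS, and $NF addresses the last field of the line.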

awk -F"," -v VAR=<value> '{codeblock}': Use "," as the field separator and set the awk variable VAR to <value>. With -v you can pass values from the shell to the awk program.
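
A small sketch of passing a shell value into awk with -v. The shell variable threshold, the awk variable min and the sample data are assumptions for illustration only.

```shell
threshold=10   # hypothetical shell variable
result=$(printf 'a,5\nb,15\nc,20\n' |
  awk -F"," -v min="$threshold" '$2 > min { print $1 }')
printf '%s\n' "$result"
```

Only the records whose second field is larger than the value passed in are printed.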

awk -f <awkfile> <file>: Run the code in <awkfile> on <file>. If - is given as <file>, awk reads from standard input; when typing the input interactively, close it with CTRL-D

awk '{ for(i = 1; i <= NF; i++) { print i,$i; } }': Iterate over all fields

a=length(field): Get the length of a field.

awk '{if (ARGIND==2) {codeblock}}': Execute codeblock if the current line is in the second file passed to awk (ARGIND is only available in gawk)
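
For awk versions without ARGIND, one possible alternative is to compare FILENAME with ARGV[2]. This is a sketch, not the method used above; the temporary files are made up, and FNR (the line number within the current file) is a standard variable not listed in this page.

```shell
tmpdir=$(mktemp -d)
printf 'one\ntwo\n' > "$tmpdir/first.txt"      # hypothetical first file
printf 'three\nfour\n' > "$tmpdir/second.txt"  # hypothetical second file
# Act only on lines that come from the second file on the command line.
result=$(awk 'FILENAME == ARGV[2] { print FNR ": " $0 }' \
  "$tmpdir/first.txt" "$tmpdir/second.txt")
rm -r "$tmpdir"
printf '%s\n' "$result"
```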

Arrays

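Arrays can have named keys, and work a bit like a Python dict or a Perl hash.

Output each distinct string only once. The example splits field 2 on "/" into the array ar, uses the first part as a key of dict, and prints all keys at the end.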

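{
 split($2,ar,"/")    # split field 2 on "/" into ar
 dict[ar[1]] = 1     # a key can exist only once, so duplicates collapse
} END {
 for (key in dict)
 {
  OUT=OUT" "key      # append each distinct key (arbitrary order)
 }
 print OUT
}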
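
A runnable sketch of the same idea on made-up input. Because the order of "for (key in dict)" is arbitrary, the keys are printed one per line and piped through sort to get a stable result.

```shell
# Hypothetical input: field 2 holds a path, only its first component is kept.
result=$(printf 'x usr/bin\ny etc/passwd\nz usr/lib\n' |
  awk '{ split($2, ar, "/"); dict[ar[1]] = 1 }
       END { for (key in dict) print key }' | sort)
printf '%s\n' "$result"
```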

Strings

length(<string>): The length of <string>

substr(<string>,<start>,<num>): Return <num> characters from <string>, starting at position <start> (the first character is position 1)
n=split(var,ARR,<fs>): Split var into the array ARR; n holds the number of elements in ARR. <fs> is the field separator; if it is not given, the variable FS is used (default whitespace).
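
The string functions above can be tried out in a BEGIN block, which needs no input file. The variable names s and parts and the date string are only examples.

```shell
result=$(awk 'BEGIN {
  s = "2023-11-13"
  n = split(s, parts, "-")                    # parts[1]="2023", parts[2]="11", ...
  print length(s), substr(s, 1, 4), n, parts[2]
}')
printf '%s\n' "$result"
```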

For every line it reads, awk effectively does the following (var stands for the line just read):

NF=split(var,ARR,FS)
NR++
$0=var
for ( i=1 ; i<=NF ; i++ ) {
 $i=ARR[i]
}

gsub(<regexp>,<string>,<variable>): Replace every match of <regexp> in <variable> with <string>. <variable> is modified in place ($0 is used if <variable> is omitted); the return value is the number of replacements.
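
A short sketch showing both effects of gsub, the modified variable and the returned count, on a made-up input line:

```shell
result=$(printf 'a-b-c\n' | awk '{ n = gsub(/-/, "_", $0); print n, $0 }')
printf '%s\n' "$result"
```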

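printf "%-10s %05d\n", $1, $2: Format output with C-style format specifiers, much like string formatting in Python (see Python:Strings#Advanced). Here $1 is left-aligned in 10 characters and $2 is zero-padded to 5 digits.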
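
To make the padding visible, this sketch uses the same format specifiers but puts a "|" between the two columns instead of a space. The input is invented.

```shell
result=$(printf 'apple 42\nkiwi 7\n' | awk '{ printf "%-10s|%05d\n", $1, $2 }')
printf '%s\n' "$result"
```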

Calculations

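Print the average of field 4 over all records in <file> that contain 'GW'. The sum is divided by the number of matching records, not by NR, which counts every line.

awk '/GW/ {GW+=$4; n++}
END {if (n) print GW/n}' <file>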
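
A worked example on made-up data, assuming the average should be taken over the matching records only. The variable names sum and n are illustrative.

```shell
# Two of the three records contain GW; their field 4 values are 10 and 20.
result=$(printf 'GW1 a b 10\nXX1 a b 99\nGW2 a b 20\n' |
  awk '/GW/ { sum += $4; n++ } END { if (n > 0) print sum / n }')
printf '%s\n' "$result"
```

The check on n avoids a division by zero when no record matches.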

Matching

/<pattern>/ {codeblock}: Execute code if the current line matches <pattern>

var ~ /<pattern>/ {codeblock}: Execute code if var matches <pattern>; !~ to negate the match
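
Both operators can be sketched on a single field. The input format (name and login shell separated by ":") is an assumption for the example.

```shell
result=$(printf 'root:/bin/bash\nguest:/bin/false\nann:/bin/zsh\n' |
  awk -F":" '$2 ~ /sh$/  { print "shell:", $1 }
             $2 !~ /sh$/ { print "no shell:", $1 }')
printf '%s\n' "$result"
```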

Control statements

||: or
&&: and

if ( VAR == <value> ) { codeblock } else { codeblock }: The if, then, else construction (the keywords are lower case; awk is case-sensitive)
if (n)
if (n != 0): If n is not equal to zero
(n<5)*2: A comparison evaluates to 1 (true) or 0 (false), so if n < 5 this returns 2, else 0
for (var in ARR) { print ARR[var] }: Read all indexes of ARR in arbitrary order.
if (var in ARR) { print ARR[var] }: Check if index var is in ARR
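if (ARR[var] == "" ): Check if the value at index var is empty or unset. Note that referencing ARR[var] creates the element, so use (var in ARR) to test whether an index exists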
for (init;test;incr) { codeblock }: Loop. Start with init, as long as test is true, execute codeblock, after each loop execute incr.

for (num=10;num<=100;num++) {
 print num
}
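
The difference between the two array checks in the list above, (var in ARR) and ARR[var] == "", can be sketched as follows. The indexes "a" and "b" are arbitrary.

```shell
result=$(awk 'BEGIN {
  ARR["a"] = ""                            # index exists, value is empty
  if ("a" in ARR) print "a: present"
  if (!("b" in ARR)) print "b: absent"
  if (ARR["b"] == "") print "b: empty"     # this reference creates ARR["b"]
  if ("b" in ARR) print "b: now present"
}')
printf '%s\n' "$result"
```

All four lines are printed: merely looking at ARR["b"] is enough to make the index exist afterwards.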

function NAME (par,par,..): Define a function. Parameters are local to the function. Awk has no other way to declare local variables, so every local variable should be listed as an extra parameter to avoid overwriting a global variable. Conversely, assigning to a global variable is one way to return results from the function.
return <value>: End the function and return <value> to the caller
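
A minimal sketch of the extra-parameter convention. The function name sum_to and its variables are made up; the extra spaces in the parameter list are only a common way to separate real parameters from locals.

```shell
result=$(awk '
function sum_to(n,    i, total) {   # i and total are extra parameters, used as locals
  total = 0
  for (i = 1; i <= n; i++) total += i
  return total
}
BEGIN {
  i = 99                 # global i
  print sum_to(4), i     # the global i is untouched by the function
}')
printf '%s\n' "$result"
```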

Date/time

today=$(gawk 'BEGIN{print strftime("%Y%m%d",systime())}')
yesterday=$(gawk 'BEGIN{print strftime("%Y%m%d",systime()-24*60*60)}')

Magic

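gawk '{if (NR%3 == 0) {print p$0;p=""}else{p=p$0}}': Join every 3 lines into one line. Leftover lines at the end of the input (when the number of lines is not a multiple of 3) are not printed. (Basic code found on stackoverflow [1])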
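
If the last, incomplete group should be printed as well, one possible variant adds an END block. This is a sketch on made-up input, not the code from the stackoverflow answer.

```shell
# Five input lines: one full group of 3 and a leftover group of 2.
result=$(printf '1\n2\n3\n4\n5\n' |
  awk '{ p = p $0 }
       NR % 3 == 0 { print p; p = "" }
       END { if (p != "") print p }')
printf '%s\n' "$result"
```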

[1] https://stackoverflow.com/questions/3194534/joining-two-consecutive-lines-using-awk-or-sed