Awk


Basics

Standard variables

  • NF Number of fields on the line
  • FS The field separator (default is whitespace)
  • OFS The output field separator. Set like 'BEGIN {OFS=","} {<other code>}'
  • $0 The entire line
  • $1 First field in line
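  • NR The number of the current line (record), counted across all input files
  • ARGV The arguments (files) passed to awk (ARGV[0] is the program name, ARGV[2] holds the name of the 2nd file on the command line)
  • ARGIND The index in ARGV of the file currently being processed (gawk only)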
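A small sketch that prints a few of these variables for each input line. The helper name `show_vars` and the sample data are made up for illustration. It only uses NR, NF, $1 and OFS, so it should run in any POSIX awk (ARGIND would need gawk).

```shell
#!/bin/sh
# show_vars (hypothetical helper): for each input line print the line
# number (NR), the number of fields (NF) and the first field ($1),
# joined by OFS, which is set to "," in the BEGIN block.
show_vars() {
  awk 'BEGIN { OFS = "," } { print NR, NF, $1 }'
}

# Usage: two lines of whitespace-separated input.
printf 'alpha beta gamma\ndelta epsilon\n' | show_vars
# prints:
# 1,3,alpha
# 2,2,delta
```

With the default FS, runs of blanks count as one separator, so "a  b" still has two fields.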


awk -F"," -v VAR=<value> '{codeblock}'
Use "," as the field separator and set the awk variable VAR to <value>. With -v you can pass values from the shell to the awk program; they are already set when the BEGIN block runs.
awk -f <awkfile> <file>
Process <file> with the program in <awkfile>. If - is given as <file>, awk reads standard input: type the text and end the input with Ctrl-D.
awk '{ for(i = 1; i <= NF; i++) { print i,$i; } }'
Iterate over all fields
a=length($3)
Get the length (number of characters) of a field, here the third one.
awk '{if (ARGIND==2) {codeblock}}'
Execute codeblock only for lines of the second file passed to awk (ARGIND is gawk-specific)
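The command-line entries above can be tried with a few small shell functions. The helper names (`fields`, `scale`, `in_both`) and the sample data are hypothetical. Because ARGIND exists only in gawk, `in_both` uses the common FNR==NR idiom instead: FNR restarts at 1 for each input file while NR keeps counting, so FNR==NR is true only while the first file is read (assuming that file is not empty).

```shell
#!/bin/sh
# fields: print "index value" for every field of every input line,
# using "," as the field separator.
fields() {
  awk -F',' '{ for (i = 1; i <= NF; i++) print i, $i }'
}

# scale: multiply field 2 by a factor passed in from the shell with -v.
scale() {
  awk -F',' -v factor="$1" '{ print $1, $2 * factor }'
}

# in_both: print the lines of file 2 whose first field also occurs as
# a first field in file 1 (portable alternative to ARGIND==2).
in_both() {
  awk 'FNR == NR { seen[$1] = 1; next }   # first file: remember keys
       ($1 in seen)                        # second file: print matching lines
      ' "$1" "$2"
}

printf 'a,1\nb,2\n' | fields     # prints 1 a / 2 1 / 1 b / 2 2
printf 'a,1\nb,2\n' | scale 10   # prints a 10 / b 20
```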

Arrays

Arrays are associative: their keys are strings (named keys), so they work much like a Python dict or a Perl hash.

Print each distinct string found in ar[1] (the part of field 2 before the first "/") only once.

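{
 split($2,ar,"/")        # split field 2 on "/" into ar[1], ar[2], ...
 dict[ar[1]] = 1         # use the first part as key; duplicates collapse into one entry
} END {
 for (key in dict)       # keys come out in unspecified order
 {
  OUT = (OUT == "" ? key : OUT " " key)   # space-separated, no leading space
 }
 print OUT
}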
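To try the array example, it can be wrapped in a shell function. The name `uniq_first` and the path-like sample data are hypothetical. Since `for (key in dict)` visits keys in an unspecified order, this sketch prints one key per line and pipes the result through sort so the output is predictable.

```shell
#!/bin/sh
# uniq_first: print each distinct first "/"-separated part of field 2
# exactly once, one per line, sorted.
uniq_first() {
  awk '{
         split($2, ar, "/")    # "usr/bin" -> ar[1]="usr", ar[2]="bin"
         dict[ar[1]] = 1       # storing the same key twice keeps one entry
       }
       END {
         for (key in dict)     # order is unspecified...
           print key
       }' | sort               # ...so sort for stable output
}

printf 'x usr/bin\ny usr/lib\nz etc/ssh\n' | uniq_first
# prints:
# etc
# usr
```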


Strings

length(<string>)
The length (number of characters) of <string>
substr(<string>,<start>,<num>)
From <string> return <num> characters, starting at position <start> (the first character is position 1)
n=split(var,ARR,<fs>)
Split var into array ARR; n holds the number of elements in ARR. <fs> is the field separator; if it is omitted, FS is used (default: whitespace).
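A minimal sketch of length, substr and split applied to the first field of each line. The helper name `string_demo` and the input are made up for illustration.

```shell
#!/bin/sh
# string_demo: show length, substr and split on the first field.
string_demo() {
  awk '{
    s = $1
    printf "length=%d\n", length(s)          # number of characters
    printf "substr=%s\n", substr(s, 2, 3)    # 3 chars from position 2 (positions start at 1)
    n = split(s, parts, "-")                 # split on "-", n = number of pieces
    printf "n=%d last=%s\n", n, parts[n]
  }'
}

echo 'foo-bar-baz' | string_demo
# prints:
# length=11
# substr=oo-
# n=3 last=baz
```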

Conceptually, awk does the following for each line it reads (pseudocode; var holds the line just read):

NF=split(var,ARR,FS)
NR++
$0=var
for ( i=1 ; i<=NF ; i++ ) {
 $i=ARR[i]
}
gsub(<regexp>,<string>,<variable>)
Replace every match of <regexp> with <string> in <variable> and return the number of replacements.
<variable> is modified in place; if it is omitted, $0 is used.
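printf "%-10s %05d\n", $1, $2
Formatted output: $1 left-aligned in 10 columns, $2 zero-padded to 5 digits. printf adds no newline of its own, hence the \n. The format specifiers work like in Python:Strings#Advanced.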
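A sketch combining gsub and printf. The helper name `tidy` and the input are hypothetical: it replaces every run of digits in field 2 with "#", then prints field 1 padded to 10 columns, the number of replacements zero-padded to 5 digits, and the modified field.

```shell
#!/bin/sh
# tidy: gsub modifies $2 in place and returns the replacement count.
tidy() {
  awk '{
    n = gsub(/[0-9]+/, "#", $2)
    printf "%-10s %05d %s\n", $1, n, $2
  }'
}

echo 'key a1b22c333' | tidy
# prints "key" padded to 10 columns, then: 00003 a#b#c#
```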

Calculations

Print the average of field 4 over the records in <file> that contain 'GW'

awk '/GW/ {sum+=$4; n++}
END {if (n) print sum/n}' <file>
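A self-contained sketch of the average calculation, assuming the average should be taken over the matching lines only (dividing by NR would also count the lines that do not contain 'GW'). The function name `avg_gw` and the sample data are made up.

```shell
#!/bin/sh
# avg_gw: average of field 4 over the lines containing "GW".
# Reads the files given as arguments, or stdin if there are none.
avg_gw() {
  awk '/GW/ { sum += $4; n++ }             # accumulate matching lines only
       END  { if (n > 0) print sum / n }   # avoid division by zero
      ' "$@"
}

printf 'GW a b 10\nXX a b 99\nGW a b 20\n' | avg_gw
# prints: 15
```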

Matching

/<pattern>/ {codeblock}
Execute codeblock if the current line matches <pattern>
var ~ /<pattern>/ {codeblock}
Execute codeblock if var (for example a field such as $3) matches <pattern>
Use !~ instead of ~ to negate the match
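A sketch of ~ and !~ used as patterns on single fields. The helper name `match_demo` and the log-like input are hypothetical.

```shell
#!/bin/sh
# match_demo: print lines whose 2nd field starts with "err" (~) and
# count lines whose 1st field is not all digits (!~).
match_demo() {
  awk '$2 ~ /^err/       { print "match:", $0 }
       $1 !~ /^[0-9]+$/  { bad++ }
       END { print "non-numeric ids:", bad + 0 }'
}

printf '1 error disk\n2 ok\nx3 errno\n' | match_demo
# prints:
# match: 1 error disk
# match: x3 errno
# non-numeric ids: 1
```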

Control statements

||
or
&&
and
if ( VAR == <value> ) { codeblock } else { codeblock }
The if/else construction (awk has no "then" keyword; the else part is optional)
if (n)
if (n != 0)
Both are true if n is not equal to zero (for numeric n); the first form is shorthand for the second
(n<5)*2
A comparison evaluates to 1 (true) or 0 (false), so this returns 2 if n < 5, else 0
for (var in ARR) { print ARR[var] }
Loop over all indexes of ARR, in unspecified order.
if (var in ARR) { print ARR[var] }
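Check whether index var exists in ARR (this test does not create the element)
if (ARR[var] == "" )
True if ARR[var] is empty or unset. Note that merely referencing ARR[var] creates the element, so use (var in ARR) to test for existence.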
for (init;test;incr) { codeblock }
Loop: run init once, then, as long as test is true, execute codeblock, running incr after each pass.
for (num=10;num<=100;num++) {
 print num
}
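function NAME(par1, par2, ...)
Define a function. Parameters are local to the function; every other variable is global. To get local variables, declare them as extra parameters (by convention separated by a few extra spaces) and leave them out when calling the function, so a global variable with the same name is not overwritten. Assigning to a global variable, on the other hand, is a way to return extra results from the function.
return <value>
End the function and return <value> to the caller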
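A sketch that exercises several of the constructs above in a BEGIN block, so no input is needed. The function `max_of` and the wrapper `ctl_demo` are hypothetical names. The extra parameters `i` and `m` act as local variables, which is why the global `i` survives the call.

```shell
#!/bin/sh
ctl_demo() {
  awk '
    # Parameters after the extra spaces (i, m) serve as local variables;
    # callers pass only the array and its length.
    function max_of(arr, len,    i, m) {
      m = arr[1]
      for (i = 2; i <= len; i++)
        if (arr[i] > m) m = arr[i]
      return m
    }
    BEGIN {
      n = split("4 9 2", a, " ")
      print "max:", max_of(a, n)
      print "flag:", (n < 5) * 2            # comparison is 1 or 0, so 2 here
      if (2 in a) print "index 2 exists"
      if (!(7 in a)) print "index 7 missing"
      i = "global"
      max_of(a, n)
      print "i is still:", i                 # the local i did not touch it
    }'
}

ctl_demo
# prints:
# max: 9
# flag: 2
# index 2 exists
# index 7 missing
# i is still: global
```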

Date/time

today=$(gawk 'BEGIN{print strftime("%Y%m%d")}')
yesterday=$(gawk 'BEGIN{print strftime("%Y%m%d",systime()-24*60*60)}')

Magic

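awk '{if (NR%3 == 0) {print p $0; p=""} else {p=p $0}} END {if (NR%3) print p}'
Join every 3 consecutive lines into one (the lines are concatenated without a separator; leftover lines at the end are printed as well). Based on code found on Stack Overflow [1].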
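A generalized sketch of the same idea, with the group size and the separator passed in via -v. The helper name `join_lines` and its two arguments are hypothetical. It starts a new buffer on the first line of each group, prints the buffer when the group is complete, and prints any incomplete last group in END.

```shell
#!/bin/sh
# join_lines N SEP: join every N consecutive input lines with SEP.
join_lines() {
  awk -v n="$1" -v sep="$2" '
    { p = ((NR - 1) % n == 0) ? $0 : (p sep $0) }  # first line of a group starts a new buffer
    NR % n == 0 { print p }                        # group complete: print it
    END { if (NR % n) print p }                    # incomplete last group
  '
}

printf 'a\nb\nc\nd\ne\nf\ng\n' | join_lines 3 ' '
# prints:
# a b c
# d e f
# g
```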