Difference between revisions of "Awk"

Latest revision as of 16:50, 13 November 2023

Basics

Standard variables

  • NF Number of fields on the line
  • FS The field separator (default is whitespace)
  • OFS The output field separator. Set like 'BEGIN {OFS=","} {<other code>}'
  • $0 The entire line
  • $1 First field in line
  • NR The number of the current line (counted across all input files)
  • ARGV The arguments (files) passed to awk (ARGV[2] has the name of the 2nd file on the command line)
  • ARGIND The index in ARGV of the argument currently being processed (gawk only)
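
A minimal sketch of the variables above in action. The sample input is invented for illustration, and $NF (the last field) is used in addition to the variables listed:

```shell
# Sample input is invented for illustration.
result=$(printf 'alpha beta gamma\none two\n' |
  awk 'BEGIN { OFS = "," }        # separator that print puts between its arguments
       { print NR, NF, $1, $NF }  # line number, field count, first field, last field
  ')
echo "$result"
```

Each output line holds the line number, the number of fields, the first field and the last field, joined by the OFS value.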


awk -F"," -v VAR=<value> '{codeblock}'
In awk: Use "," as field-separator and set VAR to <value>. With -v you can pass variables from the shell to the awk program.
awk -f <awkfile> <file>
Parse <file> with the code in <awkfile>. If - is given as <file>, awk reads standard input; when typing the text at the terminal, close the input with CTRL-D
awk '{ for(i = 1; i <= NF; i++) { print i,$i; } }'
Iterate over all fields
a=length(field)
Get the length of a field.
awk '{if (ARGIND==2) {codeblock}}'
Execute codeblock if the current line is in the second file passed to awk (ARGIND is gawk only)
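
The options and the field loop above can be combined. This is only a sketch: the variable name prefix and the sample input are hypothetical.

```shell
# "prefix" is a hypothetical variable passed in from the shell with -v.
result=$(printf 'ab,cde\n' |
  awk -F"," -v prefix="f" '{
    for (i = 1; i <= NF; i++)
      print prefix i, $i, length($i)   # label, field text, field length
  }')
echo "$result"
```

For the single input line this prints one line per comma-separated field.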

Arrays

Arrays can have named keys, and work a bit like a Python dict or a Perl hash.

Split field 2 on "/" into the array <ar> and output each distinct first part only once.

{
 split($2,ar,"/")
 dict[ar[1]] = 1
} END {
 for (key in dict) 
 {
  OUT=OUT" "key
 }
 print OUT
}
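
A runnable variant of the same idea, assuming invented sample input. Because for (key in dict) visits keys in arbitrary order, the keys are printed one per line and passed through sort to get a stable result:

```shell
# Sample input is invented; field 2 holds a path-like string.
result=$(printf 'x usr/bin\ny etc/passwd\nz usr/lib\n' |
  awk '{ split($2, ar, "/"); dict[ar[1]] = 1 }   # remember the first path part
       END { for (key in dict) print key }' |    # each key appears once
  sort)
echo "$result"
```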


Strings

length(<string>)
The length of <string>
substr(<string>,<start>,<num>)
From <string> return <num> characters, starting from <start> (the first character is position 1)
n=split(var,ARR,<fs>)
Split var into array ARR. n holds the number of elements in ARR. <fs> is the field separator; if it is not given, the variable FS is used (default: whitespace).
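
A small sketch of length, substr and split on one string. The date string is just an example value:

```shell
result=$(awk 'BEGIN {
  s = "2023-11-13"
  n = split(s, parts, "-")                      # parts[1]="2023", parts[2]="11", parts[3]="13"
  print length(s), substr(s, 1, 4), n, parts[2]
}')
echo "$result"   # 10 2023 3 11
```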

For each line it reads, awk in effect does the following:

NF=split(var,ARR," ")
NR++
$0=var
for ( i=1 ; i<=NF ; i++ ) {
 $i=ARR[i]
}
gsub(<regexp>,<string>,<variable>)
Replace every match of <regexp> in <variable> with <string>. Returns the number of replacements.
<variable> is modified in place. If <variable> is omitted, $0 is used.
printf "%-10s %05d\n", $1, $2
Format output, much like in Python (see Python:Strings#Advanced). Here %-10s left-aligns $1 in 10 columns and %05d zero-pads $2 to 5 digits.
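
gsub and printf can be tried together like this. The input and the format widths are chosen only for illustration:

```shell
result=$(echo 'foo bar foo' |
  awk '{
    n = gsub(/foo/, "baz", $0)     # $0 is changed in place, n counts the replacements
    printf "%-12s|%03d\n", $0, n   # left-align the text in 12 columns, zero-pad the count
  }')
echo "$result"
```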

Calculations

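Print the average of field 4 over all records in <file> that contain 'GW'. The matching records are counted in n, because NR counts every line read, not only the matching ones.

awk '/GW/ {GW+=$4; n++}
END {if (n) print GW/n}' <file>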
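
A sketch of such an average on invented input, dividing by a count of the matching records (the hypothetical variable n) so that non-matching lines do not affect the result:

```shell
# Two of the three sample lines contain GW; their 4th fields are 10 and 20.
result=$(printf 'GW a b 10\nXX a b 99\nGW a b 20\n' |
  awk '/GW/ { sum += $4; n++ }          # add up and count matching records only
       END  { if (n) print sum / n }')  # guard against division by zero
echo "$result"   # 15
```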

Matching

/<pattern>/ {codeblock}
Execute code if the current line matches <pattern>
var ~ /<pattern>/ {codeblock}
Execute code if var matches <pattern>
!~ to negate the match
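
A sketch of ~ and !~ applied to single fields. The names and roles in the input are made up:

```shell
result=$(printf 'alice admin\nbob user\ncarol admin\n' |
  awk '$2 ~ /^admin$/  { print $1 }             # field 2 is exactly "admin"
       $1 !~ /^[ac]/   { print "other:", $1 }   # field 1 does not start with a or c
  ')
echo "$result"
```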

Control statements

||
or
&&
and
if ( VAR == <value> ) { codeblock } else { codeblock }
The if, then, else construction
if (n)
if (n != 0)
If n is not equal to zero
(n<5)*2
A comparison evaluates to 1 (true) or 0 (false), so if n < 5 this returns 2, else 0
for (var in ARR) { print ARR[var] }
Loop over all indexes of ARR, in arbitrary order.
if (var in ARR) { print ARR[var] }
Check if index var is in ARR
if (ARR[var] == "" )
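True if ARR[var] is empty or not set. Note that merely referencing ARR[var] creates the element; use (var in ARR) to test for existence without that side effect.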
for (init;test;incr) { codeblock }
Loop. Start with init, as long as test is true, execute codeblock, after each loop execute incr.
for (num=10;num<=100;num++) {
print num
}
function NAME (par,par,..)
Create a function. Parameters of the function are local, and awk has no other kind of local variable, so every local variable should be declared as an extra parameter to avoid overwriting a global variable. Overwriting a global variable on purpose, on the other hand, is a way to return results from the function.
return <value>
End the function and give it <value> as its return value
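
A minimal sketch of a function that keeps its working variables local by declaring them as extra parameters. The function name sum_to is hypothetical:

```shell
result=$(awk '
function sum_to(n,    i, total) {   # i and total are extra parameters used as locals
  for (i = 1; i <= n; i++) total += i
  return total
}
BEGIN {
  i = 99                 # global i, must survive the call
  print sum_to(4), i
}')
echo "$result"   # 10 99
```

The extra spaces before i and total in the parameter list are only a convention to mark them as locals; the caller passes just n.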

Date/time

today=$(gawk 'BEGIN{print strftime("%Y%m%d",systime())}')
yesterday=$(gawk 'BEGIN{print strftime("%Y%m%d",systime()-24*60*60)}')

Magic

gawk '{if (NR%3 == 0) {print p$0;p=""}else{p=p$0}}'
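Combine lines per 3. A trailing group of fewer than 3 lines is not printed. (Basic code found on Stack Overflow: https://stackoverflow.com/questions/3194534/joining-two-consecutive-lines-using-awk-or-sed)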
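
A sketch of the same join run on seven sample lines. The END block is an addition that is not in the original one-liner; it prints whatever is left over when the number of lines is not a multiple of 3:

```shell
result=$(printf '1\n2\n3\n4\n5\n6\n7\n' |
  awk '{ if (NR % 3 == 0) { print p $0; p = "" } else { p = p $0 } }
       END { if (p != "") print p }')   # flush an incomplete last group
echo "$result"
```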