Difference between revisions of "Awk"
Latest revision as of 16:50, 13 November 2023
Basics
Standard variables
- NF Number of fields on the line
- FS The field separator (default is whitespace)
- OFS The output field separator. Set like 'BEGIN {OFS=","} {<other code>}'
- $0 The entire line
- $1 First field in line
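- NR The number of the current line (counted across all input files)
- ARGV The command-line arguments passed to awk; ARGV[0] is the program name, so ARGV[2] holds the name of the 2nd file when only files are given
- ARGIND The index in ARGV of the file currently being processed (gawk only)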
- awk -F"," -v VAR=<value> '{codeblock}'
- In awk: Use "," as field-separator and set VAR to <value>. With -v you can pass variables from the shell to the awk program.
- awk -f <awkfile> <file>
- Parse <file> with the code in <awkfile>. If - is given as <file>, awk reads standard input: type the text after pressing return and close the input with CTRL-D
- awk '{ for(i = 1; i <= NF; i++) { print i,$i; } }'
- Iterate over all fields
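As a small illustration of the options and variables above, the sketch below feeds two made-up comma-separated lines to awk. The sample data and the variable name PREFIX are assumptions chosen for the example only.

```shell
# Sample input is made up for this example.
result=$(printf 'a,b,c\nd,e\n' | awk -F"," -v PREFIX="line" '
BEGIN { OFS=":" }                 # separator that print puts between its arguments
{ print PREFIX, NR, NF, $1 }      # PREFIX is passed in from the shell with -v
')
echo "$result"
```

Each output line shows the prefix, the line number, the number of fields and the first field, joined by OFS.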
a=length(field)
- Get the length of a field.
- awk '{if (ARGIND==2) {codeblock}}'
- Execute codeblock if the current line is in the second file passed to awk (ARGIND is only available in gawk)
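Because ARGIND is a gawk extension, a portable variant can compare the standard variables FILENAME (the file being read) and ARGV[2] instead. This is a sketch under the assumption that only file names are passed as arguments; the temporary files and FNR (the line number within the current file) are introduced here for illustration.

```shell
# Two throwaway files created only for this example.
dir=$(mktemp -d)
printf 'one\ntwo\n' > "$dir/first.txt"
printf 'three\n'    > "$dir/second.txt"

# Act only on lines that come from the second file on the command line.
second=$(awk 'FILENAME == ARGV[2] { print FNR ": " $0 }' "$dir/first.txt" "$dir/second.txt")
echo "$second"
rm -r "$dir"
```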
Arrays
Arrays can have named keys, and work a bit like a Python dict or a Perl hash.
Output each distinct first component of field 2 (the part before the first "/") only once:
{
    split($2,ar,"/")
    dict[ar[1]] = 1
} END {
    for (key in dict)
    {
        OUT=OUT" "key
    }
    print OUT
}
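The deduplication idea above can be tried with a few made-up input lines. In this sketch the keys are printed one per line and piped through sort, because the order of a for (key in dict) loop is not defined; the interface-like sample values are assumptions.

```shell
# Made-up input: field 2 holds a value with a "/" in it.
unique=$(printf 'x eth0/1\ny eth1/2\nz eth0/3\n' | awk '
{
    split($2, ar, "/")      # ar[1] is the part before the first "/"
    dict[ar[1]] = 1         # using the value as a key removes duplicates
}
END {
    for (key in dict) print key
}' | sort)
echo "$unique"
```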
Strings
- length(<string>)
- The length of <string>
- substr(<string>,<start>,<num>)
- From <string> return <num> characters, starting from position <start> (the first character is position 1)
- n=split(var,ARR,<fs>)
- Split var into array ARR. n holds the number of elements in ARR and <fs> is the field separator; if <fs> is not given, the variable FS is used (default: whitespace).
For each line it reads, awk by default does roughly the following:
NF=split(var,ARR,FS)
NR++
$0=var
for ( i=1 ; i<=NF ; i++ ) {
    $i=ARR[i]
}
- gsub(<regexp>,<string>,<variable>)
- Replace <regexp> with <string> in <variable>. Return number of replacements.
- <variable> is modified. If <variable> is omitted, $0 is used.
printf "%-10s %05d\n", $1, $2
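- Format output with a format string, like in Python:Strings#Advanced. Here %-10s left-aligns a string in 10 characters and %05d zero-pads a number to 5 digits.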
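The string functions above can be exercised together in a BEGIN block, which runs without any input. The date-like sample string is an assumption for the example.

```shell
strings=$(awk 'BEGIN {
    s = "2023-11-13"
    print length(s)                  # number of characters
    print substr(s, 6, 2)            # 2 characters starting at position 6
    n = split(s, parts, "-")         # n is the number of pieces
    print n, parts[1]
    count = gsub(/-/, "/", s)        # s itself is modified
    print count, s
    printf "%-10s|%05d\n", "ab", 42  # left-align in 10, zero-pad to 5
}')
echo "$strings"
```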
Calculations
Print the average of field 4 over all records in <file> that contain 'GW'
awk '/GW/ {GW+=$4; n++}
END {if (n) print GW/n}' <file>
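Note that NR counts every record read, not only the matching ones, so an average over the matching records needs its own counter. A minimal sketch with made-up records (the names sum and n are chosen for the example):

```shell
# Made-up records; only the lines containing GW should count.
avg=$(printf 'GW a b 10\nXX a b 99\nGW a b 20\n' | awk '
/GW/ { sum += $4; n++ }              # add field 4 and count the match
END  { if (n > 0) print sum / n }    # avoid dividing by zero
')
echo "$avg"
```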
Matching
- /<pattern>/ {codeblock}
- Execute code if the current line matches <pattern>
- var ~ /<pattern>/ {codeblock}
- Execute code if var matches <pattern>
- Use !~ instead of ~ to negate the match
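The three matching forms can be compared side by side on a few made-up lines; the patterns and labels below are assumptions for illustration.

```shell
matched=$(printf 'alpha 1\nbeta 2\ngamma 3\n' | awk '
/^a/         { print "line:", $1 }    # the whole line matches
$1 ~ /mm/    { print "field:", $1 }   # one field matches
$2 !~ /[13]/ { print "not:", $1 }     # one field does not match
')
echo "$matched"
```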
Control statements
- ||
- or
- &&
- and
- if ( VAR == <value> ) { codeblock } else { codeblock }
- The if/else construction
- if (n)
- if (n != 0)
- If n is not equal to zero
- (n<5)*2
- if n < 5 this returns 2, else 0 (a comparison evaluates to 1 when true and 0 when false)
- for (var in ARR) { print ARR[var] }
- Read all indexes of ARR in arbitrary order.
- if (var in ARR) { print ARR[var] }
- Check if index var is in ARR
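- if (ARR[var] == "" )
- True if ARR[var] is empty or unset. Note that referencing ARR[var] creates the element, so use (var in ARR) to check whether an index exists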
- for (init;test;incr) { codeblock }
- Loop. Start with init, as long as test is true, execute codeblock, after each loop execute incr.
for (num=10;num<=100;num++) {
    print num
}
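A short sketch of the constructs above, run from a BEGIN block so that no input is needed. The array contents and the loop bounds are assumptions chosen to keep the output small.

```shell
control=$(awk 'BEGIN {
    n = 3
    x = (n < 5) * 2                     # a true comparison counts as 1
    print x
    ARR["a"] = ""                       # the key exists, the value is empty
    if ("a" in ARR) print "a present"
    if (!("b" in ARR)) print "b absent"
    for (num = 10; num <= 12; num++) print num
}')
echo "$control"
```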
- function NAME (par,par,..)
- Create a function. Parameters of a function are local. Declare every local variable as an extra parameter to avoid overwriting a global variable. Assigning to a global variable, on the other hand, is one way to return results from the function.
- return <value>
- End the function and give it a return value
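The local-variable convention can be sketched as follows. The function name sum_to and its variables are hypothetical; the extra spaces in the parameter list are only a common way to separate real parameters from locals.

```shell
func_out=$(awk '
# i and total are listed as extra parameters so that they stay local.
function sum_to(n,    i, total) {
    for (i = 1; i <= n; i++) total += i
    return total
}
BEGIN {
    i = 99                  # global i, not touched by the function
    print sum_to(4), i
}')
echo "$func_out"
```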
Date/time
today=`gawk 'END{print strftime("%Y%m%d",systime())}' < /dev/null`
yesterday=`gawk 'END{print strftime("%Y%m%d",systime()-24*60*60)}' < /dev/null`
Magic
- gawk '{if (NR%3 == 0) {print p$0;p=""}else{p=p$0}}'
- Combine lines per 3 (basic code found on Stack Overflow: https://stackoverflow.com/questions/3194534/joining-two-consecutive-lines-using-awk-or-sed)
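To see what the one-liner does, it can be run on seven made-up lines. The END rule in this sketch is an addition that is not part of the original command: without it, lines left over in an incomplete last group are not printed.

```shell
joined=$(printf '1\n2\n3\n4\n5\n6\n7\n' | awk '
{ if (NR % 3 == 0) { print p $0; p = "" } else { p = p $0 } }
END { if (p != "") print p }     # flush an incomplete last group
')
echo "$joined"
```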