Difference between revisions of "Awk"
Revision as of 14:26, 27 July 2020
Basics
- awk -F"," -v VAR=<value> '{codeblock}'
- In awk: Use "," as field-separator and set VAR to <value>. With -v you can pass variables from the shell to the awk program.
- awk '{ for(i = 1; i <= NF; i++) { print i,$i; } }'
- Iterate over all fields
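A minimal sketch combining the options above: a shell value is handed to awk with -v, and the comma-separated fields are listed one by one. The function name list_fields, the variable name PREFIX and the sample input are made up for this example.

```shell
# list_fields is a hypothetical helper; PREFIX is an illustrative variable name.
list_fields() {
    # -F"," splits each line on commas.
    # -v copies the first shell argument into the awk variable PREFIX.
    awk -F"," -v PREFIX="$1" '{ for (i = 1; i <= NF; i++) { print PREFIX, i, $i } }'
}

printf 'red,green,blue\n' | list_fields "col"
```

With this input the output should be three lines: "col 1 red", "col 2 green" and "col 3 blue".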
Standard variables
- NF Number of fields on the line
- FS The field separator (default is whitespace)
- $0 The entire line
- $1 First field in line
a=length(field)
- Get the length of a field.
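To see the standard variables in action, this small sketch prints the number of fields, the length of the first field and the length of the whole line. The helper name describe_line and the sample line are assumptions for illustration only.

```shell
# describe_line is a hypothetical helper name.
describe_line() {
    # NF is the number of fields, $1 the first field, $0 the entire line.
    awk '{ print NF, length($1), length($0) }'
}

printf 'alpha beta gamma\n' | describe_line
```

For the sample line this should print "3 5 16": three fields, a five-character first field and sixteen characters in total.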
Arrays
Arrays can have named keys and work a bit like a Python dict or a Perl hash.
Print each distinct value of ar[1] (the first "/"-separated part of field 2) only once, by using it as an array key:
{
split($2,ar,"/")
dict[ar[1]] = 1
} END {
for (key in dict)
{
OUT=OUT" "key
}
print OUT
}
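A runnable variant of the same idea, as a sketch: the array keys are printed one per line and piped through sort, because "for (key in ...)" visits keys in arbitrary order. The function name unique_prefixes, the array name seen and the sample input are assumptions.

```shell
# unique_prefixes is a hypothetical helper; it expects a path-like value in field 2.
unique_prefixes() {
    awk '{
        split($2, ar, "/")      # ar[1] is the part of field 2 before the first "/"
        seen[ar[1]] = 1         # using the value as a key removes duplicates
    } END {
        for (key in seen) print key
    }' | sort                   # sort only to make the output order predictable
}

printf 'a usr/bin\nb etc/passwd\nc usr/lib\n' | unique_prefixes
```

Here "usr" occurs twice in the input but only once in the output, which should be "etc" followed by "usr".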
String manipulation
- substr(<string>,<start>,<num>)
- From <string> return <num> characters, starting at position <start> (the first character is position 1)
- n=split(var,ARR,<fs>)
- Split var into array ARR; n holds the number of elements in ARR. <fs> is the field separator; if it is not given, the variable FS is used (default whitespace).
Roughly what awk does by default for each line it reads (pseudocode, with var holding the line just read):
NF=split(var,ARR," ")
NR++
$0=var
for ( i=1 ; i<=NF ; i++ ) {
$i=ARR[i]
}
- gsub(<regexp>,<string>,<variable>)
- Replace <regexp> with <string> in <variable>. Return number of replacements.
- <variable> is modified.
printf "%-10s %05d\n", $1, $2
- Format output with printf-style conversions, similar to those described on the Python:Strings#Advanced page
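The string functions above can be tried together in one small sketch. The helper name string_demo, the variable names head, parts and changed, and the sample input are all invented for this example.

```shell
# string_demo is a hypothetical helper that exercises substr, split, gsub and printf.
string_demo() {
    awk '{
        head = substr($1, 1, 3)          # first three characters of field 1
        n = split($2, parts, "-")        # split field 2 on "-", n is the element count
        changed = gsub(/o/, "0", $1)     # replace every "o" in field 1, keep the count
        printf "%-5s|%02d|%s|%d\n", head, n, $1, changed
    }'
}

printf 'foobar a-b-c\n' | string_demo
```

For this input the result should be "foo  |03|f00bar|2": head is left-aligned in five columns, the element count is zero-padded to two digits, and gsub has changed field 1 in place and reported two replacements.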
Calculations
Print the average of field 4 over all records in <file> that contain 'GW'
awk '/GW/ {GW+=$4; n++}
END {if (n) print GW/n}' <file>
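A sketch of an average over the matching records only: the matches are counted in their own variable, so lines without 'GW' do not change the divisor (dividing by NR would count every line read). The function name avg_gw, the variable names sum and n, and the sample data are assumptions.

```shell
# avg_gw is a hypothetical helper; it averages field 4 of lines containing "GW".
avg_gw() {
    awk '/GW/ { sum += $4; n++ }
         END  { if (n > 0) print sum / n }'   # guard against division by zero
}

printf 'GW1 x y 10\nSW1 x y 99\nGW2 x y 20\n' | avg_gw
```

Two of the three sample lines match, so this should print 15.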
Matching
- /<pattern>/ {code}
- Execute code if the current line matches <pattern>
- var ~ /<pattern>/ {code}
- Execute code if var matches <pattern>
- !~ to negate the match
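The three forms can be combined in one program, sketched here: a pattern on the whole line, a match on a single field, and its negation. The helper name match_demo and the sample input are made up; "next" is the standard awk statement that skips to the next input line.

```shell
# match_demo is a hypothetical helper that classifies field 2 of each line.
match_demo() {
    awk '/^#/             { next }                  # whole-line match: skip comment lines
         $2 ~ /^[0-9]+$/  { print $1, "numeric" }   # field 2 consists of digits only
         $2 !~ /^[0-9]+$/ { print $1, "other" }'    # negated match
}

printf '# header\na 42\nb x7\n' | match_demo
```

This should print "a numeric" and "b other"; the header line is skipped.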
Control statements
- ||
- or
- &&
- and
- if ( VAR == <value> ) { <stat> } else { <stat> }
- The if/else construction (awk keywords are lowercase, and there is no "then")
- if (n)
- if (n != 0)
- If n is not equal to zero
- (n<5)*2
- If n < 5 this evaluates to 2, else 0 (a comparison yields 1 or 0)
- for (var in ARR) { print ARR[var] }
- Read all indexes of ARR in arbitrary order.
- if (var in ARR) { print ARR[var] }
- Check if index var is in ARR
- if (ARR[var] == "" )
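- True if the element at index var is empty or unset. Merely referencing ARR[var] creates the element, so use (var in ARR) to test whether an index exists.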
- for (init;test;incr)
- Loop: run init once, repeat the statements while test is true, and run incr after each pass
for (num=10;num<=100;num++) {
print num
}
- function NAME (par,par,..)
- Create a function. Parameters are local to the function, and awk has no other way to declare local variables, so list every local variable as an extra parameter to avoid overwriting a global variable. Conversely, assigning to a global variable is one way to return results from the function.
- return <value>
- End of function and give it a return value
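A sketch of the local-variable convention for functions: the extra parameters i and total are never passed by the caller and exist only to keep the loop variables local. The names sum_demo and sum_fields and the sample input are assumptions for illustration.

```shell
# sum_demo is a hypothetical helper; sum_fields adds up the fields from "start" to NF.
sum_demo() {
    awk 'function sum_fields(start,    i, total) {   # i and total act as local variables
             for (i = start; i <= NF; i++) total += $i
             return total
         }
         { i = "untouched"; print sum_fields(2), i }' # the global i keeps its value
}

printf 'x 1 2 3\n' | sum_demo
```

This should print "6 untouched": the function returns 1+2+3, and the global i is not overwritten by the loop inside the function.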
Magic
- gawk '{if (NR%3 == 0) {print p$0;p=""}else{p=p$0}}'
- Join every 3 lines into one (basic code found on Stack Overflow [1])
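The same one-liner wrapped in a shell function, as a sketch with made-up sample input. It uses plain awk instead of gawk on the assumption that nothing in the program is gawk-specific; the name join3 is hypothetical.

```shell
# join3 is a hypothetical helper that glues each group of 3 input lines together.
join3() {
    # p collects lines 1 and 2 of each group; on every third line the group is printed.
    awk '{ if (NR % 3 == 0) { print p $0; p = "" } else { p = p $0 } }'
}

printf 'a\nb\nc\nd\ne\nf\ng\n' | join3
```

For the seven sample lines this should print "abc" and "def". The trailing "g" stays in p and is never printed, so leftover lines that do not fill a group of three are dropped.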