Reference address: Introduction to awk
awk is a tool for processing text files, and almost all Linux systems ship with it. It processes a file line by line and splits each line into fields. For text files in which every line has the same format, such as logs and CSV, awk may be the most convenient tool.
1. Basic usage
The basic usage of awk is the following form:
$ awk 'action' filename
Example:
$ awk '{print $0}' demo.txt
In the above example, demo.txt is the text file to be processed by awk. Inside the single quotation marks is a pair of braces containing the action applied to each line, print $0. Here print is the print command and $0 represents the current line, so the command prints every line unchanged.
Next, let's demonstrate the above example with standard input (stdin).
$ echo 'this is a test' | awk '{print $0}'
this is a test
In the above code, print $0 reprints the standard input this is a test.
By default, awk splits each line into fields on spaces and tabs, and uses $1, $2, $3, and so on to represent the first field, the second field, the third field, and so on.
$ echo 'this is a test' | awk '{print $3}'
a
In the above code, $3 represents the third field a of this is a test.
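As a small sketch of the default splitting described above, the following pipes a made-up line through awk; the input text and the shell variable `out` are illustrative assumptions, not part of the original example. It suggests that any run of spaces or tabs counts as a single separator.

```shell
# Hypothetical input: fields separated by a mix of spaces and a tab.
# With the default separator, a run of blanks counts as one separator.
out=$(printf 'alpha \t beta   gamma\n' | awk '{print $2}')
echo "$out"   # beta
```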
You can also use the -F option to specify the separator. Suppose the file demo.txt uses a colon (:) as the separator, as shown below:
root:x:0:0:root:/root:/usr/bin/zsh
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync
We extract its first field:
$ awk -F ':' '{ print $1 }' demo.txt
root
daemon
bin
sys
sync
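The same option should work for other separators. Here is a minimal sketch with a comma, assuming simple CSV-like data without quoted fields; the sample records and the variable `out` are invented for illustration.

```shell
# Hypothetical comma-separated records; -F ',' makes the comma the field separator.
# This naive split does not handle quoted fields that contain commas.
out=$(printf 'alice,30,london\nbob,25,paris\n' | awk -F ',' '{print $3}')
echo "$out"
# london
# paris
```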
2. Built-in variables
In addition to $ followed by a number, which represents a field, awk provides other variables.
2.1. NF
The variable NF holds the number of fields in the current line, so $NF represents the last field.
$ echo 'this is a test' | awk '{print $NF}'
test
$(NF-1) represents the penultimate field.
$ awk -F ':' '{print $1, $(NF-1)}' demo.txt
root /root
daemon /usr/sbin
bin /bin
sys /dev
sync /bin
In the above code, the comma in the print command means that the two parts are separated by a space in the output.
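To make the difference between NF and $NF concrete, here is a short sketch on two made-up lines; the input and the variable `out` are assumptions for illustration. NF is the field count, while $NF is the content of the last field.

```shell
# For each line, print the number of fields and then the last field.
out=$(printf 'a b c\nd e\n' | awk '{print NF, $NF}')
echo "$out"
# 3 c
# 2 e
```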
2.2. NR
The variable NR indicates which line is currently being processed.
$ awk -F ':' '{print NR ") " $1}' demo.txt
1) root
2) daemon
3) bin
4) sys
5) sync
In the above code, text that the print command should output as is must be enclosed in double quotation marks.
2.3. more
Other built-in variables of awk are as follows:
FILENAME: Current file name.
FS: Field separator; defaults to spaces and tabs.
RS: Record (line) separator, used to split the input into lines; defaults to a newline.
OFS: Output field separator, placed between fields when printing; defaults to a space.
ORS: Output record separator, placed after each record when printing; defaults to a newline.
OFMT: Output format for numbers; defaults to %.6g.
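As one possible illustration of these variables, the sketch below changes OFS from the command line. It relies on awk's -v option, which assigns a variable before input is read; that option is not covered in the text above, and the variable `out` is only illustrative.

```shell
# -v OFS=',' sets the output field separator before any input is read,
# so the comma in print emits "," instead of the default space.
out=$(printf 'root:x:0:0:root:/root:/usr/bin/zsh\n' | awk -F ':' -v OFS=',' '{print $1, $NF}')
echo "$out"   # root,/usr/bin/zsh
```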
3. Built-in functions
awk also provides some built-in functions to facilitate the processing of raw data.
3.1. toupper()
The function toupper() converts characters to uppercase.
$ awk -F ':' '{ print toupper($1) }' demo.txt
ROOT
DAEMON
BIN
SYS
SYNC
In the above code, the first field is output in uppercase.
3.2. more
Other common functions are as follows:
tolower(): Converts characters to lowercase.
length(): Returns the length of a string.
substr(): Returns a substring.
sin(): Sine.
cos(): Cosine.
sqrt(): Square root.
rand(): Random number.
The complete list of awk built-in functions can be found in the manual.
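A brief sketch of three of the string functions listed above, applied to a made-up input line; the input and the variable `out` are assumptions for illustration. Note that substr() counts character positions from 1.

```shell
# tolower() lowercases the first field, length() measures the second,
# and substr($2, 1, 3) takes three characters starting at position 1.
out=$(echo 'Hello World' | awk '{print tolower($1), length($2), substr($2, 1, 3)}')
echo "$out"   # hello 5 Wor
```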
4. Output conditions
awk lets you specify a condition so that only the lines that match it are output. The condition is written before the action.
$ awk 'condition {action}' filename
Look at the example below.
$ awk -F ':' '/usr/ {print $1}' demo.txt
root
daemon
bin
sys
In the above code, the action is preceded by the regular expression /usr/, so only lines containing usr are output.
Output only the odd-numbered lines:
# Output odd-numbered lines
$ awk -F ':' 'NR%2==1 {print $1}' demo.txt
root
bin
sync
Output the lines after the third line:
# Output lines after the third line
$ awk -F ':' 'NR>3 {print $1}' demo.txt
sys
sync
Output the first field of each line whose first field is equal to root:
$ awk -F ':' '$1 == "root" {print $1}' demo.txt
root
Output the first field of each line whose first field is equal to root or bin:
$ awk -F ':' '$1 == "root" || $1 == "bin" {print $1}' demo.txt
root
bin
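Conditions can presumably be combined in other ways as well. The sketch below uses the && operator (not shown in the text above) to join a numeric comparison on the third field with a regular expression. It feeds the sample data through a here-document so that it is self-contained; the variable `out` is only illustrative.

```shell
# Print the first field of lines whose third field is at least 2
# and which also contain the text "nologin".
out=$(awk -F ':' '$3 >= 2 && /nologin/ {print $1}' <<'EOF'
root:x:0:0:root:/root:/usr/bin/zsh
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync
EOF
)
echo "$out"
# bin
# sys
```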
5. if statement
awk provides an if structure for writing complex conditions.
$ awk -F ':' '{if ($1 > "m") print $1}' demo.txt
root
sys
sync
The above code prints the first field of each line whose first field, compared as a string, sorts after m. In this file, those are the fields beginning with a letter later than m.
The if structure can also specify the else part.
$ awk -F ':' '{if ($1 > "m") print $1; else print "---"}' demo.txt
root
---
---
sys
sync
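As a further sketch of if and else, the following labels each account by its shell. The two-field input, the label texts, and the variable `out` are all invented for illustration and are not taken from demo.txt.

```shell
# Hypothetical two-field records of the form name:shell.
# Accounts whose shell is /usr/sbin/nologin are labeled "no login".
out=$(printf 'root:/usr/bin/zsh\nbin:/usr/sbin/nologin\n' |
  awk -F ':' '{if ($2 == "/usr/sbin/nologin") print $1, "no login"; else print $1, "login"}')
echo "$out"
# root login
# bin no login
```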