This article is reproduced from Ruan Yifeng's Weblog awk tutorial
awk is an application for processing text files. Almost all Linux systems come with this program.
It processes a file line by line and reads each field in the line. For text files in which every line has the same format, such as logs and CSV files, awk may be the most convenient tool.
awk is not only a utility but also a programming language. However, this article only introduces its command-line usage, which should be enough for most purposes.
1, Basic usage
The basic usage of awk takes the following form.
# Format
$ awk action filename

# Example
$ awk '{print $0}' demo.txt
In the above example, demo.txt is the text file that awk processes. Inside the single quotation marks is a pair of braces containing the action applied to each line, print $0. Here print is the print command and $0 represents the current line, so the command prints every line unchanged.
Next, let's demonstrate the above example with standard input (stdin).
$ echo 'this is a test' | awk '{print $0}'
this is a test
In the above code, print $0 prints the standard input, this is a test, unchanged.
awk splits each line into fields on spaces and tabs, and uses $1, $2, $3 to represent the first field, the second field, the third field, and so on.
$ echo 'this is a test' | awk '{print $3}'
a
In the above code, $3 represents the third field a of this is a test.
Next, as an example, we save the /etc/passwd file as demo.txt.
root:x:0:0:root:/root:/usr/bin/zsh
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync
The field separator of this file is a colon (:), so use the -F option to specify a colon as the separator. Then the first field of each line can be extracted.
$ awk -F ':' '{ print $1 }' demo.txt
root
daemon
bin
sys
sync
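The same option works for other single-character separators. As a minimal sketch, assuming a hypothetical comma-separated input (the names and values below are made up for illustration and are not part of demo.txt), the following extracts the second field of each line.

```shell
# Hypothetical CSV data, made up for illustration (not from demo.txt).
csv='alice,30,london
bob,25,paris'

# -F ',' makes awk split each line on commas instead of spaces and tabs.
ages=$(printf '%s\n' "$csv" | awk -F ',' '{ print $2 }')

printf '%s\n' "$ages"
```

This simple split does not understand quoted CSV fields that themselves contain commas, so it only suits plain comma-separated data.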
2, Variables
In addition to $ followed by a number, which represents a field, awk also provides some other variables.
The variable NF indicates how many fields there are in the current line, so $NF represents the last field.
$ echo 'this is a test' | awk '{print $NF}'
test
$(NF-1) represents the penultimate field.
$ awk -F ':' '{print $1, $(NF-1)}' demo.txt
root /root
daemon /usr/sbin
bin /bin
sys /dev
sync /bin
In the above code, the comma in the print command means that the two parts are separated by a space in the output.
The variable NR indicates the number of the line currently being processed.
$ awk -F ':' '{print NR ") " $1}' demo.txt
1) root
2) daemon
3) bin
4) sys
5) sync
In the above code, text that the print command should output literally, such as the closing parenthesis and space, must be placed in double quotation marks.
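NR, NF and $NF can be combined in a single print command. The following is a small sketch on made-up input (the two sample lines are an assumption for illustration) that reports, for each line, its number, its field count and its last field.

```shell
# Made-up sample text with a different number of fields on each line.
text='one two three
four five'

# NR is the current line number, NF the field count, $NF the last field.
# Quoted strings are printed literally and joined to the variables.
report=$(printf '%s\n' "$text" | awk '{ print NR ": " NF " fields, last is " $NF }')

printf '%s\n' "$report"
```

Because the default separator is used here, no -F option is needed.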
Other built-in variables of awk are as follows.
FILENAME: the current file name.
FS: the field separator. The default is spaces and tabs.
RS: the record (line) separator, used to split the input into lines. The default is a line feed.
OFS: the output field separator, placed between fields when printing. The default is a space.
ORS: the output record separator, placed after each record when printing. The default is a line feed.
OFMT: the format for numeric output. The default is %.6g.
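As a sketch of how OFS affects output, the example below sets it from the command line. It assumes the -v option, which this article does not cover; -v assigns an awk variable before any input is read. The two input lines are taken from the demo.txt sample above.

```shell
# Two lines in the /etc/passwd format used in this article.
passwd_lines='root:x:0:0:root:/root:/usr/bin/zsh
sync:x:4:65534:sync:/bin:/bin/sync'

# -F ':' sets the input separator (FS).
# -v OFS=' -> ' (an assumption, not covered in the article) replaces the
# default space that print puts between comma-separated items.
pairs=$(printf '%s\n' "$passwd_lines" | awk -F ':' -v OFS=' -> ' '{ print $1, $NF }')

printf '%s\n' "$pairs"
```

OFS only applies where print arguments are separated by commas. Items joined without a comma are concatenated directly.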
3, Functions
awk also provides some built-in functions to facilitate the processing of raw data.
The function toupper() is used to convert characters to uppercase.
$ awk -F ':' '{ print toupper($1) }' demo.txt
ROOT
DAEMON
BIN
SYS
SYNC
In the above code, the first field is output in uppercase.
Other common functions are as follows.
tolower(): converts characters to lowercase.
length(): returns the length of a string.
substr(): returns a substring.
sin(): sine.
cos(): cosine.
sqrt(): square root.
rand(): random number.
The complete list of awk built-in functions can be found in the manual.
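The functions can be nested and combined. The following sketch, on two made-up names, capitalizes the first letter of each line with toupper() and substr(), and prints the length of the original string next to it.

```shell
# Made-up input: one name per line.
names='root
daemon'

# substr($1, 1, 1) is the first character; substr($1, 2) is the rest.
# length($1) is the number of characters in the field.
capitalized=$(printf '%s\n' "$names" | awk '{ print toupper(substr($1, 1, 1)) substr($1, 2), length($1) }')

printf '%s\n' "$capitalized"
```

Note that substr() counts positions from 1, not 0.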
4, Conditions
awk allows you to specify a condition so that only the lines that satisfy it are output.
The condition is written in front of the action.
$ awk 'condition action' filename
Look at the example below.
$ awk -F ':' '/usr/ {print $1}' demo.txt
root
daemon
bin
sys
In the above code, the print command is preceded by a regular expression, so only the lines containing usr are output.
The following examples output only the odd-numbered lines, and only the lines after the third line.
# Output odd-numbered lines
$ awk -F ':' 'NR % 2 == 1 {print $1}' demo.txt
root
bin
sync

# Output lines after the third line
$ awk -F ':' 'NR >3 {print $1}' demo.txt
sys
sync
The following examples output the lines whose first field equals a specified value.
$ awk -F ':' '$1 == "root" {print $1}' demo.txt
root

$ awk -F ':' '$1 == "root" || $1 == "bin" {print $1}' demo.txt
root
bin
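Conditions of different kinds can also be combined. As a sketch, the example below joins a line-number test and a regular expression with the && (logical and) operator, which the examples above do not use. The input repeats the demo.txt sample.

```shell
# The demo.txt sample lines from this article.
lines='root:x:0:0:root:/root:/usr/bin/zsh
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync'

# Select lines after the first that also contain "nologin".
matched=$(printf '%s\n' "$lines" | awk -F ':' 'NR > 1 && /nologin/ { print NR ") " $1 }')

printf '%s\n' "$matched"
```

Only lines 2 to 4 satisfy both conditions: line 1 is excluded by NR > 1, and line 5 does not contain nologin.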
5, if statement
awk provides an if structure for writing complex conditions.
$ awk -F ':' '{if ($1 > "m") print $1}' demo.txt
root
sys
sync
The above code outputs the first field of each line whose first field is greater than m in a string comparison, that is, sorts after m.
The if structure can also specify the else part.
$ awk -F ':' '{if ($1 > "m") print $1; else print "---"}' demo.txt
root
---
---
sys
sync
Reference
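The if structure also works with numeric comparisons. The following sketch labels each account by its third field (the user ID in /etc/passwd). The labels low and high and the threshold 3 are arbitrary choices made for illustration.

```shell
# Three lines from the demo.txt sample in this article.
lines='root:x:0:0:root:/root:/usr/bin/zsh
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync'

# $3 looks like a number, so comparing it with 3 is a numeric comparison.
# The labels "low" and "high" are arbitrary, chosen for illustration.
labels=$(printf '%s\n' "$lines" | awk -F ':' '{ if ($3 < 3) print $1, "low"; else print $1, "high" }')

printf '%s\n' "$labels"
```

Unlike the comparison with "m" above, the right-hand side here is an unquoted number, which is what makes awk compare the values numerically rather than as strings.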
- https://www.ruanyifeng.com/blog/2018/11/awk.html