Simple use of awk command in Linux system

Posted by dietkinnie on Thu, 20 Jan 2022 00:46:44 +0100


Reference address: Introduction to awk

awk is an application for processing text files. Almost all Linux systems come with this program. It processes each line of the file in turn and reads each field in it. awk may be the most convenient tool for text files with the same format per line, such as logs and CSV.

1. Basic usage

The basic usage of awk is the following form:

$ awk Action file name

Example:

$ awk '{print $0}' demo.txt

In the above example, demo Txt is the text file to be processed by awk. There is a brace inside the front single quotation mark, which is the processing action print $0 for each line. Where print is the print command and $0 represents the current line. Therefore, the execution result of the above command is to print each line as it is.

Next, let's demonstrate the above example with standard input (stdin).

$ echo 'this is a test' | awk '{print $0}'
this is a test

In the above code, print $0 reprints the standard input this is a test.

By default, awk will divide each row into several fields according to spaces and tabs, and use $1, $2, $3 to represent the first field, the second field, the third field, and so on.

$ echo 'this is a test' | awk '{print $3}'
a

In the above code, $3 represents the third field a of this is a test.

You can also use the - F parameter to specify the separator if there is a file demo Txt is separated by, as shown below:

root:x:0:0:root:/root:/usr/bin/zsh
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync

We extract its first field:

$ awk -F ':' '{ print $1 }' demo.txt
root
daemon
bin
sys
sync

2. Built in variables

In addition to the $number representing a field, awk provides other variables.

2.1. NF

The variable NF indicates how many fields there are in the current row, so $NF represents the last field.

$ echo 'this is a test' | awk '{print $NF}'
test

$(NF-1) represents the penultimate field.

$ awk -F ':' '{print $1, $(NF-1)}' demo.txt
root /root
daemon /usr/sbin
bin /bin
sys /dev
sync /bin

In the above code, the comma in the print command indicates that when outputting, the two parts are separated by spaces.

2.2. NR

The variable NR indicates which line is currently being processed.

$ awk -F ':' '{print NR ") " $1}' demo.txt
1) root
2) daemon
3) bin
4) sys
5) sync

In the above code, in the print command, if the characters are output as is, they should be placed in double quotation marks.

2.2. more

Other built-in variables of awk are as follows:

FILENAME: Current file name
FS: Field separator, which defaults to spaces and tabs.
RS: Line separator, used to split each line. The default is line feed.
OFS: The separator of the output field, which is used to separate the field when printing. The default is space.
ORS: The separator of the output record, which is used to separate records during printing. The default is line feed.
OFMT: Format of digital output. The default is%.6g. 

3. Built in function

awk also provides some built-in functions to facilitate the processing of raw data.

3.1. toupper()

The function toupper() converts characters to uppercase.

$ awk -F ':' '{ print toupper($1) }' demo.txt
ROOT
DAEMON
BIN
SYS
SYNC

In the above code, the first field is output in uppercase.

3.2. more

Other common functions are as follows:

tolower(): Convert characters to lowercase.
length(): Returns the length of the string.
substr(): Returns a substring.
sin(): Sine.
cos(): Cosine.
sqrt(): Square root.
rand(): Random number.

The complete list of awk built-in functions can be viewed manual.

4. Output conditions

awk allows you to specify output conditions and only output qualified rows. The output condition should be written before the action.

$ awk 'Conditional action' file name

Look at the example below.

$ awk -F ':' '/usr/ {print $1}' demo.txt
root
daemon
bin
sys

In the above code, the print command is preceded by a regular expression, which only outputs the line containing usr.

Output only odd rows:

# Output odd rows
$ awk -F ':' 'NR%2==1 {print $1}' demo.txt
root
bin
sync

Output after the third line:

# Output lines after the third line
$ awk -F ':' 'NR>3 {print $1}' demo.txt
sys
sync

Output the first field of the row whose first field is equal to the specified value root:

$ awk -F ':' '$1 == "root" {print $1}' demo.txt
root

Output the first field of the row whose first field is equal to the specified value root or bin:

$ awk -F ':' '$1 == "root" || $1 == "bin" {print $1}' demo.txt
root
bin

5. if statement

awk provides an if structure for writing complex conditions.

$ awk -F ':' '{if ($1 > "m") print $1}' demo.txt
root
sys
sync

The above code outputs the line whose first character of the first field is greater than m.

The if structure can also specify the else part.

$ awk -F ':' '{if ($1 > "m") print $1; else print "---"}' demo.txt
root
---
---
sys
sync

Topics: Linux shell awk