Introduction to awk

Posted by paulg on Wed, 26 Jan 2022 04:17:34 +0100

This article is reproduced from the awk tutorial on Ruan Yifeng's weblog.

awk is a tool for processing text files. It ships with almost every Linux system.

It processes a file one line at a time and splits each line into fields. For text files in which every line has the same format, such as logs and CSV files, awk may be the most convenient tool.
awk is not just a utility but also a programming language. This article only covers its command-line usage, which should be enough for most situations.

1, Basic usage

The basic usage of awk is the following form.

# Format
$ awk action filename

# Example
$ awk '{print $0}' demo.txt

In the above example, demo.txt is the text file that awk processes. Inside the single quotes is a pair of braces containing the action applied to each line, print $0. Here print is the print command and $0 stands for the current line, so the command prints every line unchanged.

Next, let's demonstrate the above example with standard input (stdin).

$ echo 'this is a test' | awk '{print $0}'

this is a test

In the above code, print $0 prints the standard input, this is a test, back out unchanged.

awk splits each line into fields on spaces and tabs, and uses $1, $2, $3, and so on to refer to the first field, the second field, the third field, and so on.

$ echo 'this is a test' | awk '{print $3}'

a

In the above code, $3 refers to a, the third field of this is a test.
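
One detail worth a quick check: with the default separator, a run of several spaces or tabs counts as a single separator, so irregular spacing does not create empty fields. A minimal sketch, where the sample text and the variable name second are made up for illustration:

```shell
# By default, fields are separated by runs of spaces and/or tabs,
# so the extra spaces and the tab below do not produce empty fields.
second=$(printf 'alpha    beta\tgamma\n' | awk '{ print $2 }')
printf '%s\n' "$second"
# beta
```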

Next, as an example, we save the /etc/passwd file as demo.txt.

root:x:0:0:root:/root:/usr/bin/zsh
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync

The field separator in this file is a colon (:), so we use the -F option to set the separator to a colon. The first field can then be extracted.

$ awk -F ':' '{ print $1 }' demo.txt

root
daemon
bin
sys
sync
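
The same option should work for other single-character separators. As a sketch for the CSV case mentioned in the introduction, the following uses a made-up two-line sample (name,age); the data and the variable name ages are assumptions for illustration only:

```shell
# Hypothetical comma-separated sample: name,age
ages=$(printf 'alice,30\nbob,25\n' | awk -F ',' '{ print $2 }')
printf '%s\n' "$ages"
# 30
# 25
```

Note that splitting on a plain comma does not understand quoted CSV fields that themselves contain commas, so this only suits simple files.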

2, Variables

In addition to the $ + number notation for fields, awk provides some other built-in variables.

The variable NF holds the number of fields in the current line, so $NF refers to the last field.

$ echo 'this is a test' | awk '{print $NF}'

test

$(NF-1) represents the penultimate field.

$ awk -F ':' '{print $1, $(NF-1)}' demo.txt

root /root
daemon /usr/sbin
bin /bin
sys /dev
sync /bin

In the above code, the comma in the print command means that the two parts are separated by the output field separator, which is a space by default.
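
To see what the comma does, it may help to compare it with the same command written without one. In this sketch the variable names with_comma and without_comma are illustrative only:

```shell
# With a comma, print puts the output separator (a space by default) between the fields.
with_comma=$(echo 'this is a test' | awk '{ print $1, $2 }')
# Without a comma, the two fields are concatenated directly.
without_comma=$(echo 'this is a test' | awk '{ print $1 $2 }')
printf '%s\n' "$with_comma"      # this is
printf '%s\n' "$without_comma"   # thisis
```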

The variable NR indicates which line is currently being processed.

$ awk -F ':' '{print NR ") " $1}' demo.txt

1) root
2) daemon
3) bin
4) sys
5) sync

In the above code, text that print should output as is must be placed in double quotation marks.
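
NR and NF can be mixed with quoted text in the same print command. A small sketch on made-up input (the sample lines and the variable name numbered are assumptions):

```shell
# Print the line number, then the number of fields on that line.
numbered=$(printf 'one two\nthree four five\n' | awk '{ print NR ": " NF " fields" }')
printf '%s\n' "$numbered"
# 1: 2 fields
# 2: 3 fields
```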

Other built-in variables of awk are as follows.

FILENAME: Name of the current input file.
FS: Field separator. The default is spaces and tabs.
RS: Record separator, used to split the input into lines. The default is a newline.
OFS: Output field separator, placed between fields when printing. The default is a space.
ORS: Output record separator, placed after each record when printing. The default is a newline.
OFMT: Format for numeric output. The default is %.6g.
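
As a sketch of how one of these variables changes the output, the following sets OFS to an arrow. It relies on -v, the standard awk option for assigning a variable before the program runs, which this article does not otherwise cover. The two sample lines are copied from demo.txt, and the variable names demo and pairs are illustrative:

```shell
# Two lines taken from the demo.txt sample used in this article.
demo='root:x:0:0:root:/root:/usr/bin/zsh
sync:x:4:65534:sync:/bin:/bin/sync'

# OFS replaces the default space that the comma in print would insert.
pairs=$(printf '%s\n' "$demo" | awk -F ':' -v OFS=' -> ' '{ print $1, $NF }')
printf '%s\n' "$pairs"
# root -> /usr/bin/zsh
# sync -> /bin/sync
```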

3, Functions

awk also provides some built-in functions to facilitate the processing of raw data.

The function toupper() is used to convert characters to uppercase.

$ awk -F ':' '{ print toupper($1) }' demo.txt

ROOT
DAEMON
BIN
SYS
SYNC

In the above code, the first field is output in uppercase.

Other common functions are as follows.

tolower(): Convert characters to lowercase.
length(): Returns the length of the string.
substr(): Returns a substring.
sin(): Sine.
cos(): Cosine.
sqrt(): Square root.
rand(): Random number.

The complete list of awk built-in functions can be found in the manual.
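
Several of the string functions listed above can be combined in one print command. A minimal sketch on made-up input (the text Hello World and the variable name summary are assumptions):

```shell
# tolower() lowercases the first field, length() counts the characters of the
# second field, and substr(s, 1, 3) takes three characters starting at position 1.
summary=$(echo 'Hello World' | awk '{ print tolower($1), length($2), substr($2, 1, 3) }')
printf '%s\n' "$summary"
# hello 5 Wor
```

Note that substr() counts positions from 1, not 0.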

4, Conditions

awk lets you specify a condition so that only the lines that satisfy it are output.

The condition is written in front of the action.

$ awk 'condition action' filename

Look at the example below.

$ awk -F ':' '/usr/ {print $1}' demo.txt

root
daemon
bin
sys

In the above code, the print command is preceded by a regular expression, /usr/, so only lines containing usr are output.

The following examples output only the odd-numbered lines, and only the lines after the third line.

# Output odd-numbered lines
$ awk -F ':' 'NR % 2 == 1 {print $1}' demo.txt

root
bin
sync

# Output lines after the third line
$ awk -F ':' 'NR > 3 {print $1}' demo.txt

sys
sync

The following examples output the lines whose first field equals a specified value.

$ awk -F ':' '$1 == "root" {print $1}' demo.txt

root

$ awk -F ':' '$1 == "root" || $1 == "bin" {print $1}' demo.txt

root
bin
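
Conditions can also compare a field numerically and be joined with &&, the counterpart of the || used above. The following sketch selects the demo.txt entries whose third field (the user ID) is between 2 and 3; the variable names demo and matches are illustrative:

```shell
# The demo.txt sample used throughout this article.
demo='root:x:0:0:root:/root:/usr/bin/zsh
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync'

# Both comparisons must hold for the action to run.
matches=$(printf '%s\n' "$demo" | awk -F ':' '$3 >= 2 && $3 <= 3 { print $1, $3 }')
printf '%s\n' "$matches"
# bin 2
# sys 3
```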

5, if statement

awk provides an if structure for writing complex conditions.

$ awk -F ':' '{if ($1 > "m") print $1}' demo.txt

root
sys
sync

The above code outputs the first field of each line whose first field sorts after m in string comparison, which here means the fields that begin with a letter later than m.
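
Because $1 is compared with the string "m" as a whole string rather than by its first character alone, a field such as ma also passes the test. A small sketch with made-up input (the three sample words and the variable name after_m are assumptions):

```shell
# "m" is not greater than itself, "ma" sorts after "m", and "lz" sorts before it.
after_m=$(printf 'm\nma\nlz\n' | awk '{ if ($1 > "m") print $1 }')
printf '%s\n' "$after_m"
# ma
```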

The if structure can also specify the else part.

$ awk -F ':' '{if ($1 > "m") print $1; else print "---"}' demo.txt

root
---
---
sys
sync
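
The same if/else form works with numeric tests. As a minimal sketch on made-up numbers (the input and the variable name parity are assumptions), the following labels each value as even or odd:

```shell
# The if branch runs when the remainder of division by 2 is zero,
# the else branch runs otherwise.
parity=$(printf '3\n8\n' | awk '{ if ($1 % 2 == 0) print $1, "even"; else print $1, "odd" }')
printf '%s\n' "$parity"
# 3 odd
# 8 even
```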

Reference

  • https://www.ruanyifeng.com/blog/2018/11/awk.html
