awk advanced command

Posted by Kor Ikron on Tue, 21 Sep 2021 20:19:45 +0200

1. Introduction to awk

  • awk was originally a programming language on early Unix systems. gawk is the GNU project's improved implementation of it, so the awk we use on CentOS is actually gawk: because people are used to typing awk, the system simply provides awk as a symbolic link to gawk. The name comes from the initials of its three authors: Aho, Weinberger and Kernighan. awk really is a language of its own, the AWK programming language, which its three creators describe as a "pattern scanning and processing language".

  • The awk programming language is used to process text and data on Linux/Unix. The data can come from stdin, from one or more files, or from the output of other commands. awk can be used directly on the command line, but more often it is written as a script. It supports arrays and functions, including many built-in functions, with a syntax similar to C. Flexibility is its biggest advantage.

  • awk is a powerful text analysis tool. Where grep searches and sed edits, awk is strongest at analyzing data and generating reports. In short, awk reads a file line by line, splits each line into fields using whitespace as the default separator, and then analyzes and processes the resulting fields.

2. Use of awk

Syntax:

awk [-F|-f|-v] 'BEGIN{print "start"} /pattern/{command1; command2} END{print "end"}' filename
  • An awk script usually consists of a BEGIN block, one or more pattern-action rules, and an END block, all of which are optional
  • pattern describes what awk looks for in the data, and the commands are the series of actions executed when a matching line is found. Curly braces ({}) group the commands that belong to a pattern; a rule may consist of a pattern alone or of an action alone. A regular expression pattern is enclosed in slashes
  • The most basic function of the awk language is to browse and extract information from files or strings based on specified rules. Once the information is extracted, other text operations can be carried out on it. Complete awk scripts are often used to format the information in text files
    awk processes a file line by line: it receives each line of the file and then executes the corresponding commands on it
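
The BEGIN + pattern + END layout described above can be sketched as follows. This is only a minimal illustration: the sample names and scores and the counter variable count are invented for the example.

```shell
# Sample input is invented for this sketch.
input='alice 90
bob 72
carol 85'

# BEGIN runs before any input is read, the /o/ rule runs for every
# line containing an "o", and END runs after the last line.
result=$(printf '%s\n' "$input" | awk '
BEGIN { print "start" }
/o/   { count++; print "match:", $1 }
END   { print "end, matched", count, "lines" }')

echo "$result"
```

Only the lines for bob and carol contain an "o", so the middle rule fires twice, between the BEGIN and END output.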

3.awk structure

  • The basic operation of awk is to scan the input lines one after another, looking for lines that match a pattern. Each input line is tested against each pattern in turn. Every time a pattern matches, the corresponding action is executed; then the next line is read and matching starts again. This continues until all input has been read. Because both the pattern and the action are optional, actions are enclosed in {} to tell the two apart
  • On the command line the program is surrounded by single quotation marks. This prevents characters in the program (such as $) from being interpreted by the shell, and it also lets the program span more than one line
  • A short program is most conveniently typed directly on the command line, but a long one is better put into a file. For example, if the program file is named pgfile, just type: awk -f pgfile filename
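
As a rough sketch of the -f form, the following writes a one-rule program to a file and runs it. The temporary directory, the file name data.txt and the sample data are assumptions made for this example; only the name pgfile comes from the text above.

```shell
# Work in a throwaway directory so nothing in the current directory is touched.
workdir=$(mktemp -d)

# The program file: awk comments start with "#".
cat > "$workdir/pgfile" <<'EOF'
# Print the first field of every line that contains "long".
/long/ { print $1 }
EOF

# Hypothetical input data for the sketch.
printf 'shen 82\nlong 81\nfei 82\n' > "$workdir/data.txt"

result=$(awk -f "$workdir/pgfile" "$workdir/data.txt")
echo "$result"

rm -r "$workdir"
```

Because the program lives in a file, it needs no shell quoting at all, which is the main convenience for longer scripts.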

4.awk command form

awk [-F|-f|-v] 'BEGIN{} //{command1; command2} END{}' file

-F specifies the field separator of the input line. The default separator is whitespace (spaces or tabs)
-f reads the awk program from a script file instead of from the command line
-v var=val sets the variable var to the value val before the program starts executing
' ' quotes the program so that the shell does not interpret it
// the pattern to match, which can be a string or a regular expression
{} the action block, containing one or more commands
; separates multiple commands
BEGIN is executed when the awk program starts, before any data has been read. The action after BEGIN is executed only once
END is executed when the awk program has finished processing all the data and is about to exit. The action after END is executed only once

Note

  • BEGIN is mainly the initialization block. It runs before any line is processed and is typically used to initialize global variables and set the FS separator
  • END is mainly the finishing block. It runs after every line has been processed and is typically used for final calculations or for printing summary information
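
A small sketch that combines -F, -v, BEGIN and END is shown below. The name:score data and the variable name min are made up for the illustration.

```shell
# The data and the variable name "min" are invented for this sketch.
result=$(printf 'shen:82\nlong:96\nfei:88\n' | awk -F: -v min=85 '
BEGIN { print "scores of at least " min }   # runs once, before any input
$2 >= min { print $1; n++ }                 # runs for each matching line
END { print n " found" }                    # runs once, after all input
')

echo "$result"
```

Here -F: splits each line at the colon, -v min=85 makes the threshold available before BEGIN runs, and END prints the summary after the last line.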

5.awk built-in variables

$0 the entire current record (line)
$1...$n the 1st to nth field of the current record
FS input field separator (same function as -F), default whitespace
RS input record separator, default newline (i.e. the text is read line by line)
NF number of fields in the current record, i.e. the number of columns
NR record number, i.e. the line number, counted from 1
FNR like NR, but the count is not carried over from one file to the next: it restarts at 1 for each file
OFS output field separator, default space
ORS output record separator, default newline
\t tab
\n newline
~ matches a regular expression (partial match, unlike ==)
!~ does not match a regular expression
== equal to; the whole value must be equal (exact comparison)
!= not equal to (exact comparison)
&& logical AND
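
The difference between the partial match ~ and the exact comparison ==, and the use of &&, can be seen in a short sketch. The three input lines are invented for the example.

```shell
# Sample input invented for this sketch.
input='root 0
rooter 1
bin 2'

# ~ matches a regular expression anywhere in the field.
regex_hits=$(printf '%s\n' "$input" | awk '$1 ~ /root/ { print $1 }')

# == compares the whole value.
exact_hits=$(printf '%s\n' "$input" | awk '$1 == "root" { print $1 }')

# && requires both conditions to hold.
both=$(printf '%s\n' "$input" | awk '$1 ~ /root/ && $2 > 0 { print NR ": " $0 }')

echo "$regex_hits"
echo "$exact_hits"
echo "$both"
```

The regular expression also selects rooter, the exact comparison selects only root, and the combined condition leaves just the second line.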

6. Matching mode

When awk reads in a line of input, it attempts to match it against every pattern-matching rule in the script. Only input lines that match a pattern are operated on. If no action is specified, the input lines matching the pattern are printed

[root@shen ~]# cat tests 
This line of data is ignored
                                              //Blank line
                                              //Blank line
[root@shen ~]# awk '/^$/{print "This is a blank line."}' tests 
This is a blank line.
This is a blank line.
[root@node2 ~]# awk '/data/' tests 
This line of data is ignored

7. Records and fields

awk treats each input line as a record, while words (i.e. columns) separated by spaces or tabs are used as fields (the characters used to separate fields are called separators).

[root@shen ~]# echo 'shen long fei'|awk '{print $3}'    # print the third field
fei

[root@shen ~]# echo 'shen long fei'|awk 'BEGIN{one=1;two=2}{print $(one + two)}'    # define variables, then print $3
fei

Print the IP addresses of the host

[root@shen ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:0c:29:45:ec:b2 brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.147/24 brd 192.168.100.255 scope global noprefixroute ens160
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:fe45:ecb2/64 scope link 
       valid_lft forever preferred_lft forever
3: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 52:54:00:21:c2:3a brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.1/24 brd 192.168.100.255 scope global virbr0
       valid_lft forever preferred_lft forever
4: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel master virbr0 state DOWN group default qlen 1000
    link/ether 52:54:00:21:c2:3a brd ff:ff:ff:ff:ff:ff

[root@shen ~]# ip a|grep 'inet '|grep -v '127.0.0.1'|awk -F '[ /]+' '{print $3}'
192.168.100.147
192.168.100.1

Write matching rules and print specific fields

[root@shen ~]# cat tests
This line of shen is ignored
This line of long is ignored
This line of fei is ignored
[root@shen ~]# awk '/long/{print $1 $6}' tests 
Thisignored
[root@shen ~]# awk '/long/{print $1","$6}' tests    # put ',' between the two fields
This,ignored
[root@shen ~]# awk '/long/{print $1", "$6}' tests    # put ', ' between the two fields
This, ignored

Match against a specified column and print

[root@shen ~]# cat tests 
This line of shen is ignored
This line of long is ignored
This line of fei is ignored
This line of is long sb
[root@shen ~]# awk '/long/{print $1","$6}' tests    # no column specified, so the whole line is matched
This,ignored
This,sb

[root@shen ~]# awk '$4 ~/long/{print $1" "$6}' tests 
This ignored

Use !~ to reverse the meaning of the rule, that is, select the lines whose fourth column does not contain long

[root@shen ~]# awk '$4 !~/long/{print $1" "$6}' tests 
This ignored
This ignored
This sb

8. Field division

awk has three ways to split fields

  • The first method is to separate fields with whitespace. Set FS to a single space. In this case, leading and trailing whitespace (spaces and/or tabs) of the record is ignored, and the fields are separated by runs of spaces and/or tabs. Because the default value of FS is a single space, this is the way awk normally divides records into fields.
  • The second method is to separate fields with any other single character. For example, awk programs often use ":" as the delimiter. When FS is any other single character, a new field starts wherever that character appears. If two separators appear consecutively, the field between them is an empty string.
  • The third method is to set FS to more than one character, in which case it is interpreted as a regular expression.

[root@shen ~]# cat passwd 
root:x:0:0 root:/root:/bin/bash
bin:x:1:1 bin:/bin:/sbin/nologin
[root@shen ~]# awk '{print $2}' passwd 
root:/root:/bin/bash
bin:/bin:/sbin/nologin
[root@shen ~]# awk 'BEGIN{FS=":"}{print $3}' passwd 
0
1
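
The three splitting methods can be compared side by side in a minimal sketch. The input strings are made up for the illustration, and NF is printed first on each line to show how many fields awk found.

```shell
# Default FS: leading and trailing blanks are ignored, runs of blanks count once.
blank=$(echo '   a    b  ' | awk '{ print NF, $1, $2 }')

# Single-character FS: two adjacent separators yield an empty field.
empty=$(echo 'a::c' | awk -F: '{ print NF, "[" $2 "]" }')

# Multi-character FS is a regular expression: here, one or more of ":" or ",".
regex=$(echo 'a:,b,,c' | awk -F'[:,]+' '{ print NF, $2, $3 }')

echo "$blank"
echo "$empty"
echo "$regex"
```

With -F: the string a::c has three fields and the middle one is empty, while the regular expression [:,]+ treats each run of separators as a single delimiter.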

awk has many system (built-in) variables, and they fall into two types. Variables of the first type have default values that you can change, such as the default field and record separators. Variables of the second type are maintained automatically by awk and can be used in reports or data processing, such as the number of fields in the current record, the number of the current record, and the name of the input file.

9.OFS output field separator

OFS is the output counterpart of FS: it is the separator placed between fields on output, and its default value is a space

[root@shen ~]# cat passwd 
root:x:0:0 root:/root:/bin/bash
bin:x:1:1 bin:/bin:/sbin/nologin

[root@shen ~]# awk 'BEGIN{FS=":"}{print $1,$6}' passwd 
root /bin/bash
bin /sbin/nologin

[root@shen ~]# awk 'BEGIN{FS=":";OFS="-"}{print $1,$6}' passwd 
root-/bin/bash
bin-/sbin/nologin
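
One detail worth checking for yourself is when OFS actually takes effect. The sketch below uses an invented passwd-style line with seven colon-separated fields and compares three cases.

```shell
# A passwd-style line invented for this sketch.
line='root:x:0:0:root:/root:/bin/bash'

# Comma in print: the fields are joined with OFS.
with_comma=$(echo "$line" | awk 'BEGIN{FS=":";OFS="-"}{print $1,$7}')

# No comma: the values are simply concatenated and OFS is not used.
no_comma=$(echo "$line" | awk 'BEGIN{FS=":";OFS="-"}{print $1 $7}')

# Assigning to a field makes awk rebuild $0 using OFS.
rebuilt=$(echo "$line" | awk 'BEGIN{FS=":";OFS="-"}{$1=$1; print}')

echo "$with_comma"
echo "$no_comma"
echo "$rebuilt"
```

The $1=$1 assignment changes nothing in the data itself; it is just a common way to force awk to rebuild the record with the new separator.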

10.NF

The NF variable holds the number of fields in the current input record, that is, how many columns it has. Since NF is a number, $NF refers to the last field.

[root@shen ~]# cat tests 
shen 82 93 48 94 88
long 81 96 75 99 86 93
fei 82 88 80 93 81 94 89
[root@shen ~]# awk '{print NF}' tests 
6
7
8
[root@shen ~]# awk '{print $NF}' tests 
88
93
89
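
Because the lines have different numbers of columns, NF is handy as a loop bound. The following sketch totals the scores on each line; it reuses two lines of the sample data above, and the variable names sum and i are chosen only for this example.

```shell
result=$(printf 'shen 82 93 48 94 88\nlong 81 96 75 99 86 93\n' | awk '
{
    sum = 0
    for (i = 2; i <= NF; i++) sum += $i   # fields 2..NF hold the scores
    print $1, NF - 1, sum                 # name, number of scores, total
}')

echo "$result"
```

Each output line shows the name, how many scores were found, and their total.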

11.NR line number

NR is the number of the current record, i.e. the line number. When several files are read, it keeps increasing across them.

[root@shen ~]# cat tests 
shen 82 93 48 94 88
long 81 96 75 99 86 93
fei 82 88 80 93 81 94 89
[root@shen ~]# awk '{print NR $1}' tests 
1shen
2long
3fei
[root@shen ~]# awk '{print NR "." $1}' tests 
1.shen
2.long
3.fei
[root@shen ~]# awk '{print NR "." $0}' tests  
1.shen 82 93 48 94 88
2.long 81 96 75 99 86 93
3.fei 82 88 80 93 81 94 89
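
To see how NR keeps counting across files while FNR restarts, a minimal sketch with two small files can help. The file names one.txt and two.txt and their contents are assumptions for the example; FILENAME is awk's built-in variable holding the name of the current input file.

```shell
# Create two throwaway files in a temporary directory.
workdir=$(mktemp -d)
printf 'a\nb\n' > "$workdir/one.txt"
printf 'c\nd\n' > "$workdir/two.txt"

# Run inside the directory so FILENAME prints as a short name.
result=$(cd "$workdir" && awk '{ print FILENAME, NR, FNR, $0 }' one.txt two.txt)
echo "$result"

rm -r "$workdir"
```

In the second file NR continues with 3 and 4, while FNR starts again at 1.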

12.RS

To process records that span multiple lines, we can define the field separator as a newline character, written "\n", and set the record separator to the empty string, which makes blank lines separate the records. The file below contains no blank lines, so the whole file is read as a single record and each line becomes one field.

[root@shen ~]# cat tests 
shen 82 93 48 94 88
long 81 96 75 99 86 93
fei 82 88 80 93 81 94 89
[root@shen ~]# awk 'BEGIN{FS="\n";RS=""}{print $1}' tests 
shen 82 93 48 94 88

[root@shen ~]# awk 'BEGIN{FS="\n";RS=""}{print $3}' tests
fei 82 88 80 93 81 94 89
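
When the input really does contain blank lines, the same settings give one record per block. A rough sketch, using invented three-line records (name, city, score) separated by a blank line:

```shell
# Two records separated by a blank line; each line inside a record is one field.
result=$(printf 'shen\nBeijing\n82\n\nlong\nShanghai\n96\n' | awk '
BEGIN { FS = "\n"; RS = "" }
{ print NR, $1, $3 }')

echo "$result"
```

NR now counts blocks rather than lines, and $1 and $3 are the first and third lines of each block.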
