awk advanced command
1. Introduction to awk
- gawk is the GNU project's implementation of the AWK language, an improved version of the awk program from early Unix, so the awk we use on CentOS is actually gawk. Because people were used to typing awk, the system simply provides awk as a symbolic link to gawk. AWK is named after the initials of its three authors: Aho, Weinberger, and Kernighan. AWK is also a language in its own right: the AWK programming language, which its creators describe as a "pattern scanning and processing language".
- The awk programming language is used to process text and data on Linux/Unix. The data can come from stdin, from one or more files, or from the output of other commands. It can be used directly on the command line, but it is more often used as a script. Like C, awk provides arrays and functions, including many built-in functions. Flexibility is awk's biggest advantage
- Awk is a powerful text analysis tool. Where grep searches and sed edits, awk is strongest at analyzing data and generating reports. In short, awk reads a file line by line, splits each line into fields using whitespace as the default separator, and then analyzes and processes the resulting fields
2. Use of awk
Syntax:
awk [-F|-f|-v] 'BEGIN{print "start"} /pattern/{command1; command2} END{print "end"}' filename
- An awk script usually consists of a BEGIN block, pattern-matching rules, and an END block, all of which are optional
- Here pattern is what awk looks for in the data, and the commands are a series of commands executed when a matching line is found. Curly braces ({}) group the instructions that belong to a particular pattern; a rule that has no action needs no braces. A pattern written as a regular expression is enclosed in slashes
- The most basic function of the awk language is to browse files or strings and extract information according to specified rules. Once awk has extracted the information, other text operations can be carried out on it. Complete awk scripts are often used to format the information in text files
awk processes a file one line at a time: it reads each line and then executes the corresponding commands on it
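The three parts of the syntax above can be seen working together in a small sketch. The input lines (names and scores) are made up for illustration and are fed in with printf so that no file is needed:

```shell
# Sample input is generated inline; the names and scores are invented.
report=$(printf 'alice 90\nbob 75\ncarol 82\n' |
  awk 'BEGIN { print "start" }    # runs once, before any input
       /o/   { print $1, $2 }     # runs only for lines containing the letter o
       END   { print "end" }')    # runs once, after the last line
echo "$report"
```

Only the bob and carol lines contain the letter o, so those are the only lines printed between "start" and "end".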
3. awk structure
- The basic operation of awk is to scan the input lines one after another, searching for lines that match any of the patterns. Each input line is tested against every pattern in turn, and whenever a pattern matches, the corresponding action is executed. awk then reads the next line and starts matching again, and this continues until all input has been read. Because both the pattern and the action are optional, actions are enclosed in {} to distinguish them from patterns
- The program on the command line is enclosed in single quotation marks. This prevents characters in the program (such as $) from being interpreted by the shell, and it also allows the program to span more than one line
- A short program is convenient to type directly on the command line, but a longer one is better kept in a file. For example, if the program file is named pgfile, run awk -f pgfile filename
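As a rough sketch of the awk -f form, the following writes a tiny program to a file called pgfile (the name used above) and runs it against a data file. The temporary directory, the data file name data.txt, and its contents are assumptions made only for this example:

```shell
# Work in a throwaway directory so nothing is left behind.
tmpdir=$(mktemp -d)

# The awk program lives in its own file; the quoted 'EOF' keeps the shell
# from touching $1 inside the program.
cat > "$tmpdir/pgfile" <<'EOF'
# Print the record number and the first field of every line.
{ print NR ": " $1 }
EOF

printf 'shen 82\nlong 81\n' > "$tmpdir/data.txt"

numbered=$(awk -f "$tmpdir/pgfile" "$tmpdir/data.txt")
echo "$numbered"
rm -r "$tmpdir"
```

Keeping the program in a file also avoids the quoting concerns of the command line, since the shell never sees the program text.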
4. awk command form
awk [-F|-f|-v] 'BEGIN{} //{command1; command2} END{}' file
-F specifies the input field separator. By default awk splits fields on spaces and tabs
-f reads the awk program from a script file instead of from the command line
-v var=val sets the variable var to the value val before the program starts executing
' ' quotes the code block
// match block, which can contain a string or a regular expression
{} command block, containing one or more commands
; separates multiple commands
BEGIN is executed at the start of the awk program, before any data is read. The action after BEGIN is executed only once
END is executed after the awk program has processed all the data and is about to finish. The action after END is executed only once
Note
- BEGIN is mainly the initialization block, run before any line is processed. It typically initializes global variables and sets the FS separator
- END is mainly the closing block, run after all lines have been processed. It is typically used for final calculations or to print summary information
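A minimal sketch of how -F, -v, BEGIN and END fit together. The fruit names, the counts, and the variable name label are invented for the example:

```shell
# Sum the second colon-separated field and print it with a caller-supplied label.
total=$(printf 'apple:3\npear:5\nplum:4\n' |
  awk -F ':' -v label="total" '
    BEGIN { sum = 0 }              # initialization, before any input is read
          { sum += $2 }            # executed for every input line
    END   { print label "=" sum }  # summary, after the last line
  ')
echo "$total"
```

Here BEGIN only initializes a variable, the middle rule has no pattern and therefore runs for every line, and END prints the result once.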
5. awk built-in variables
$0 the entire current record (line)
$1 to $n the first through nth fields of the current record
FS input field separator (same function as -F), default is whitespace
RS input record separator, default is a newline (i.e. text is read line by line)
NF number of fields in the current record, i.e. the number of columns
NR record number, i.e. the line number, starting from 1
FNR like NR, but it does not keep incrementing across multiple files; it restarts from 1 for each file
OFS output field separator, default is a space
ORS output record separator, default is a newline
\t tab
\n newline
~ matches a regular expression; not an exact comparison, unlike ==
!~ does not match a regular expression; not an exact comparison
== equal to; the whole value must be equal (exact comparison)
!= not equal to (exact comparison)
&& logical AND
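The difference between the regular-expression operators (~, !~) and the exact comparisons (==, !=) is easiest to see side by side. This is only a sketch; the three input lines are made-up, passwd-like data:

```shell
matches=$(printf 'root 0 /bin/bash\nbin 1 /sbin/nologin\nroot2 1000 /bin/bash\n' |
  awk '
    $1 == "root"             { print "exact:", $1 }    # whole field must equal root
    $1 ~ /root/              { print "regex:", $1 }    # field only has to contain root
    $3 !~ /bash/             { print "no bash:", $1 }  # field does not contain bash
    $2 != 0 && $3 ~ /bash$/  { print "both:", $1 }     # both conditions must hold
  ')
echo "$matches"
```

Note that root2 passes the ~ test but not the == test, which is exactly the "not exact" versus "exact" distinction in the list above.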
6. Matching mode
When awk reads in a line of input, it tries to match it against every pattern-matching rule in the script. Only input lines that match a given pattern are processed by that rule's action. If no action is specified, the input lines that match the pattern are printed
[root@shen ~]# cat tests
This line of data is ingored
(blank line)
(blank line)
[root@shen ~]# awk '/^$/{print "This is a blank line."}' tests
This is a blank line.
This is a blank line.
[root@shen ~]# awk '/data/' tests
This line of data is ingored
7. Records and fields
awk treats each input line as a record, while words (i.e. columns) separated by spaces or tabs are used as fields (the characters used to separate fields are called separators).
[root@shen ~]# echo 'shen long fei'|awk '{print $3}'    # print the third field
fei
[root@shen ~]# echo 'shen long fei'|awk 'BEGIN{one=1;two=2}{print $(one + two)}'    # use variables to select $3
fei
Print the IP address here
[root@shen ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:0c:29:45:ec:b2 brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.147/24 brd 192.168.100.255 scope global noprefixroute ens160
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:fe45:ecb2/64 scope link
       valid_lft forever preferred_lft forever
3: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 52:54:00:21:c2:3a brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.1/24 brd 192.168.100.255 scope global virbr0
       valid_lft forever preferred_lft forever
4: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel master virbr0 state DOWN group default qlen 1000
    link/ether 52:54:00:21:c2:3a brd ff:ff:ff:ff:ff:ff
[root@shen ~]# ip a|grep 'inet '|grep -v '127.0.0.1'|awk -F '[ /]+' '{print $3}'
192.168.100.147
192.168.100.1
Write matching rules and print specific fields
[root@shen ~]# cat tests
This line of shen is ingored
This line of long is ingored
This line of fei is ingored
[root@shen ~]# awk '/long/{print $1 $6}' tests
Thisingored
[root@shen ~]# awk '/long/{print $1","$6}' tests    # use ',' as the separator between the fields
This,ingored
[root@shen ~]# awk '/long/{print $1", "$6}' tests    # use ', ' as the separator between the fields
This, ingored
Match against a specific field (column) and print
[root@shen ~]# cat tests
This line of shen is ingored
This line of long is ingored
This line of fei is ingored
This line of is long sb
[root@shen ~]# awk '/long/{print $1","$6}' tests    # no field specified, so the whole line is matched
This,ingored
This,sb
[root@shen ~]# awk '$4 ~/long/{print $1" "$6}' tests
This ingored
Use !~ to reverse the meaning of the rule, that is, to select the lines whose fourth field does not contain long
[root@shen ~]# awk '$4 !~/long/{print $1" "$6}' tests
This ingored
This ingored
This sb
8. Field division
awk has three ways to split fields
- The first method is to separate fields with whitespace by setting FS to a single space. In this case, leading and trailing whitespace (spaces and/or tabs) in the record is ignored, and fields are separated by runs of spaces and/or tabs. Because the default value of FS is a space, this is how awk normally splits records into fields.
- The second method is to separate fields with some other single character. For example, awk programs often use ":" as a delimiter. When FS is any single character, a new field starts wherever that character appears. If two separators appear in a row, the field between them is an empty string.
- The third method is to set more than one character as the field separator, in which case it is interpreted as a regular expression.
[root@shen ~]# cat passwd
root:x:0:0 root:/root:/bin/bash
bin:x:1:1 bin:/bin:/sbin/nologin
[root@shen ~]# awk '{print $2}' passwd
root:/root:/bin/bash
bin:/bin:/sbin/nologin
[root@shen ~]# awk 'BEGIN{FS=":"}{print $3}' passwd
0
1
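The three splitting methods can be compared with one short input each. This is a sketch with made-up input strings (including the address 10.0.0.5/24), not output from a real system:

```shell
# 1. Default FS: leading and trailing blanks are ignored, and a run of
#    spaces and tabs counts as a single separator.
ws=$(printf '  a \t b  \n' | awk '{ print NF ":" $1 ":" $2 }')

# 2. Single-character FS: two adjacent separators enclose an empty field.
single=$(printf 'a::b\n' | awk -F ':' '{ print NF ":[" $2 "]" }')

# 3. FS longer than one character is a regular expression:
#    here, one or more spaces or slashes.
regex=$(printf 'inet 10.0.0.5/24\n' | awk -F '[ /]+' '{ print $2 }')

echo "$ws"
echo "$single"
echo "$regex"
```

In the second case NF is 3 even though only two fields contain text, because the empty string between the two colons still counts as a field.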
awk has many system (built-in) variables, and they fall into two types. The first type has default values that you can change, such as the default field and record separators. The second type holds values that awk updates automatically and that are useful in reports or data processing, such as the number of fields in the current record, the number of the current record, and the name of the input file.
9. OFS output field separator
OFS is the output counterpart of FS: it is the separator placed between fields on output, and its default value is a space
[root@shen ~]# cat passwd
root:x:0:0 root:/root:/bin/bash
bin:x:1:1 bin:/bin:/sbin/nologin
[root@shen ~]# awk 'BEGIN{FS=":"}{print $1,$6}' passwd
root /bin/bash
bin /sbin/nologin
[root@shen ~]# awk 'BEGIN{FS=":";OFS="-"}{print $1,$6}' passwd
root-/bin/bash
bin-/sbin/nologin
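One detail worth checking by hand is that OFS is only inserted where print has a comma between its arguments. Fields written next to each other without a comma are simply concatenated. A small sketch, using a standard-looking passwd line as assumed input:

```shell
line='root:x:0:0:root:/root:/bin/bash'

# Comma between the arguments: OFS is placed between the two fields.
comma=$(echo "$line" | awk 'BEGIN { FS = ":"; OFS = "-" } { print $1, $7 }')

# No comma: the two fields are joined directly and OFS is not used.
concat=$(echo "$line" | awk 'BEGIN { FS = ":"; OFS = "-" } { print $1 $7 }')

echo "$comma"
echo "$concat"
```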
10. NF
The NF variable holds the number of fields in the current input record, that is, how many columns the line has. Because NF is a number, $NF refers to the last field.
[root@shen ~]# cat tests
shen 82 93 48 94 88
long 81 96 75 99 86 93
fei 82 88 80 93 81 94 89
[root@shen ~]# awk '{print NF}' tests
6
7
8
[root@shen ~]# awk '{print $NF}' tests
88
93
89
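Since the number of columns differs from line to line, NF is a natural loop bound. The following sketch reuses two lines of the sample data above and prints, for each name, how many scores there are and their average; the output format is an arbitrary choice for the example:

```shell
averages=$(printf 'shen 82 93 48 94 88\nlong 81 96 75 99 86 93\n' |
  awk '{
    sum = 0
    for (i = 2; i <= NF; i++) sum += $i    # fields 2..NF hold the scores
    printf "%s %d %.1f\n", $1, NF - 1, sum / (NF - 1)
  }')
echo "$averages"
```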
11. NR line number
NR is the number of the current record, i.e. the line number. When several files are read, it keeps incrementing across them.
[root@shen ~]# cat tests
shen 82 93 48 94 88
long 81 96 75 99 86 93
fei 82 88 80 93 81 94 89
[root@shen ~]# awk '{print NR $1}' tests
1shen
2long
3fei
[root@shen ~]# awk '{print NR "." $1}' tests
1.shen
2.long
3.fei
[root@shen ~]# awk '{print NR "." $0}' tests
1.shen 82 93 48 94 88
2.long 81 96 75 99 86 93
3.fei 82 88 80 93 81 94 89
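The difference between NR and FNR only shows up with more than one input file. A minimal sketch, assuming two throwaway files (one.txt and two.txt, names invented here) with two lines each:

```shell
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/one.txt"
printf 'c\nd\n' > "$tmpdir/two.txt"

# NR counts across both files; FNR restarts at 1 for the second file.
counters=$(awk '{ print NR, FNR, $0 }' "$tmpdir/one.txt" "$tmpdir/two.txt")
echo "$counters"
rm -r "$tmpdir"
```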
12.RS
To process records that span multiple lines, we can set the field separator to a newline, written "\n", and set the record separator to the empty string, which stands for a blank line. Each line of a record then becomes one field.
[root@shen ~]# cat tests
shen 82 93 48 94 88
long 81 96 75 99 86 93
fei 82 88 80 93 81 94 89
[root@shen ~]# awk 'BEGIN{FS="\n";RS=""}{print $1}' tests
shen 82 93 48 94 88
[root@shen ~]# awk 'BEGIN{FS="\n";RS=""}{print $3}' tests
fei 82 88 80 93 81 94 89
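The sample file above contains no blank lines, so awk sees it as a single three-line record. To see RS="" actually separate records, the input needs blank lines between the blocks. The sketch below assumes a made-up input of two such blocks:

```shell
# Two records separated by one blank line; each record has two lines.
records=$(printf 'shen\n82 93\n\nlong\n81 96\n' |
  awk 'BEGIN { FS = "\n"; RS = "" } { print NR ": " $1 " -> " $2 }')
echo "$records"
```

Each block is now one record (NR goes from 1 to 2), and within a record $1 and $2 are its first and second lines.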