awk is also a great data processing tool.
awk is used to intercept qualified columns.
Awk is much more powerful than cut; It can even be called awk programming.
awk command
Format:
awk 'Condition 1{Action 1} Condition 2{Action 2} ...' filename
meaning:
- awk is followed by single quotation marks. There will be multiple conditions and action conditions {action} in the quotation marks. Just like if else in java, if condition 1 is met, action 1 is executed.
- awk can process subsequent files or read standard output from the previous command.
- awk is mainly used to process the "data in the field of each row", and the default "field separator" is "blank key" or "[tab] key"!
[userwin@MiWiFi-R3L-srv ~]$ df -h file system Capacity used available used% Mount point /dev/mapper/centos-root 18G 2.0G 16G 12% / devtmpfs 479M 0 479M 0% /dev tmpfs 489M 0 489M 0% /dev/shm tmpfs 489M 6.7M 483M 2% /run tmpfs 489M 0 489M 0% /sys/fs/cgroup /dev/sda1 497M 107M 391M 22% /boot tmpfs 98M 0 98M 0% /run/user/1000 # Want to use the cut command to get the second column of the above content; Was the result unexpected? [userwin@MiWiFi-R3L-srv ~]$ df -h | cut -d " " -f2 # The file system column is followed by multiple spaces, which is not recognized by the cut command. The awk command just solves this problem. [userwin@MiWiFi-R3L-srv ~]$ df -h | awk '{print $2}' capacity 18G 479M 489M 489M 489M 497M 98M
The file system column is followed by multiple spaces, which is not recognized by the cut command; As shown below:
Get the second and sixth columns df -h | awk '{print $2 "\ t" $6}'
The quotation mark is directly followed by {print. The condition is omitted here. The default is true.
awk most commonly used action! List the field data through the function of print! The fields are separated by blank key or [tab] key
[userwin@MiWiFi-R3L-srv ~]$ df -h | awk '{print $2 "\t" $6}' capacity Mount point 18G / 479M /dev 489M /dev/shm 489M /run 489M /sys/fs/cgroup 497M /boot 98M /run/user/1000
The difference between print and printf
In the awk statement, the printf output will not wrap lines and needs to be added manually. \ n
Let's look at the following example:
[userwin@MiWiFi-R3L-srv ~]$ df -h | awk '{print $2 "\t" $6}' capacity Mount point 18G / 479M /dev 489M /dev/shm 489M /run 489M /sys/fs/cgroup 497M /boot 98M /run/user/1000 [userwin@MiWiFi-R3L-srv ~]$ df -h | awk '{printf $2 "\t" $6}' capacity Mount point 18 G /479M /dev489M /dev/shm489M /run489M /sys/fs/cgroup497M /boot [userwin@MiWiFi-R3L-srv ~]$ df -h | awk '{printf $2 "\t" $6 "\n"}' capacity Mount point 18G / 479M /dev 489M /dev/shm 489M /run 489M /sys/fs/cgroup 497M /boot 98M /run/user/1000
Get the number before% in column 5 of sda1
# see [userwin@MiWiFi-R3L-srv ~]$ df -h file system Capacity used available used% Mount point /dev/mapper/centos-root 18G 2.0G 16G 12% / devtmpfs 479M 0 479M 0% /dev tmpfs 489M 0 489M 0% /dev/shm tmpfs 489M 6.7M 483M 2% /run tmpfs 489M 0 489M 0% /sys/fs/cgroup /dev/sda1 497M 107M 391M 22% /boot tmpfs 98M 0 98M 0% /run/user/1000 # Get sda1 row [userwin@MiWiFi-R3L-srv ~]$ df -h | grep "sda1" /dev/sda1 497M 107M 391M 22% /boot # Get column 5 [userwin@MiWiFi-R3L-srv ~]$ df -h | grep "sda1" | awk '{print $5}' 22% # Use cut to intercept the value before% [userwin@MiWiFi-R3L-srv ~]$ df -h | grep "sda1" | awk '{print $5}' |cut -d '%' -f1 22
Built in variable NF NR FS for awk
awk command: each field in each line has variable names, that is, $1, $2... And other variable names. Each field corresponds to each column$ 1 refers to the first column;
$0 means "a whole column of data"
Variable name | Representative meaning |
---|---|
NF | Total number of fields per row ($0) |
NR | At present, awk processes the "row" data |
FS | The current delimited byte is blank by default |
# View the contents of / bin/bash in the passwd file [userwin@MiWiFi-R3L-srv ~]$ cat /etc/passwd | grep "/bin/bash" root:x:0:0:root:/root:/bin/bash userwin:x:1000:1000:userwin:/home/userwin:/bin/bash # Use awk custom separator to get 1 and 3 columns [userwin@MiWiFi-R3L-srv ~]$ cat /etc/passwd | grep "/bin/bash" | awk '{FS=":"} {print $1 "\t" $3}' root:x:0:0:root:/root:/bin/bash userwin 1000
Why is the first column not split?
Take a look at the execution sequence of awk. First read in the first line, and then execute the separator FS =: after reading. Therefore, the following figure appears
You need to add an empty line using the BEGIN command, as follows
[userwin@MiWiFi-R3L-srv ~]$ cat /etc/passwd | grep "/bin/bash" |\ awk 'BEGIN{FS=":"} {print $1 "\t" $3}' root 0 userwin 1000
Execution sequence of awk
- First read in the first row and fill the data in the first row into variables such as $0, $1, $2;
- Judge whether the following "action" is required according to the restriction of "condition type";
- Complete all actions and condition types;
- If there are subsequent "row" data, repeat steps 1 ~ 3 above until all data are read.
BEGIN in awk
BEGIN add content before processing data
[userwin@MiWiFi-R3L-srv ~]$ df -h | grep "sda1" | awk 'BEGIN{print "sda1 The utilization rate of is:"}{print $5}' sda1 The utilization rate of is: 22% [userwin@MiWiFi-R3L-srv ~]$ df -h | grep "sda1" | awk 'BEGIN{printf "sda1 The utilization rate of is:"}{print $5}' sda1 Utilization rate of: 22%
END in awk
Add content after BND processes data
[userwin@MiWiFi-R3L-srv ~]$ df -h | grep "sda1" | awk 'BEGIN{print "sda1 The utilization rate of is:"} END{print "Command execution completed!!"}{print $5}' sda1 The utilization rate of is: 22% Command execution completed!!
Logical operation of awk
Arithmetic unit | Representative meaning |
---|---|
> | Greater than |
< | Less than |
>= | Greater than or equal to |
<= | Less than or equal to |
== | Equal to |
!= | Not equal to |
[userwin@MiWiFi-R3L-srv ~]$ df -h | grep "sda1" | awk '{print $5}' | awk 'BEGIN{FN="%"} $1>=20{print "Disk utilization over 20%"}' Disk utilization over 20% [userwin@MiWiFi-R3L-srv ~]$ df -h | grep "sda1" | awk '{print $5}' | awk 'BEGIN{FN="%"} $1>=80{print "Disk utilization over 80%"} $1<80{print "Disk utilization does not exceed 80%"}' Disk utilization does not exceed 80%