The three swordsmen of Linux: grep | sed | awk

Posted by goldlikesnow on Mon, 22 Nov 2021 01:06:09 +0100

1. Application scenarios of the three swordsmen

command	characteristic	scenario
grep	Filtering	grep is the fastest of the three at filtering lines
sed	Replacing and modifying file content, taking lines	Use it when content must be replaced or modified, or to take out a range of lines (for example from 11:00 to 12:00)
awk	Taking columns, statistics and calculation	Taking columns; comparisons (> >= < <= == !=); statistics and calculation (awk arrays)

2 Three swordsmen: grep

option	meaning
-E	Same as egrep; supports extended regular expressions
-A	After: -A5 displays each matching line and the 5 lines after it
-B	Before: -B5 displays each matching line and the 5 lines before it
-C	Context: -C5 displays each matching line and the 5 lines before and after it
-c	Counts the matching lines, equivalent to grep ... | wc -l
-v	Inverts the match: displays only the lines that do not match
-w	Whole-word match: the match must not have a word character directly to its left or right

[root@nn-01 ~]# seq 10 |grep 3 -A5
3
4
5
6
7
8
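
The other options in the table can be tried on the same kind of seq input. The following is a minimal sketch, assuming GNU grep; the variable names are only there for illustration and the comments show the expected lines.

```shell
# -B2: each match plus the 2 lines before it
before=$(seq 10 | grep -B2 5)          # 3 4 5
# -C1: each match plus 1 line of context on both sides
context=$(seq 10 | grep -C1 5)         # 4 5 6
# -c: number of matching lines ("1" and "10" both contain a 1)
count=$(seq 10 | grep -c 1)            # 2
# -w: whole-word match, so "10" no longer counts as a match for 1
word=$(seq 10 | grep -w 1)             # 1
# -v: lines that do NOT match
inverted=$(seq 3 | grep -v 2)          # 1 3
# -E: extended regular expression, here an alternation
either=$(seq 10 | grep -E '^(2|7)$')   # 2 7

printf '%s\n' "$before" "$context" "$count" "$word" "$inverted" "$either"
```

Comparing count and word shows why -w matters: without it, a pattern such as 1 also matches inside longer numbers.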

3 Three swordsmen: sed

3.1 Features and format

sed is a stream editor: it treats the content being processed (a file) as a stream and processes it continuously, line by line, until the end of the file
sed format

command	option	(s) sed command function, (g) modifier	parameter (file)
sed	-r	's#oldboy#oldgirl#g'	oldboy.txt

The core functions of sed are adding, deleting, modifying and finding:

function	meaning
s	Substitute (replace)
p	Print (display)
d	Delete
c/a/i	Change / append / insert (add)

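3.2 Execution process of the sed command

sed reads the file into memory one line at a time, like water flowing through a pipe. For each line it checks whether the line meets the condition and, if so, executes the corresponding operation, then moves on to the next line.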

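3.3 Core applications of the sed command

1) sed - find (p)

Find format	explanation
'2p'	Finds (prints) the specified line
'1,5p'	Finds the specified range of line numbers
'/lidao/p'	Filters like grep; a regular expression can be written between the slashes
'/10:00/,/11:00/p'	Filters a range, from the first line matching 10:00 through the next line matching 11:00
p is normally combined with the -n option, because without -n sed also prints every line by default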
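
The find formats above can be tried without any file. This is a small sketch; the timestamped lines are made-up sample data and the variable names are illustrative.

```shell
# Without -n, sed prints every line, and p prints the selected line again.
doubled=$(seq 3 | sed '2p')             # 1 2 2 3
# With -n, only the lines selected by p are printed.
single=$(seq 10 | sed -n '2p')          # 2
range=$(seq 10 | sed -n '3,5p')         # 3 4 5
regex=$(seq 10 | sed -n '/^1/p')        # 1 10
# Pattern range: from the line matching 10:00 through the line matching 11:00.
between=$(printf '09:59 a\n10:00 b\n10:30 c\n11:00 d\n11:01 e\n' |
  sed -n '/10:00/,/11:00/p')            # 10:00 b, 10:30 c, 11:00 d

printf '%s\n' "$doubled" "$single" "$range" "$regex" "$between"
```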

2) sed - delete (d)

Delete format	explanation
'2d'	Deletes the specified line
'1,5d'	Deletes the specified range of line numbers
'/lidao/d'	Deletes the lines containing lidao; a regular expression can be written between the slashes
'/10:00/,/11:00/d'	Deletes the lines in the specified range. If the end pattern never matches, the range runs to the last line, which is equivalent to '/10:00/,$d'.
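
A quick way to check the delete formats, including the case where the end of the range never matches. This is only a sketch on seq output; "nomatch" is an arbitrary string chosen because it cannot appear in the input.

```shell
no_second=$(seq 5 | sed '2d')             # 1 3 4 5
no_range=$(seq 5 | sed '2,4d')            # 1 5
# Every line containing a 1 is deleted: 1, 10, 11 and 12.
no_ones=$(seq 12 | sed '/1/d')            # 2 3 4 5 6 7 8 9
# The end pattern never matches, so the deletion runs to the last line...
to_end=$(seq 5 | sed '/3/,/nomatch/d')    # 1 2
# ...which gives the same result as using $ (last line) as the end.
to_last=$(seq 5 | sed '/3/,$d')           # 1 2

printf '%s\n' "$no_second" "$no_range" "$no_ones" "$to_end" "$to_last"
```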

3) sed - add (c, a, i)

command	explanation
c	Change: replaces the specified line
a	Append: adds content after the specified line
i	Insert: adds content before the specified line
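
The three commands can be compared side by side on a three-line input. The sketch below assumes GNU sed, which accepts the text on the same line as the command; other sed implementations may require a backslash and a newline after a, i or c.

```shell
# a: append a new line after line 2
appended=$(seq 3 | sed '2a after two')     # 1, 2, after two, 3
# i: insert a new line before line 2
inserted=$(seq 3 | sed '2i before two')    # 1, before two, 2, 3
# c: change (replace) line 2 entirely
changed=$(seq 3 | sed '2c replaced')       # 1, replaced, 3

printf '%s\n' "$appended" "$inserted" "$changed"
```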

4) sed - replace (s)

s: substitute.
g: global. By default sed replaces only the first match on each line; with g, every match on the line is replaced.
If the replacement is empty, the substitution is equivalent to deleting the matched content.
Back reference: 's#(...)#\1#g'. Enclose the content you want to reference in parentheses, then refer to it with \1, \2 and so on. Unescaped parentheses need the -r option (extended regular expressions).

format
's#pattern#replacement#g'
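
The difference the g modifier makes, and how a back reference is used, is easiest to see on short strings. A minimal sketch, assuming GNU sed for the -r option (the sample strings are made up):

```shell
# Without g only the first match on the line is replaced.
first=$(echo 'aaa' | sed 's#a#b#')                   # baa
# With g every match on the line is replaced.
all=$(echo 'aaa' | sed 's#a#b#g')                    # bbb
# An empty replacement deletes the matched text.
deleted=$(echo 'oldboy lidao' | sed 's# lidao##')    # oldboy
# Back references: \1 and \2 refer to the first and second parenthesized group.
swapped=$(echo 'oldboy_lidao' | sed -r 's#(.*)_(.*)#\2_\1#')   # lidao_oldboy

printf '%s\n' "$first" "$all" "$deleted" "$swapped"
```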

4 Three swordsmen: awk

4.1 awk execution process

awk -F, 'BEGIN{print "begin of file"} {print $2} END{print "end of file"}'

stage	what runs	example
Before reading the file	Command-line options and variable assignments	-F,
	BEGIN	BEGIN{print "begin of file"}
While reading the file	The main block, executed once for each line as it is read, similar to sed	{print $2}
After reading the file	END	END{print "end of file"}
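
To see the three stages in order, the command can be fed a couple of comma-separated lines. This is a sketch only; the two input lines are made up for illustration.

```shell
# BEGIN runs once before any input, the main block once per line, END once at the end.
result=$(printf 'a,b,c\nd,e,f\n' |
  awk -F, 'BEGIN{print "begin of file"} {print $2} END{print "end of file"}')

printf '%s\n' "$result"
# begin of file
# b
# e
# end of file
```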

4.2 Rows and columns

noun	name in awk	notes
row (line)	record	By default, records are separated by a newline
column	field	By default, fields are separated by whitespace
The row and column separators in awk can both be changed

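1) Taking lines

awk	meaning
NR==1	Takes out the first line
NR>=1 && NR<=5	Takes out lines 1 through 5
/oldboy/	Takes out the lines matching the regular expression oldboy
/101/,/105/	Takes out the range from the first line matching 101 through the next line matching 105
Comparison symbols	> < >= <= == !=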
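
Each way of taking lines can be checked on seq output. A small sketch; when a pattern has no {action}, awk prints the whole line by default.

```shell
first=$(seq 10 | awk 'NR==1')                 # 1
head5=$(seq 10 | awk 'NR>=1 && NR<=5')        # 1 2 3 4 5
not3=$(seq 5 | awk 'NR!=3')                   # 1 2 4 5
match=$(seq 100 110 | awk '/10[12]/')         # 101 102
range=$(seq 100 110 | awk '/101/,/105/')      # 101 102 103 104 105

printf '%s\n' "$first" "$head5" "$not3" "$match" "$range"
```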

2) Taking columns

-F specifies the field separator, that is, the end marker of each column (the default is whitespace: runs of spaces and tabs)
$number takes out one column. Note: in awk, $ followed by a number or variable means "take out that column"

$0 is the content of the whole line, and $NF is the last column

[hadoop@db48 ~]$ # Take out the IP address of the first network card, i.e. the content at a specified row and column.
[hadoop@db48 ~]$ ip a s eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1454 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:5b:87:da brd ff:ff:ff:ff:ff:ff
    inet 10.10.110.33/16 brd 10.10.255.255 scope global eth0
       valid_lft forever preferred_lft forever
[hadoop@db48 ~]$ ip a s eth0 |awk 'NR==3'|awk -F"[ /]+" '{print $3}'
10.10.110.33
[hadoop@db48 ~]$ ip a s eth0 |awk -F"[ /]+" 'NR==3{print $3}'
10.10.110.33

4.3 Patterns

What can serve as a condition in awk
- Comparison operators: > < >= <= == !=
- Regular expressions
- Range expressions
- The special patterns BEGIN and END

awk	-F"[ /]+"	'NR==3{print $3}'
command	option	'condition{action}'
		'pattern{action}'

1) Comparison expressions: see the section on taking lines above

2) Regular expressions

/.../ supports extended regular expressions
awk can match against a specific column and test whether that column does or does not match
~	matches (contains)
!~	does not match

regular expression	in awk
^ means "begins with"	A column beginning with oldboy: $3~/^oldboy/
$ means "ends with"	A column ending with lidao: $4~/lidao$/

# Find the lines in /etc/passwd whose third column starts with 2, and display the first, third and last columns
[hadoop@db48 ~]$ tail -10 /etc/passwd
tss:x:59:59:Account used by the trousers package to sandbox the tcsd daemon:/dev/null:/sbin/nologin
zsw:x:1000:1000::/home/zsw:/bin/bash
lxq:x:1001:1001::/home/lxq:/bin/bash
hadoop:x:1002:1002::/home/hadoop:/bin/bash
mysql:x:27:27:MySQL Server:/var/lib/mysql:/bin/false
gj:x:1003:1003::/home/gj:/bin/bash
tmh:x:1004:1004::/home/tmh:/bin/bash
zzl:x:1005:1005::/home/zzl:/bin/bash
phy:x:1006:1006::/home/phy:/bin/bash
wbq:x:1007:1007::/home/wbq:/bin/bash
[hadoop@db48 ~]$ awk -F':' '$3~/^2/{print $1,$3,$NF}' /etc/passwd
daemon 2 /sbin/nologin
nscd 28 /sbin/nologin
mysql 27 /bin/false
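
The !~ operator and the $ anchor work the same way on a column. The sketch below uses four made-up lines in /etc/passwd format instead of the real file, so the result does not depend on the machine.

```shell
# Hypothetical sample data in /etc/passwd format.
data='root:x:0:0:root:/root:/bin/bash
daemon:x:2:2:daemon:/sbin:/sbin/nologin
mysql:x:27:27:MySQL Server:/var/lib/mysql:/bin/false
zsw:x:1000:1000::/home/zsw:/bin/bash'

# Users whose last column ends with "bash"
bash_users=$(printf '%s\n' "$data" | awk -F: '$NF ~ /bash$/ {print $1}')     # root zsw
# Users whose last column does NOT end with "bash"
other_users=$(printf '%s\n' "$data" | awk -F: '$NF !~ /bash$/ {print $1}')   # daemon mysql

printf '%s\n' "$bash_users" "$other_users"
```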

3) Range expressions

/start/,/end/	Commonly used
NR==1,NR==5	Starts at line 1 and ends at line 5, similar to sed -n '1,5p'

# Display the IP addresses within the specified time range (11:02:00 to 11:02:30)
awk '/11:02:00/,/11:02:30/{print $1}' access.log
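
Since no access.log is shown here, the following sketch runs the same range pattern on five made-up log lines (the format is an assumption, loosely modeled on a web server access log). Note that a pattern range only works if lines with both timestamps actually occur in the log.

```shell
# Hypothetical log lines; only the first field (IP) and the timestamp matter here.
log='10.0.0.1 - - [22/Nov/2021:11:01:59 +0100] "GET / HTTP/1.1" 200 512
10.0.0.2 - - [22/Nov/2021:11:02:00 +0100] "GET /a HTTP/1.1" 200 128
10.0.0.3 - - [22/Nov/2021:11:02:15 +0100] "GET /b HTTP/1.1" 404 64
10.0.0.2 - - [22/Nov/2021:11:02:30 +0100] "GET /c HTTP/1.1" 200 256
10.0.0.4 - - [22/Nov/2021:11:02:31 +0100] "GET /d HTTP/1.1" 200 32'

# Pattern range: from the line matching 11:02:00 through the line matching 11:02:30
ips=$(printf '%s\n' "$log" | awk '/11:02:00/,/11:02:30/{print $1}')
# Line-number range: lines 2 and 3
by_nr=$(printf '%s\n' "$log" | awk 'NR==2,NR==3{print $1}')

printf '%s\n' "$ips" "$by_nr"
```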

4) Special patterns BEGIN{} and END{}

pattern	meaning	application scenario
BEGIN{}	Its contents run before awk reads the file	1) Simple statistics and calculation that need no input file (commonly used) 2) Printing a header before the file is processed (good to know) 3) Defining awk variables (rarely used, because -v can do it)
END{}	Its contents run after awk has read the file	1) Statistics: do the calculation while reading, then print the result in END (common) 2) Printing the results collected in awk arrays (common)

Statistics and calculation with END
Statistical methods:

statistical method	short form	application scenario
i=i+1	i++	Counting occurrences
sum=sum+???	sum+=???	Summation
array[]=array[]+1	array[]++	Classifying and counting with arrays
Note: i and sum are variables

# Count the number of empty lines in /etc/services
[hadoop@db48 ~]$ awk '/^$/' /etc/services |wc -l
17
[hadoop@db48 ~]$ awk '/^$/{i++}END{print i}' /etc/services
17
# Sum 1 + 2 + 3 + ... + 100 from seq 100 with awk
[hadoop@db48 ~]$ seq 100 |awk '{sum=sum+$1}END{print sum}'
5050
# To watch the running total at every step, also print sum inside the main block
[hadoop@db48 ~]$ seq 100 |awk '{sum=sum+$1;print sum}END{print sum}'

4.4 awk arrays

Log statistics:
- Counting occurrences: the number of times each IP appears, the number of times each status code appears, the number of times each system user is attacked, and the number of times each attacker's IP appears
- Cumulative summation: the traffic consumed by each IP

	shell array	awk array
form	array[0]=oldboy array[1]=lidao	array[0]="oldboy"; array[1]="lidao"
use	echo ${array[0]}	print array[0]
print all elements	for i in ${array[*]}; do echo $i; done	for(i in array) print array[i]	Special awk array loop: the variable receives each index of the array, so the element itself is array[i]

[hadoop@db48 ~]$ awk 'BEGIN{a[0]=oldboy;a[1]=lidao;print a[0],a[1]}'
 
# Bare words in awk are treated as variable names (empty here, so nothing but the separator is printed). For a literal string, wrap it in double quotes ""
[hadoop@db48 ~]$ awk 'BEGIN{a[0]="oldboy";a[1]="lidao";print a[0],a[1]}'
oldboy lidao
[hadoop@db48 ~]$ awk 'BEGIN{a[0]="oldboy";a[1]="lidao";for(i in a) print i,a[i]}'
0 oldboy
1 lidao
# Count how many times each domain name appears and sort the result in descending order
[hadoop@db48 tmp]$ cat url.txt 
://jingyan.baidu.com/article/6079ad0e7744e869fe86db18.html
https://www.baidu.com/s?ie=UTF-8&wd=typoro%E8%B7%B3%E5%87%BA%E5%88%97%E8%A1%A8
https://www.baidu.com/s?ie=UTF-8&wd=ll%20%E6%96%87%E4%BB%B6%E9%A2%9C%E8%89%B2
https://blog.csdn.net/qq_29242127/article/details/77141485
https://blog.csdn.net/qq_29242127/article/details/77141485
https://blog.csdn.net/qq_29242127/article/details/77141485
# sort -r (reverse) -n (numeric) -k2 (second column)
[hadoop@db48 tmp]$ awk -F"[/]+" '{array[$2]++}END{for(i in array) print i,array[i]}' url.txt |sort -rnk2
blog.csdn.net 3
www.baidu.com 2
jingyan.baidu.com 1
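
The same array technique also covers the "cumulative summation" case from the list at the start of this section. The sketch below assumes a simplified, made-up log with three columns (IP, status code, bytes); a real access.log has more fields, so the column numbers would differ.

```shell
# Hypothetical simplified log: IP, status code, bytes sent.
log='10.0.0.1 200 512
10.0.0.2 200 128
10.0.0.1 404 64
10.0.0.2 200 256
10.0.0.1 200 100'

# hits[ip]++ counts requests per IP; bytes[ip]+=$3 accumulates traffic per IP.
# The order of "for (ip in hits)" is unspecified, so the output is sorted afterwards.
report=$(printf '%s\n' "$log" |
  awk '{hits[$1]++; bytes[$1]+=$3}
       END{for (ip in hits) print ip, hits[ip], bytes[ip]}' |
  sort -rnk2)

printf '%s\n' "$report"
# 10.0.0.1 3 676
# 10.0.0.2 2 384
```

Using the IP as the array index is what groups the lines; the value stored under that index is either a counter or a running sum.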

4.5 awk loops and conditions

C-style for loop in shell	awk for loop
for((i=1;i<=10;i++)); do echo $i; done	for(i=1;i<=10;i++) print i	The awk for loop is often used to loop over each field

# 1 + 2 + ... + 100
[hadoop@db48 tmp]$ awk "BEGIN{for(i=1;i<=100;i++)sum+=i;print sum}"
5050

[hadoop@db48 tmp]$ awk "BEGIN{
for(i=1;i<=100;i++)
    sum+=i;
print sum
}"
5050

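# With braces, print sum also belongs to the loop body, so the running total is printed on every iteration (100 lines, ending with 5050)
[hadoop@db48 tmp]$ awk "BEGIN{
for(i=1;i<=100;i++)
    {sum+=i;
    print sum}
}"
1
3
6
...
5050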

shell if statement	awk if statement
if [ "$oldhuang" -eq 18 ]; then echo "take to dbj"; fi	if(condition) print "dbj"
if [ "$oldhuang" -eq 18 ]; then echo "take to dbj"; else echo "rest"; fi	if(condition) print "dbj"; else print "rest"

# Find the disks whose utilization is greater than 70%, and print Filesystem, Size and Use%
[hadoop@db48 ~]$ df -h|awk -F"[ %]+" 'NR>1{if($5>70) print "Disk is not enough\t" $1,$2,$5}'
Disk is not enough	/dev/vda1 20G 77
# Note: when awk has several conditions, the first can go in the 'condition{action}' position and the second is usually written with if

# Interview question: find the words with fewer than 6 characters in the following sentence and display them
#I am oldboy teacher welcom to oldboy teacher class
[hadoop@db48 ~]$ echo I am oldboy teacher welcom to oldboy teacher class|awk '{
for(i=1;i<=NF;i++) 
    if(length($i)<6)
        print $i
}'
I
am
to
class
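
An if/else inside the action can be tried without depending on the local disks. The sketch below feeds awk a made-up df-style table (the device names and sizes are hypothetical) and labels each filesystem.

```shell
# Hypothetical df -h style output.
df_out='Filesystem Size Used Avail Use% Mounted
/dev/vda1 20G 15G 5G 77% /
/dev/vdb1 100G 10G 90G 10% /data'

# NR>1 skips the header; the if/else then decides per line.
# With -F"[ %]+" the percent sign is part of the separator, so $5 is a plain number.
status=$(printf '%s\n' "$df_out" |
  awk -F'[ %]+' 'NR>1{ if ($5 > 70) print $1, "full"; else print $1, "ok" }')

printf '%s\n' "$status"
# /dev/vda1 full
# /dev/vdb1 ok
```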

4.6 awk built-in variables

built-in variable	meaning
NR	Number of Records: the number of the current record (line)
NF	Number of Fields: how many fields (columns) the current line has; $NF is the last column
FS	Field Separator: the end marker of each field; -F: is the same as -v FS=:
OFS	Output Field Separator: what awk puts between columns when it prints them (the default is a space)
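
All four variables can be seen at work in one short command. A minimal sketch on two made-up colon-separated lines:

```shell
# -F: sets FS, -v OFS='-' sets the output separator used by print with commas.
out=$(printf 'a:b:c\nd:e\n' | awk -F: -v OFS='-' '{print NR, NF, $1, $NF}')
# 1-3-a-c
# 2-2-d-e

# -v FS=: is equivalent to -F:
same=$(printf 'a:b:c\nd:e\n' | awk -v FS=: -v OFS='-' '{print NR, NF, $1, $NF}')

# $0 keeps the original separators until a field is assigned;
# $1=$1 forces awk to rebuild the line with OFS.
rebuilt=$(echo 'a:b:c' | awk -F: -v OFS='-' '{$1=$1; print $0}')   # a-b-c

printf '%s\n' "$out" "$same" "$rebuilt"
```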

4.7 Summary

gawk is GNU awk
awk options: -F, -v
awk execution process
Taking rows and columns with awk
awk patterns: comparison, regular expression, range, special patterns
awk arrays: statistical analysis of logs
awk for loops and if conditions
Objectives:
- In access.log, count the number of occurrences of each IP and of each status code
- In secure, count the number of times each system user is attacked and the number of times each attacker's IP appears
- Cumulative summation: count the traffic consumed by each IP
