The three swordsmen of Linux: grep, sed, awk

Posted by goldlikesnow on Mon, 22 Nov 2021 01:06:09 +0100


1. Application scenarios of the three swordsmen

Command | Strength | Typical scenario
grep | Filtering | Filtering lines; grep is the fastest of the three at this
sed | Replacing and modifying file content, extracting lines | Replacing or modifying content; extracting a range of lines (for example, from 11:00 to 12:00)
awk | Extracting columns, statistics and calculation | Extracting columns; comparisons (> < >= <= == !=); statistics and calculation (awk arrays)

2 Three swordsmen: grep

Option | Meaning
-E | Same as egrep; enables extended regular expressions
-A | After: -A5 shows each matching line and the 5 lines after it
-B | Before: -B5 shows each matching line and the 5 lines before it
-C | Context: -C5 shows each matching line and the 5 lines before and after it
-c | Counts the matching lines, like piping the result to wc -l
-v | Inverts the match: shows only the lines that do not match
-w | Whole-word match: the match must not touch other word characters on its left or right
[root@nn-01 ~]# seq 10 |grep 3 -A5
3
4
5
6
7
8
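
The other options in the table can be tried the same way. A minimal sketch, using seq as input; the variable names are only for illustration:

```shell
# -B2: the matching line plus the 2 lines before it
before=$(seq 10 | grep -B2 5)
echo "$before"

# -C1: the matching line plus 1 line on each side
context=$(seq 10 | grep -C1 5)
echo "$context"

# -c: count the matching lines (both 1 and 10 contain "1")
count=$(seq 10 | grep -c 1)
echo "$count"

# -w: whole-word match, so 10 no longer matches
word=$(seq 10 | grep -w 1)
echo "$word"

# -v: invert the match and keep only the lines without "1"
inverted=$(seq 10 | grep -v 1)
echo "$inverted"
```

Putting the option before the pattern, as here, or after it, as in the example above, both work with GNU grep.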

3 Three swordsmen: sed

3.1 Features and format

  • sed is a stream editor: it treats the input (a file) as a stream and processes it line by line until the end of the file

  • sed format

    Command | Option(s) | sed command: function (s) plus modifier (g) | Argument (file)
    sed | -r | 's#oldboy#oldgirl#g' | oldboy.txt
  • Core sed functions: add, delete, modify, and query

    Function | Meaning
    s | Substitute (replace)
    p | Print (display)
    d | Delete
    c / a / i | Change / append / insert (add content)

3.2 Execution process of the sed command

sed reads the file into memory one line at a time, like water flowing through a pipe. For each line it checks whether the condition matches and, if so, executes the corresponding operation.

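3.3 Core applications of the sed command

1) sed: find (p)

Find format | Explanation
'2p' | Print the specified line
'1,5p' | Print the specified range of line numbers
'/lidao/p' | Filter like grep; a regular expression can go between the slashes
'/10:00/,/11:00/p' | Print a range, from the line matching 10:00 through the line matching 11:00
Use these with sed -n. Without -n, sed also prints every line by default, so the matching lines appear twice.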
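
A minimal sketch of these address forms. The input comes from seq and from a five-line log that is invented for the example; the variable names are illustrative only:

```shell
# Print a single line, then a range of line numbers
line2=$(seq 10 | sed -n '2p')
range=$(seq 10 | sed -n '1,3p')

# Print the lines matching a regular expression (like grep)
regex=$(seq 10 | sed -n '/^1/p')

# Print from the first line matching 10:00 through the next line matching 11:00
# (these log lines are made up for the example)
log=$(printf '%s\n' '09:59 start' '10:00 login' '10:30 work' '11:00 logout' '11:01 end')
window=$(printf '%s\n' "$log" | sed -n '/10:00/,/11:00/p')
echo "$window"
```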

2) sed: delete (d)

Delete format | Explanation
'2d' | Delete the specified line
'1,5d' | Delete the specified range of line numbers
'/lidao/d' | Delete the lines containing lidao; a regular expression can go between the slashes
'/10:00/,/11:00/d' | Delete the lines in the specified range. If the end pattern never matches, deletion continues to the last line, which is equivalent to '/10:00/,$d'.
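
The delete forms can be checked in the same way. A small sketch with seq as input; the text no-such-text is a deliberately unmatched end pattern chosen for the example:

```shell
# Delete line 2; the remaining lines are printed (no -n is needed with d)
no_second=$(seq 5 | sed '2d')

# Delete a range of line numbers
tail_only=$(seq 5 | sed '1,3d')

# Delete the lines matching a regular expression (1, 10, 11 and 12 here)
no_match=$(seq 12 | sed '/1/d')

# The end pattern never matches, so deletion runs to the last line
to_end=$(seq 5 | sed '/3/,/no-such-text/d')
echo "$to_end"
```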

3) sed: add (c, a, i)

Command | Explanation
c | Change: replaces the specified line
a | Append: adds content after the specified line
i | Insert: adds content before the specified line
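
A short sketch of the three commands. It assumes GNU sed, which accepts the text on the same line as the command; other sed implementations may require the a\ form followed by a newline. The inserted texts are placeholders:

```shell
# c: replace line 2 with new text
changed=$(seq 3 | sed '2c replaced')

# a: append a new line after line 2
appended=$(seq 3 | sed '2a after two')

# i: insert a new line before line 2
inserted=$(seq 3 | sed '2i before two')

echo "$changed"
echo "$appended"
echo "$inserted"
```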

4) sed: replace (s)

  • s: substitute
  • g: global. By default, sed replaces only the first match on each line; with g it replaces every match.
  • If the replacement is empty, the substitution is equivalent to deletion
  • Back references: in 's#(...)#\1#g', enclose the content you want to reuse in parentheses and refer to it with \1, \2, and so on. Unescaped parentheses require sed -r.
  • Format: 's#pattern#replacement#g'
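
The points above can be illustrated with a short sketch. The input strings reuse the oldboy and lidao names from this article, and -r is the GNU sed option already shown in section 3.1:

```shell
# Without g, only the first match on the line is replaced
first=$(echo 'oldboy oldboy' | sed 's#oldboy#oldgirl#')

# With g, every match on the line is replaced
all=$(echo 'oldboy oldboy' | sed 's#oldboy#oldgirl#g')

# An empty replacement deletes the matched text
removed=$(echo 'oldboy lidao' | sed 's# lidao##g')

# Back references: swap the two words using \1 and \2
swapped=$(echo 'oldboy lidao' | sed -r 's#(.*) (.*)#\2 \1#g')

echo "$first"; echo "$all"; echo "$removed"; echo "$swapped"
```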

4 Three swordsmen: awk

4.1 awk execution process

awk -F, 'BEGIN{print"begin of file"}{print $2}END{print "end of file"}'
Stage | What runs | Example
Before reading the file | Command-line options and variable assignments | -F,
Before reading the file | BEGIN | BEGIN{print"begin of file"}
While reading the file | The main rule, executed line by line as in sed | {print $2}
After reading the file | END | END{print "end of file"}
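
To see the three stages in order, the command above can be fed a small input. A sketch, with two comma-separated lines invented for the example:

```shell
# BEGIN runs once before any input, the main rule runs once per line,
# and END runs once after the last line
out=$(printf 'a,b\nc,d\n' |
  awk -F, 'BEGIN{print "begin of file"}{print $2}END{print "end of file"}')
echo "$out"
```

The second column of each line (b, then d) is printed between the BEGIN and END messages.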

4.2 Rows and columns

Term | Name in awk | Notes
Row | Record | By default, records are separated by newlines
Column | Field | By default, fields are separated by whitespace
The row and column separators in awk can both be changed.

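1) Extracting rows

awk pattern | Meaning
NR==1 | Take out the first row
NR>=1 && NR<=5 | Take out rows 1 through 5
/oldboy/ | Take out the rows that match the regular expression oldboy
/101/,/105/ | Take out the range from the row matching 101 through the row matching 105
Comparison operators: > < >= <= == !=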
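
A minimal sketch of these row selections, with seq 110 standing in for a file; the variable names are illustrative:

```shell
# Select by row number
first=$(seq 110 | awk 'NR==1')
range=$(seq 110 | awk 'NR>=2 && NR<=4')

# Select by regular expression: rows that are exactly 100, 101 or 102
regex=$(seq 110 | awk '/^10[0-2]$/')

# Select a range between two patterns
span=$(seq 110 | awk '/101/,/105/')
echo "$span"
```

When a pattern has no action, awk prints the whole matching row.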

2) Extracting columns

  • -F specifies the field separator, that is, what marks the end of each column (by default: spaces, runs of spaces, and tabs)

  • $ followed by a number takes out that column; for example, $1 is the first column

  • $0 is the content of the whole row, and $NF is the last column

    [hadoop@db48 ~]$ #Take out the ip address in the first network card and the contents of the specified row and column.
    [hadoop@db48 ~]$ ip a s eth0
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1454 qdisc pfifo_fast state UP qlen 1000
        link/ether 52:54:00:5b:87:da brd ff:ff:ff:ff:ff:ff
        inet 10.10.110.33/16 brd 10.10.255.255 scope global eth0
           valid_lft forever preferred_lft forever
    [hadoop@db48 ~]$ ip a s eth0 |awk 'NR==3'|awk -F"[ /]+" '{print $3}'
    10.10.110.33
    [hadoop@db48 ~]$ ip a s eth0 |awk -F"[ /]+" 'NR==3{print $3}'
    10.10.110.33
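
As a further sketch of $1, $NF, $0 and NF, here is a single colon-separated line in /etc/passwd format; the line is a typical root entry used only as sample input:

```shell
line='root:x:0:0:root:/root:/bin/bash'

first=$(echo "$line" | awk -F: '{print $1}')    # first column
last=$(echo "$line" | awk -F: '{print $NF}')    # last column
whole=$(echo "$line" | awk -F: '{print $0}')    # the whole row
count=$(echo "$line" | awk -F: '{print NF}')    # number of columns

echo "$first $last $count"
```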
    

4.3 Patterns

  • What can serve as a condition in awk:
    • Comparison operators: > < >= <= == !=
    • Regular expressions
    • Range expressions
    • The special conditions BEGIN and END
Command | Option | 'condition{action}'
awk | -F"[ /]+" | 'NR==3{print $3}'
The condition is also called a pattern: 'pattern{action}'

1) Comparison expressions: see the section on extracting rows above

2) Regular expressions

  • // supports extended regular expressions
  • awk can match against a specific column, and test whether that column contains or does not contain a pattern
  • ~ contains (matches)
  • !~ does not contain (does not match)
Regex | Meaning | Use in awk
^ | Begins with | Beginning of a column: $3~/^oldboy/
$ | Ends with | End of a column: $4~/lidao$/
#Find the rows in /etc/passwd whose third column starts with 2, and display the first, third and last columns
[hadoop@db48 ~]$ tail -10 /etc/passwd
tss:x:59:59:Account used by the trousers package to sandbox the tcsd daemon:/dev/null:/sbin/nologin
zsw:x:1000:1000::/home/zsw:/bin/bash
lxq:x:1001:1001::/home/lxq:/bin/bash
hadoop:x:1002:1002::/home/hadoop:/bin/bash
mysql:x:27:27:MySQL Server:/var/lib/mysql:/bin/false
gj:x:1003:1003::/home/gj:/bin/bash
tmh:x:1004:1004::/home/tmh:/bin/bash
zzl:x:1005:1005::/home/zzl:/bin/bash
phy:x:1006:1006::/home/phy:/bin/bash
wbq:x:1007:1007::/home/wbq:/bin/bash
[hadoop@db48 ~]$ awk -F':' '$3~/^2/{print $1,$3,$NF}' /etc/passwd
daemon 2 /sbin/nologin
nscd 28 /sbin/nologin
mysql 27 /bin/false
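
The same idea works with $ and with !~. A sketch on three passwd-style lines; the daemon line is written out by hand for the example and may differ from a real system:

```shell
# Sample input in /etc/passwd format (partly invented for illustration)
passwd_sample=$(printf '%s\n' \
  'daemon:x:2:2:daemon:/sbin:/sbin/nologin' \
  'zsw:x:1000:1000::/home/zsw:/bin/bash' \
  'mysql:x:27:27:MySQL Server:/var/lib/mysql:/bin/false')

# Rows whose last column ends with "bash"
bash_users=$(printf '%s\n' "$passwd_sample" | awk -F: '$NF~/bash$/{print $1}')

# Rows whose third column does NOT start with 2
not_two=$(printf '%s\n' "$passwd_sample" | awk -F: '$3!~/^2/{print $1,$3}')

echo "$bash_users"
echo "$not_two"
```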

3) Range patterns

  • /start/,/end/ : from the row where the range starts to the row where it ends (commonly used)
  • NR==1,NR==5 : starts on the first row and ends on the fifth, similar to sed -n '1,5p'
#Display the ip addresses within the specified time range (11:02:00 to 11:02:30)
awk '/11:02:00/,/11:02:30/{print $1}' access.log
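
Since access.log is not shown here, the following sketch uses a five-line log invented for the example, with the ip in the first column, to show both range forms:

```shell
# Hypothetical log lines: ip, time, request
log=$(printf '%s\n' \
  '10.0.0.1 11:01:59 GET /' \
  '10.0.0.2 11:02:00 GET /a' \
  '10.0.0.3 11:02:15 GET /b' \
  '10.0.0.4 11:02:30 GET /c' \
  '10.0.0.5 11:02:31 GET /d')

# Range between two regular expressions
ips=$(printf '%s\n' "$log" | awk '/11:02:00/,/11:02:30/{print $1}')

# Range between two row numbers
by_number=$(printf '%s\n' "$log" | awk 'NR==1,NR==2{print $1}')

echo "$ips"
echo "$by_number"
```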

4) Special patterns BEGIN{} and END{}

Pattern | Meaning | Application scenario
BEGIN{} | Its contents run before awk reads the file | 1) Simple statistics and calculation without reading any file (common); 2) Adding a header before processing the file (good to know); 3) Defining awk variables (rare, because -v can be used instead)
END{} | Its contents run after awk has read the file | 1) Statistics: calculate while reading, then output the result in END (common); 2) Outputting the contents of awk arrays (common)
  • END statistical calculation
  • Statistical methods:
Statistical method | Abbreviated form | Application scenario
i=i+1 | i++ | Counting occurrences
sum=sum+??? | sum+=??? | Summation
array[]=array[]+1 | array[]++ | Classifying and counting with arrays
Note: i and sum are variables
#Count the number of empty lines in /etc/services
[hadoop@db48 ~]$ awk '/^$/' /etc/services |wc -l
17
[hadoop@db48 ~]$ awk '/^$/{i++}END{print i}' /etc/services
17
#Sum 1 + 2 + 3 + ... + 100 from seq 100 with awk
[hadoop@db48 ~]$ seq 100 |awk '{sum=sum+$1}END{print sum}'
5050
#To watch the running total at each step
[hadoop@db48 ~]$ seq 100 |awk '{sum=sum+$1;print sum}END{print sum}'
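
The END examples above have BEGIN counterparts. A brief sketch of the three BEGIN uses; the two input lines and the header text are made up for the example:

```shell
# 1) Calculate without reading any file
calc=$(awk 'BEGIN{print 7*6}')

# 2) Print a header before the file is processed
table=$(printf 'zsw:x:1000\nlxq:x:1001\n' |
  awk -F: 'BEGIN{print "user uid"}{print $1,$3}')

# 3) Define a variable in BEGIN, or pass it in with -v instead
in_begin=$(awk 'BEGIN{n=3; print n*2}')
with_v=$(awk -v n=3 'BEGIN{print n*2}')

echo "$calc"
echo "$table"
```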

4.4 awk arrays

  • Analyzing logs:
    • Counting occurrences: how many times each ip appears, how many times each status code appears, how many times each system user is attacked, and how many times each attacker's ip appears
    • Cumulative summation: the traffic consumed by each ip
Item | shell array | awk array
Form | array[0]=oldboy; array[1]=lidao | array[0]="oldboy"; array[1]="lidao"
Use | echo ${array[0]} | print array[0]
Batch output of array contents | for i in ${array[*]}; do echo $i; done | for(i in array) print array[i]
Note on awk's special array loop: the loop variable receives the array index, so use array[i] to get the contents.
[hadoop@db48 ~]$ awk 'BEGIN{a[0]=oldboy;a[1]=lidao;print a[0],a[1]}'
 
#Bare letters in awk are treated as variable names. To use a literal string, wrap it in double quotation marks ""
[hadoop@db48 ~]$ awk 'BEGIN{a[0]="oldboy";a[1]="lidao";print a[0],a[1]}'
oldboy lidao
[hadoop@db48 ~]$ awk 'BEGIN{a[0]="oldboy";a[1]="lidao";for(i in a) print i,a[i]}'
0 oldboy
1 lidao
#Count the number of domain names and arrange them in reverse order
[hadoop@db48 tmp]$ cat url.txt 
://jingyan.baidu.com/article/6079ad0e7744e869fe86db18.html
https://www.baidu.com/s?ie=UTF-8&wd=typoro%E8%B7%B3%E5%87%BA%E5%88%97%E8%A1%A8
https://www.baidu.com/s?ie=UTF-8&wd=ll%20%E6%96%87%E4%BB%B6%E9%A2%9C%E8%89%B2
https://blog.csdn.net/qq_29242127/article/details/77141485
https://blog.csdn.net/qq_29242127/article/details/77141485
https://blog.csdn.net/qq_29242127/article/details/77141485
#sort -r (reverse) -n (numeric) -k2 (second column)
[hadoop@db48 tmp]$ awk -F"[/]+" '{array[$2]++}END{for(i in array) print i,array[i]}' url.txt |sort -rnk2
blog.csdn.net 3
www.baidu.com 2
jingyan.baidu.com 1
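
Cumulative summation with an array follows the same shape, with += in place of ++. A sketch under stated assumptions: the three log lines and their column layout (ip, status code, bytes) are invented for the example, so a real access.log would need different column numbers:

```shell
# Hypothetical log: ip, status code, bytes sent
log=$(printf '%s\n' \
  '10.0.0.1 200 100' \
  '10.0.0.2 404 50' \
  '10.0.0.1 200 250')

# Add up column 3 per ip, then sort by the total, largest first
traffic=$(printf '%s\n' "$log" |
  awk '{bytes[$1]+=$3}END{for(ip in bytes) print ip,bytes[ip]}' |
  sort -rnk2)
echo "$traffic"
```

The order of for(ip in bytes) is not guaranteed, which is why the result is piped through sort.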

4.5 awk loops and conditionals

shell C-style for loop | awk for loop
for((i=1;i<=10;i++)); do echo $i; done | for(i=1;i<=10;i++) print i
In awk, the for loop is typically used to loop through the fields of a row.
#Sum 1 + 2 + ... + 100
[hadoop@db48 tmp]$ awk "BEGIN{for(i=1;i<=100;i++)sum+=i;print sum}"
5050

[hadoop@db48 tmp]$ awk "BEGIN{
for(i=1;i<=100;i++)
    sum+=i;
print sum
}"
5050

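#With braces, print is part of the loop body, so the running total is printed on every iteration; the last line is 5050
[hadoop@db48 tmp]$ awk "BEGIN{
for(i=1;i<=100;i++)
    {sum+=i;
    print sum}
}"
1
3
6
...
5050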
shell if statement | awk if statement
if [ "$oldhuang" -eq 18 ]; then echo "take to dbj"; fi | if(condition) print "dbj"
if [ "$oldhuang" -eq 18 ]; then echo "take to dbj"; else echo "rest"; fi | if(condition) print "dbj"; else print "rest"
#Find the filesystems with disk utilization greater than 70%, and print Filesystem, Size and Use%
[hadoop@db48 ~]$ df -h|awk -F"[ %]+" 'NR>1{if($5>70) print "Disk is not enough\t" $1,$2,$5}'
Disk is not enough	/dev/vda1 20G 77
#Note: when awk has several conditions, the first can go in the pattern of 'condition{action}', and the second is generally written with if

#Interview question: find the words with fewer than 6 characters in the following sentence and display them
#I am oldboy teacher welcom to oldboy teacher class
[hadoop@db48 ~]$ echo I am oldboy teacher welcom to oldboy teacher class|awk '{
for(i=1;i<=NF;i++) 
    if(length($i)<6)
        print $i
}'
I
am
to
class
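
The examples above use if without else. A minimal sketch of the if/else form, labelling the numbers from seq 3 as odd or even; the labels are arbitrary:

```shell
# if/else inside an awk action; the semicolon ends the first print
result=$(seq 3 | awk '{if($1%2==0) print $1,"even"; else print $1,"odd"}')
echo "$result"
```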

4.6 awk built-in variables

Built-in variable | Meaning
NR | Number of Records: the number of the current record (row)
NF | Number of Fields: how many fields (columns) the current row has; $NF is the last column
FS | Field Separator: marks the end of each field; -F: is the same as -v FS=:
OFS | Output Field Separator: what awk prints between columns on output (a space by default)
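
A short sketch that exercises all four variables at once. The two input lines and the choice of - as output separator are arbitrary:

```shell
# NR is the row number, NF the column count; OFS joins the printed columns
out=$(printf 'a:b:c\nd:e\n' | awk -F: -v OFS='-' '{print NR,NF,$1,$NF}')
echo "$out"

# -v FS=: behaves like -F:
same=$(printf 'a:b:c\n' | awk -v FS=: '{print $2}')
echo "$same"
```

OFS only applies between items separated by commas in print; items written side by side are concatenated without a separator.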

4.7 Summary

  • gawk: GNU awk

  • awk options: -F, -v

  • awk execution process

  • Extracting rows and columns with awk

  • awk patterns: comparison, regular expressions, ranges, special patterns

  • awk arrays: statistical analysis of logs

  • awk for loops and if conditionals

  • Objectives:

    • access.log: count the number of occurrences of each ip and of each status code
    • secure: count the number of times each system user is attacked and the number of times each attacker's ip appears
    • Cumulative summation: count the traffic consumed by each ip

Topics: Linux server