1, Regular expression
1. Regular expression overview
Regular expressions are typically used in conditional statements to check whether a string matches a given format.
A regular expression is composed of ordinary characters and metacharacters.
Ordinary characters include uppercase and lowercase letters, digits, punctuation marks and some other symbols.
A metacharacter is a character with a special meaning in a regular expression. For example, it can specify how the character that precedes it (its leading character) may occur in the target text.
Two regular expression syntaxes are commonly used on Linux:
① Basic regular expression: BRE
② Extended regular expression: ERE
Text processing tool | Basic regular expression | Extended regular expression |
---|---|---|
vi editor | supported | not supported |
grep | supported | only with the -E option |
egrep | supported | supported |
sed | supported | only with the -E option (-r in GNU sed) |
awk | supported | supported |
2. Basic regular expression
- Basic regular expressions are the common core of regular expression syntax. The common metacharacters and their functions are shown in the following table:
Metacharacter | Effect |
---|---|
\ | Escape character, used to cancel the special meaning of a symbol, such as: \!, \n |
^ | Matches the start of a line, such as: ^world matches lines starting with world |
$ | Matches the end of a line, such as: world$ matches lines ending with world |
. | Matches any single character except \n (line feed) |
* | Matches the preceding subexpression zero or more times |
[list] | Matches any single character in the list, such as: [0-9] matches any digit |
[^list] | Matches any single character not in the list, such as: [^0-9] matches any non-digit character |
\{n\} | Matches the preceding subexpression exactly n times, such as: [0-9]\{2\} matches two digits |
\{n,\} | Matches the preceding subexpression at least n times, such as: [0-9]\{2,\} matches two or more digits |
\{n,m\} | Matches the preceding subexpression n to m times, such as: [a-z]\{2,3\} matches two to three lowercase letters |
- Note: when egrep and awk use {n}, {n,} or {n,m}, the braces do not need to be preceded by "\"
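As a minimal sketch of the note above, the following compares the interval syntax in the two flavours. The sample words are made up for illustration (they are not the article's test.txt), and `grep -E` is used as the equivalent of `egrep`.

```shell
# Hypothetical sample data, one word per line
sample=$(printf 'wd\nwod\nwood\nwoood\n')

# BRE: the braces of an interval must be escaped
bre=$(printf '%s\n' "$sample" | grep 'wo\{2,3\}d')

# ERE: the braces are written without backslashes
ere=$(printf '%s\n' "$sample" | grep -E 'wo{2,3}d')

printf 'BRE result:\n%s\n' "$bre"
printf 'ERE result:\n%s\n' "$ere"
```

Both commands select the same lines; only the way the braces are written differs.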
- Examples
    [root@localhost /home]#grep -n "^the" test.txt    #Filter lines starting with the
    4:the tongue is boneless but it breaks bones.12!
    [root@localhost /home]#grep -n "words$" test.txt    #Filter lines ending with words
    9:Actions speak louder than words
    [root@localhost /home]#grep 'sh[io]rt' test.txt    #Filter lines containing the string shirt or short
    he was short and fat.
    He was wearing a blue polo shirt with black pants.
    [root@localhost /home]#grep '[0-9]' test.txt    #Filter lines containing any digit from 0 to 9
    the tongue is boneless but it breaks bones.12!
    PI=3.141592653589793238462643383249901429
    [root@localhost /home]#grep '[a-c]' test.txt    #Filter lines containing the letters a through c (lowercase only)
    he was short and fat.
    He was wearing a blue polo shirt with black pants.
    [root@localhost /home]#grep -i '[a-c]' test.txt    #The grep -i option ignores case
    he was short and fat.
    He was wearing a blue polo shirt with black pants.
    The home of Football on BBC Sport online.
    Actions speak louder than words
    #AxyzxyzxyzxyzC
    [root@localhost /home]#grep -n '[^a-z]oo' test.txt    #Filter lines where oo is preceded by a character other than a lowercase letter
    3:The home of Football on BBC Sport online.
    [root@localhost /home]#grep -n '^[a-z]oo' test.txt    #Filter lines that start with a lowercase letter followed by oo
    5:google is the best tools for search keyword.
    [root@localhost /home]#grep -n 'o\{2\}' test.txt    #Find lines containing two consecutive o's
    3:The home of Football on BBC Sport online.
    5:google is the best tools for search keyword.
    8:a wood cross!
    12:#woood #
    13:#woooooood
    15:I bet this place is really spooky late at night!
    [root@localhost /home]#grep -n 'wo\{2,5\}d' test.txt    #Find lines containing w, then 2 to 5 o's, then d
    8:a wood cross!
    12:#woood #
    [root@localhost /home]#grep -n 'wo\{2,\}d' test.txt    #Find lines containing w, then two or more o's, then d
    8:a wood cross!
    12:#woood #
    13:#woooooood
    [root@localhost /home]#grep 'o*' test.txt    #Match every line, including blank lines (the pattern is quoted so the shell does not expand it)
    he was short and fat.
    He was wearing a blue polo shirt with black pants.
    The home of Football on BBC Sport online.
    the tongue is boneless but it breaks bones.12!
    google is the best tools for search keyword.
    The year ahead will test our political establishment to the limit.
    [root@localhost /home]#grep -n 'oo*' test.txt    #Match lines with at least one o: the first o must exist, and the second o can appear 0 or more times
    1:he was short and fat.
    2:He was wearing a blue polo shirt with black pants.
    3:The home of Football on BBC Sport online.
    4:the tongue is boneless but it breaks bones.12!
    5:google is the best tools for search keyword.
    6:The year ahead will test our political establishment to the limit.
3. Extended regular expression
- Extended regular expressions extend basic regular expressions with additional metacharacters
- Supported tools: egrep and awk
- Extended regular expression metacharacters
Metacharacter | Effect |
---|---|
+ | Matches the preceding subexpression one or more times, such as: go+d matches strings with at least one o (god, good, ...) |
? | Matches the preceding subexpression zero or one time, such as: go?d matches gd or god |
() | Treats the string in parentheses as a single unit, such as: g(oo)+d matches oo one or more times as a unit (good, gooood, ...) |
\| | Matches either alternative ("or"), such as: good\|great matches good or great |
Examples
    [root@localhost /home]#egrep -n 't(a|e)st' test.txt    #Match lines containing tast or test
    6:The year ahead will test our political establishment to the limit.
    17:tast
    [root@localhost /home]#egrep 'A(xyz)+C' test.txt    #Match lines containing A, then one or more xyz, then C
    #AxyzxyzxyzxyzC
    [root@localhost /home]#egrep 'wo+d' test.txt    #Match lines containing w, then at least one o, then d
    a wood cross!
    #woood #
    #woooooood
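The examples above do not exercise `?` or a top-level `|`. A small sketch of both follows, assuming a made-up word list and using `grep -E` in place of `egrep`; the `^` and `$` anchors are added here so that each pattern must match the whole line.

```shell
# Hypothetical word list, one word per line
words=$(printf 'gd\ngod\ngood\ngreat\ngrand\n')

# ? : the preceding o may appear zero times or once
optional=$(printf '%s\n' "$words" | grep -E '^go?d$')

# | : either alternative inside the parentheses may match
either=$(printf '%s\n' "$words" | grep -E '^(good|great)$')

printf 'go?d matches:\n%s\n' "$optional"
printf 'good|great matches:\n%s\n' "$either"
```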
2, Sed tool
1.Sed overview
sed is a text processor, which relies on regular expressions. It can read text content and add, delete and replace data according to specified conditions. It is widely used in shell scripts to complete automatic processing tasks.
By default, sed does not modify the source file. It copies the line being processed into a temporary buffer (the pattern space), and all instructions operate on that buffer. After a line has been processed, the contents of the buffer are printed to the screen by default, and then the next line is processed. This repeats until the end of the file. The file itself is left unchanged unless the -i option is used; to keep the result without touching the file, save the output by redirection.
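The behaviour described above can be checked with a short sketch. The scratch file created with `mktemp` and its contents are assumptions made for this illustration.

```shell
tmp=$(mktemp)                        # hypothetical scratch file
printf 'one\ntwo\nthree\n' > "$tmp"

# The edited text goes to standard output only
sed 's/two/TWO/' "$tmp"

# The file on disk still holds the original text
original=$(cat "$tmp")

# Redirect the output to keep the edited result in a second file
sed 's/two/TWO/' "$tmp" > "$tmp.new"
edited=$(cat "$tmp.new")

rm -f "$tmp" "$tmp.new"
printf 'file after sed:\n%s\n' "$original"
printf 'redirected copy:\n%s\n' "$edited"
```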
2. Sed basic syntax
- sed command format
    sed [option] 'operation' file
    sed [option] -f script-file file
Common sed command options
Option | Description |
---|---|
-e or --expression= | Processes the input text with the specified command |
-f or --file= | Processes the input text with the specified script file |
-h or --help | Shows help |
-i | Edits the text file in place |
-n, --quiet or --silent | Suppresses the default output so that only explicitly printed results are shown |
"Operation" specifies the action to perform on the file, that is, the sed command. It usually takes the form "[n1[,n2]] operation". n1 and n2 are optional line numbers that select the lines to operate on. For example, an operation on lines 5 through 20 is written "5,20 operation". Common operations include the following.
a: append; adds a line of the specified content below the selected line
c: change; replaces the selected line with the specified content
d: delete; deletes the selected line
i: insert; inserts a line of the specified content above the selected line
p: print; prints the specified line, or all lines if no line is specified. It is usually used with the "-n" option. (To show non-printing characters in escaped form, use the l command instead.)
s: substitute; replaces the specified string
y: transliterate; converts characters one for one
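The usage examples below concentrate on p, d and s, so here is a minimal sketch of a, i, c and y. The three-line input is made up, and the commands use the portable form in which the text follows a backslash and a newline (GNU sed also accepts a one-line form such as '3aNEW').

```shell
# Hypothetical three-line input
input=$(printf 'line1\nline2\nline3\n')

# a: append text below line 2
appended=$(printf '%s\n' "$input" | sed '2a\
added below line 2')

# i: insert text above line 2
inserted=$(printf '%s\n' "$input" | sed '2i\
added above line 2')

# c: replace line 2 entirely
changed=$(printf '%s\n' "$input" | sed '2c\
replacement for line 2')

# y: convert 1->a, 2->b, 3->c character by character
translated=$(printf '%s\n' "$input" | sed 'y/123/abc/')

printf '%s\n---\n%s\n---\n%s\n---\n%s\n' "$appended" "$inserted" "$changed" "$translated"
```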
3. Usage examples
Output all content
    [root@localhost /home]#sed -n 'p' test.txt
    he was short and fat.
    He was wearing a blue polo shirt with black pants.
    The home of Football on BBC Sport online.
    ......
    #Equivalent to viewing the file with cat
    [root@localhost /home]#cat test.txt
    he was short and fat.
    He was wearing a blue polo shirt with black pants.
    The home of Football on BBC Sport online.
    ......
Output the contents of n lines
    [root@localhost /home]#sed -n '3p' test.txt    #Output the contents of the third line
    The home of Football on BBC Sport online.
    [root@localhost /home]#sed -n '3,5p' test.txt    #Output lines three to five
    The home of Football on BBC Sport online.
    the tongue is boneless but it breaks bones.12!
    google is the best tools for search keyword.
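The examples above combine `-n` with `p`. As a small sketch (with made-up input rather than test.txt), the following shows what changes when `-n` is omitted, and how `-e` can chain several commands.

```shell
# Hypothetical three-line input
input=$(printf 'alpha\nbeta\ngamma\n')

# Without -n, p prints the matching line in addition to the default output
with_default=$(printf '%s\n' "$input" | sed '/beta/p')

# With -n, only the lines printed by p are shown
quiet=$(printf '%s\n' "$input" | sed -n '/beta/p')

# -e can be repeated; the commands are applied in order to each line
multi=$(printf '%s\n' "$input" | sed -e 's/alpha/A/' -e 's/gamma/G/')

printf '%s\n---\n%s\n---\n%s\n' "$with_default" "$quiet" "$multi"
```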
Output odd and even lines
    [root@localhost /home]#sed -n '1,5{p;n}' test.txt    #Output the odd-numbered lines among lines 1 to 5
    he was short and fat.
    The home of Football on BBC Sport online.
    google is the best tools for search keyword.
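The same p and n commands can select odd or even lines of a whole input. This is a sketch on made-up numbered input, not a reconstruction of the missing screenshots: `n` loads the next line into the buffer, so the order of `p` and `n` decides which line of each pair is printed.

```shell
# Hypothetical input: the numbers 1 to 5, one per line
input=$(printf '1\n2\n3\n4\n5\n')

# p;n : print the current line, then skip over the next one (odd lines)
odd=$(printf '%s\n' "$input" | sed -n 'p;n')

# n;p : skip to the next line, then print it (even lines)
even=$(printf '%s\n' "$input" | sed -n 'n;p')

printf 'odd:\n%s\n' "$odd"
printf 'even:\n%s\n' "$even"
```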
Output a row that contains something to find
    [root@localhost /home]#sed -n '4,/the/p' test.txt    #Output from line 4 through the next line that contains the
    the tongue is boneless but it breaks bones.12!
    google is the best tools for search keyword.
    [root@localhost /home]#sed -n '/the/=' test.txt    #Output the line numbers of lines containing the
    4
    5
    6
delete
    [root@localhost /home]#nl test.txt    #nl numbers the lines of a file
    1 he was short and fat.
    2 He was wearing a blue polo shirt with black pants.
    3 The home of Football on BBC Sport online.
    4 the tongue is boneless but it breaks bones.12!
    5 google is the best tools for search keyword.
    ......
    [root@localhost /home]#nl test.txt |sed '4d'    #Delete the fourth line
    1 he was short and fat.
    2 He was wearing a blue polo shirt with black pants.
    3 The home of Football on BBC Sport online.
    5 google is the best tools for search keyword.
    ......
    [root@localhost /home]#nl test.txt |sed '4,7d'    #Delete lines 4-7
    1 he was short and fat.
    2 He was wearing a blue polo shirt with black pants.
    3 The home of Football on BBC Sport online.
    8 a wood cross!
    ......
    [root@localhost /home]#sed '/^[a-z]/d' test.txt    #Delete lines starting with a lowercase letter a through z
    He was wearing a blue polo shirt with black pants.
    The home of Football on BBC Sport online.
    The year ahead will test our political establishment to the limit.
    PI=3.141592653589793238462643383249901429
    ......
    [root@localhost /home]#sed '/^$/d' test.txt    #Delete empty lines
replace
    [root@localhost /home]#sed 's/the/THE/' test.txt    #Replace the first the on each line with THE
    he was short and fat.
    He was wearing a blue polo shirt with black pants.
    The home of Football on BBC Sport online.
    THE tongue is boneless but it breaks bones.12!
    [root@localhost /home]#sed 's/the/THE/g' test.txt    #Replace every the on every line with THE
    [root@localhost /home]#sed 's/o//g' test.txt    #Delete every o on every line
    he was shrt and fat.
    He was wearing a blue pl shirt with black pants.
    [root@localhost /home]#sed 's/^/#/' test.txt    #Insert # at the beginning of each line
    #he was short and fat.
    #He was wearing a blue polo shirt with black pants.
    #The home of Football on BBC Sport online.
    #the tongue is boneless but it breaks bones.12!
    [root@localhost /home]#sed '/the/s/^/#/' test.txt    #Insert # at the beginning of each line containing the
    #the tongue is boneless but it breaks bones.12!
    #The year ahead will test our political establishment to the limit.
    [root@localhost /home]#sed '3,5s/the/THE/g' test.txt    #Replace the with THE in lines 3 to 5
    he was short and fat.
    He was wearing a blue polo shirt with black pants.
    The home of Football on BBC Sport online.
    THE tongue is boneless but it breaks bones.12!
    google is THE best tools for search keyword.
    [root@localhost /home]#sed '/the/s/o/O/g' test.txt    #In lines containing the, replace lowercase o with uppercase O
    he was short and fat.
    He was wearing a blue polo shirt with black pants.
    The home of Football on BBC Sport online.
    the tOngue is bOneless but it breaks bOnes.12!
Migrate eligible text
Common parameters are as follows:
parameter | explain |
---|---|
H | Appends the current line to the hold space (a clipboard-like buffer) |
g, G | Overwrites (g) or appends to (G) the specified line with the contents of the hold space |
w | Save as file |
r | Read the specified file |
a | Append specified content |
    [root@localhost /home]#sed '/the/{H;d};$G' test.txt    #Move all lines containing the to the end of the file
    he was short and fat.
    a wood cross!
    the tongue is boneless but it breaks bones.12!
    google is the best tools for search keyword.
    The year ahead will test our political establishment to the limit.
    [root@localhost /home]#sed '1,5{H;d};17G' test.txt    #Move lines 1 to 5 to after line 17
    [root@localhost /home]#sed '/the/w out.file' test.txt    #Save all lines containing the to another file
    He was wearing a blue polo shirt with black pants.
    The home of Football on BBC Sport online.
    the tongue is boneless but it breaks bones.12!
    ......
    [root@localhost /home]#cat out.file
    the tongue is boneless but it breaks bones.12!
    google is the best tools for search keyword.
    The year ahead will test our political establishment to the limit.
    [root@localhost /home]#sed '/the/r /etc/hostname' test.txt    #Insert the contents of /etc/hostname below each line of test.txt that contains the
    the tongue is boneless but it breaks bones.12!
    localhost.localdomain
    google is the best tools for search keyword.
    localhost.localdomain
    The year ahead will test our political establishment to the limit.
    localhost.localdomain
    PI=3.141592653589793238462643383249901429
    a wood cross!
    [root@localhost /home]#hostname
    localhost.localdomain
    [root@localhost /home]#sed '3aNEW' test.txt    #Append NEW below the third line
    he was short and fat.
    He was wearing a blue polo shirt with black pants.
    The home of Football on BBC Sport online.
    NEW
    [root@localhost /home]#sed '3aNEW1\nNEW2\nNEW3' test.txt    #Append multiple lines below the third line
    he was short and fat.
    He was wearing a blue polo shirt with black pants.
    The home of Football on BBC Sport online.
    NEW1
    NEW2
    NEW3
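To make the H, d and G steps easier to follow, here is a sketch on a made-up five-line input. It uses the same idea as the first command above, written with two `-e` expressions and a `;` before the closing brace, a form that stricter sed implementations also accept.

```shell
# Hypothetical input: two lines contain "the", three do not
input=$(printf 'keep 1\nmove the A\nkeep 2\nmove the B\nkeep 3\n')

# H appends each matching line to the hold space and d removes it from the output;
# on the last line ($), G appends the hold space to the line being printed
moved=$(printf '%s\n' "$input" | sed -e '/the/{H;d;}' -e '$G')

printf '%s\n' "$moved"
```

The output contains an empty line before the moved lines, because H always adds a newline before the text it appends, even when the hold space is still empty. Note also that if the last line itself matched the pattern, d would end processing of that line before `$G` could run, so the held lines would not be printed.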
Call script
    [root@localhost /home]#sed '1,5{H;d};18G' test.txt
    I bet this place is really spooky late at night!
    Misfortunes never come alone/single.
    tast
    he was short and fat.
    He was wearing a blue polo shirt with black pants.
    The home of Football on BBC Sport online.
    the tongue is boneless but it breaks bones.12!
    google is the best tools for search keyword.
    [root@localhost /home]#vim open.list
    1,5H
    1,5d
    18G
    [root@localhost /home]#sed -f open.list test.txt
    I bet this place is really spooky late at night!
    Misfortunes never come alone/single.
    tast
    he was short and fat.
    He was wearing a blue polo shirt with black pants.
    The home of Football on BBC Sport online.
    the tongue is boneless but it breaks bones.12!
    google is the best tools for search keyword.
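A script file can also be created non-interactively. The following sketch writes a small script with a here-document and runs it with `-f`; the file name from `mktemp`, the script contents and the input are all assumptions for illustration.

```shell
script=$(mktemp)    # hypothetical script file
cat > "$script" <<'EOF'
# one sed command per line; lines starting with # are comments
s/cat/dog/
2d
EOF

# Each input line is processed by every command in the script, in order
result=$(printf 'cat one\ncat two\ncat three\n' | sed -f "$script")

rm -f "$script"
printf '%s\n' "$result"
```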
3, awk tool
1.awk overview
awk is both a programming language designed for text processing and a powerful text analysis tool. It processes input line by line and is typically used for scanning, filtering and statistical summaries. The input can come from standard input, a pipeline or files.
2. Working principle
awk reads the text line by line and splits each line into fields, using spaces or tabs as the default separator. The fields are stored in built-in variables, and the editing instructions are executed according to the pattern or condition. sed usually processes a line as a whole, whereas awk tends to split a line into multiple "fields" and then process them. The result can be displayed with the print function. awk commands can use the logical operators "&&" (and), "||" (or) and "!" (not), and can perform simple arithmetic with +, -, *, /, % and ^, which represent addition, subtraction, multiplication, division, remainder and exponentiation respectively.
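The operators mentioned above can be tried on a few made-up records. In this sketch each input line is assumed to hold a name, a quantity and a price separated by spaces; these column meanings are invented for the example.

```shell
# Hypothetical records: name quantity price
data=$(printf 'apple 3 2\npear 10 5\nplum 7 0\n')

# && : both conditions must hold
both=$(printf '%s\n' "$data" | awk '$2 > 5 && $3 > 0 {print $1}')

# || : at least one condition must hold
either=$(printf '%s\n' "$data" | awk '$2 > 5 || $3 > 0 {print $1}')

# ! : negates the condition
negated=$(printf '%s\n' "$data" | awk '!($2 > 5) {print $1}')

# Arithmetic on fields: product, remainder and power
totals=$(printf '%s\n' "$data" | awk '{print $1, $2 * $3, $2 % 3, $3 ^ 2}')

printf '%s\n---\n%s\n---\n%s\n---\n%s\n' "$both" "$either" "$negated" "$totals"
```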
3.Awk basic grammar
    awk [option] 'pattern or condition {editing instructions}' file1 file2
    awk -f script-file file1 file2
In an awk statement, the pattern determines when the action is applied to the data. If the pattern is omitted, the action is executed for every line. The pattern can be a conditional expression, a compound expression or a regular expression.
Each editing instruction can contain multiple statements. Statements are separated by semicolons, and multiple {} blocks can be separated by semicolons or spaces.
The common option -F defines the field separator. By default, spaces or tabs are used as the separator.
The common built-in variables of Awk are as follows
variable | describe |
---|---|
FS | The field separator for each line of text; defaults to spaces or tabs |
NF | Number of fields in the line being processed |
NR | The line number (ordinal number) of the line being processed |
$0 | The entire content of the line being processed |
$n | The nth field (column n) of the line being processed |
FILENAME | Name of the file being processed |
RS | The record separator; the default is \n, that is, one record per line |
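As a sketch of how these variables behave, the following prints several of them for a small colon-separated file. The scratch file and its two records are made up for the example.

```shell
tmp=$(mktemp)    # hypothetical scratch file
printf 'root:x:0\nbin:x:1:extra\n' > "$tmp"

# NR is the line number, NF the field count, $1 the first field, $NF the last field
report=$(awk -F ':' '{print NR, NF, $1, $NF}' "$tmp")

# $0 holds the whole line, unsplit
whole=$(awk 'NR == 2 {print $0}' "$tmp")

# FILENAME holds the name of the file being read
fname=$(awk 'NR == 1 {print FILENAME}' "$tmp")

rm -f "$tmp"
printf '%s\n---\n%s\n---\n%s\n' "$report" "$whole" "$fname"
```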
Print text content
    [root@localhost /home]#awk '/^the/{print}' test.txt    #Output lines starting with the
    the tongue is boneless but it breaks bones.12!
    [root@localhost /home]#awk '/night!$/{print}' test.txt    #Output lines ending with night!
    I bet this place is really spooky late at night!
    [root@localhost /home]#awk -F ':' '/bash$/{print|"wc -l"}' /etc/passwd    #Count the number of users who can log in to the system
    5
    [root@localhost /home]#awk 'NR==1,NR==4{print}' test.txt    #Output lines 1 to 4
    he was short and fat.
    He was wearing a blue polo shirt with black pants.
    The home of Football on BBC Sport online.
    the tongue is boneless but it breaks bones.12!
    [root@localhost /home]#awk 'NR==1||NR==4{print}' test.txt    #Output lines 1 and 4
    he was short and fat.
    the tongue is boneless but it breaks bones.12!
    [root@localhost /home]#awk '(NR>=1)&&(NR<=4){print}' test.txt    #Output lines 1 to 4
    he was short and fat.
    He was wearing a blue polo shirt with black pants.
    The home of Football on BBC Sport online.
    the tongue is boneless but it breaks bones.12!
    [root@localhost /home]#awk '(NR%2)==1{print}' test.txt    #Output all odd-numbered lines
    [root@localhost /home]#awk '(NR%2)==0{print}' test.txt    #Output all even-numbered lines
    [root@localhost /etc]#awk -F : '!($3<900)' /etc/passwd    #Output lines whose third field is not less than 900; "!" indicates negation
    polkitd:x:999:997:User for polkitd:/:/sbin/nologin
    libstoragemgmt:x:998:995:daemon account for libstoragemgmt:/var/run/lsm:/sbin/nologin
    colord:x:997:994:User for colord:/var/lib/colord:/sbin/nologin
    ......
    #awk also supports conditional expressions, which are written with a question mark and a colon.
    #A conditional expression is essentially shorthand for an if...else statement and gives the same result.
    [root@localhost /etc]#awk -F : '{if($3>200) {print $0}}' /etc/passwd    #Output lines whose third field is greater than 200
    polkitd:x:999:997:User for polkitd:/:/sbin/nologin
    libstoragemgmt:x:998:995:daemon account for libstoragemgmt:/var/run/lsm:/sbin/nologin
    colord:x:997:994:User for colord:/var/lib/colord:/sbin/nologin
    #If the third field is greater than the fourth field, the value between the question mark and the colon is assigned to max; otherwise the value after the colon is assigned to max
    [root@localhost /etc]#awk -F : '{max=($3 > $4) ? $3:$4;print max}' /etc/passwd
    0
    1
    2
    4
    7
    #If the third field is greater than 200, the third field is assigned to max; otherwise the first field is assigned to max
    [root@localhost /etc]#awk -F : '{max=($3 > 200) ? $3:$1;print max}' /etc/passwd
    root
    bin
    daemon
    adm
    ......
Output text by field
    #Output the line number of each processed line. After each record is processed, NR is increased by 1
    [root@localhost /etc]#awk -F : '{print NR,$0}' /etc/passwd
    1 root:x:0:0:root:/root:/bin/bash
    2 bin:x:1:1:bin:/bin:/sbin/nologin
    3 daemon:x:2:2:daemon:/sbin:/sbin/nologin
    ......
    #For lines whose third field is less than 5, output fields 1 and 3 (joined together, because no comma separates them)
    [root@localhost /etc]#awk -F ":" '$3 < 5 {print $1 $3 }' /etc/passwd
    root0
    bin1
    daemon2
    adm3
    lp4
    #For lines that have seven fields and whose first field contains root, output fields 1 and 3
    [root@localhost /etc]#awk -F ":" '($1~"root")&&(NF==7){print$1,$3}' /etc/passwd
    root 0
    #Output fields 1 and 7, separated by colons in the input, of lines 3 to 7
    [root@localhost /etc]#awk -F ":" 'NR==3,NR==7 {print $1,$7 }' /etc/passwd
    daemon /sbin/nologin
    adm /sbin/nologin
    lp /sbin/nologin
    sync /bin/sync
    shutdown /sbin/shutdown
    #Output fields 1 and 3 of colon-separated lines
    [root@localhost /etc]#awk -F ":" '{print $1,$3}' /etc/passwd
    root 0
    bin 1
    daemon 2
    #or
    [root@localhost /etc]#awk 'BEGIN {FS=":"}{print$1,$3}' /etc/passwd
    root 0
    bin 1
    daemon 2
    #Count the number of lines ending in /bin/bash
    [root@localhost /etc]#awk 'BEGIN{x=0};/\/bin\/bash$/ {x++};END{print x}' /etc/passwd
    2
    #$0 displays the entire line
    [root@localhost /etc]#awk 'BEGIN{x=0};/\/bin\/bash$/{x++;{print x,$0}};END {print x}' /etc/passwd
    1 root:x:0:0:root:/root:/bin/bash
    2 gulei:x:1000:1000:GuLei:/home/gulei:/bin/bash
- awk execution sequence:
① First, perform the operations in BEGIN{}
② Then read the data line by line from the specified file, automatically updating built-in variables such as NF, NR, $0 and $1, and execute 'pattern or condition {editing instruction}'
③ Finally, perform the operations in END{}
- awk can also use a pipe to process the output of other commands
    [root@localhost /etc]#date |awk '{print "Month: "$2 "\nYear: ",$6}'
    Month: 09 month
    Year:  CST
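The three-stage execution sequence described above can be seen in a short sketch that sums a column of made-up numbers read from a pipe.

```shell
result=$(printf '3\n4\n5\n' | awk '
BEGIN { print "start"; sum = 0 }                  # runs once, before any input is read
      { sum += $1 }                               # runs for every input line
END   { print "lines:", NR; print "sum:", sum }   # runs once, after the last line
')

printf '%s\n' "$result"
```

In the END block, NR still holds the number of the last line read, which is why it can be used as a line count.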
summary
- Sed and Awk are excellent text processing tools that rely on regular expressions and can perform specific operations on specified text data
- awk is better suited to extracting text, while sed is better suited to editing text