Regular expression and text editor for Shell programming

Posted by Lord Brar on Mon, 14 Feb 2022 14:41:57 +0100

1, Regular expression

1. Regular expression overview

Regular expressions are typically used in conditional statements to check whether a string matches a given format
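
As a minimal sketch of such a format check, the snippet below tests whether a string looks like a date in YYYY-MM-DD form. The function name is_date and the date format are assumptions made for illustration; they do not come from the original article.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if $1 looks like YYYY-MM-DD.
is_date() {
    # -E selects extended syntax, -q suppresses output.
    # ^ and $ anchor the pattern so the whole string must match.
    printf '%s\n' "$1" | grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2}$'
}

if is_date "2022-02-14"; then
    echo "valid"
else
    echo "invalid"
fi
```

The check only validates the shape of the string; it does not verify that the month or day is in range.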

Regular expressions are composed of ordinary characters and metacharacters

Ordinary characters include upper and lower case letters, numbers, punctuation marks and some other symbols

A metacharacter is a character with a special meaning in a regular expression. It can specify how its leading character (the character in front of the metacharacter) may occur in the target text

There are two regular expression engines commonly used in Linux

① Basic regular expression: BRE

② Extended regular expression: ERE

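Text processing tool    Basic regular expression    Extended regular expression
vi editor               supported                   not supported
grep                    supported                   not supported by default (use grep -E)
egrep                   supported                   supported
sed                     supported                   not supported by default (use sed -E, or -r in GNU sed)
awk                     supported                   supported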

2. Basic regular expression

  • Basic regular expressions are the commonly used core of regular expression syntax. The common metacharacters and their functions are shown in the following table:
Metacharacter    Function
\                Escape character, used to cancel the special meaning of a symbol, such as: \!, \n
^                Matches the start of a line, for example: ^world matches lines starting with world
$                Matches the end of a line, for example: world$ matches lines ending in world
.                Matches any single character except \n (line feed)
*                Matches the preceding subexpression 0 or more times
[list]           Matches one character in the list, for example: [0-9] matches any digit
[^list]          Matches any single character not in the list, for example: [^0-9] matches any non-digit character
\{n\}            Matches the preceding subexpression exactly n times, for example: [0-9]\{2\} matches two digits
\{n,\}           Matches the preceding subexpression at least n times, for example: [0-9]\{2,\} matches two or more digits
\{n,m\}          Matches the preceding subexpression n to m times, for example: [a-z]\{2,3\} matches two to three lowercase letters
  • Note that when egrep and awk use {n}, {n,}, {n,m} to match, "{}" does not need to be preceded by "\"
  • Examples
[root@localhost /home]#grep -n "^the" test.txt           #Filter lines starting with the
4:the tongue is boneless but it breaks bones.12!

[root@localhost /home]#grep -n "words$" test.txt         #Filter lines ending with words
9:Actions speak louder than words

[root@localhost /home]#grep 'sh[io]rt' test.txt          #Filter lines containing the string shirt or short
he was short and fat.
He was wearing a blue polo shirt with black pants.

[root@localhost /home]#grep '[0-9]' test.txt             #Filter out rows with numbers 0 to 9
the tongue is boneless but it breaks bones.12!
PI=3.141592653589793238462643383249901429

[root@localhost /home]#grep '[a-c]' test.txt             #Filter out rows containing the letters a through c, excluding uppercase
he was short and fat.
He was wearing a blue polo shirt with black pants.

[root@localhost /home]#grep -i '[a-c]' test.txt          #The grep -i option ignores case
he was short and fat.
He was wearing a blue polo shirt with black pants.
The home of Football on BBC Sport online.
Actions speak louder than words
#AxyzxyzxyzxyzC

[root@localhost /home]#grep -n '[^a-z]oo' test.txt       #Filter lines where oo is preceded by a character that is not a lowercase letter
3:The home of Football on BBC Sport online.
[root@localhost /home]#grep -n '^[a-z]oo' test.txt       #Filter lines that start with a lowercase letter followed by oo
5:google is the best tools for search keyword.


[root@localhost /home]#grep -n 'o\{2\}' test.txt         #Find lines containing at least two consecutive o's
3:The home of Football on BBC Sport online.
5:google is the best tools for search keyword.
8:a wood cross!
12:#woood #
13:#woooooood
15:I bet this place is really spooky late at night!

[root@localhost /home]#grep -n 'wo\{2,5\}d' test.txt     #Find lines containing w, then 2 to 5 o's, then d
8:a wood cross!
12:#woood #

[root@localhost /home]#grep -n 'wo\{2,\}d' test.txt      #Find lines containing w, then two or more o's, then d
8:a wood cross!
12:#woood #
13:#woooooood

[root@localhost /home]#grep 'o*' test.txt               #Matches every line, including blank lines, because o* also matches zero o's (quote the pattern so the shell does not expand it)
he was short and fat.
He was wearing a blue polo shirt with black pants.
The home of Football on BBC Sport online.
the tongue is boneless but it breaks bones.12!
google is the best tools for search keyword.
The year ahead will test our political establishment to the limit.

[root@localhost /home]#grep -n 'oo*' test.txt          #Match rows with at least one o
                                                #The first o must exist, and the second o can appear 0 or more times
1:he was short and fat.
2:He was wearing a blue polo shirt with black pants.
3:The home of Football on BBC Sport online.
4:the tongue is boneless but it breaks bones.12!
5:google is the best tools for search keyword.
6:The year ahead will test our political establishment to the limit.
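
The difference between 'o*' and 'oo*' can be checked on a small file. The sketch below uses made-up sample data (three lines, the middle one empty) rather than the article's test.txt, and counts matching lines with grep -c.

```shell
#!/bin/sh
# Sample data invented for this sketch: "foo", an empty line, "bar".
sample=$(mktemp)
printf 'foo\n\nbar\n' > "$sample"

# 'o*' may match the empty string, so every line matches, even the empty one.
all=$(grep -c 'o*' "$sample")

# 'oo*' needs one literal o followed by zero or more o's, so only "foo" matches.
with_o=$(grep -c 'oo*' "$sample")

echo "o*: $all lines, oo*: $with_o lines"
rm -f "$sample"
```

Quoting the pattern matters here: an unquoted o* is a shell glob and could be replaced by matching file names before grep ever sees it.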

3. Extended regular expression

  • Extended regular expression is the extension and deepening of basic regular expression
  • Tools that support it include egrep and awk
  • Extended regular expression metacharacters

Metacharacter    Function
+                Matches the preceding subexpression 1 or more times, for example: go+d matches strings with at least one o, such as god, good, goood
?                Matches the preceding subexpression 0 or 1 times, for example: go?d matches gd or god
()               Treats the string in parentheses as a whole, for example: g(oo)+d matches oo one or more times as a unit, such as good, gooood
|                Matches either alternative ("or"), for example: good|great matches good or great

Examples

[root@localhost /home]#egrep -n 't(a|e)st' test.txt        #Match the row containing tast or test
6:The year ahead will test our political establishment to the limit.
17:tast
         
[root@localhost /home]#egrep 'A(xyz)+C' test.txt           #Match lines containing A, then one or more xyz, then C
#AxyzxyzxyzxyzC

[root@localhost /home]#egrep 'wo+d' test.txt               #Match lines containing w, then at least one o, then d
a wood cross!
#woood #
#woooooood
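
To compare the two syntaxes side by side, here is a small sketch that runs the same interval in BRE and ERE form and then tries the ERE-only operators + and ?. The sample words are invented for the example, and grep -E is used as the equivalent of egrep.

```shell
#!/bin/sh
# Made-up sample data: w, zero to three o's, then d.
sample=$(mktemp)
printf 'wd\nwod\nwood\nwoood\n' > "$sample"

# BRE: the interval braces must be escaped.
bre=$(grep -c 'wo\{2,3\}d' "$sample")
# ERE: the same interval without backslashes.
ere=$(grep -Ec 'wo{2,3}d' "$sample")
# ERE-only operators: + (one or more) and ? (zero or one).
plus=$(grep -Ec 'wo+d' "$sample")
opt=$(grep -Ec 'wo?d' "$sample")

echo "BRE interval: $bre, ERE interval: $ere, +: $plus, ?: $opt"
rm -f "$sample"
```

Both interval forms select wood and woood; 'wo+d' additionally selects wod, while 'wo?d' selects only wd and wod.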

2, Sed tool

1.Sed overview

sed is a text processor, which relies on regular expressions. It can read text content and add, delete and replace data according to specified conditions. It is widely used in shell scripts to complete automatic processing tasks.

When processing data, sed does not modify the source file by default. It stores the line currently being processed in a temporary buffer (the pattern space), and all commands operate on that buffer. After processing, the contents of the buffer are written to the screen by default, and sed moves on to the next line. This repeats until the end of the file. The contents of the file itself do not change; to keep the result, redirect the output to another file or use the -i option.
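
The following sketch illustrates this behaviour on a throwaway file created with mktemp (the file contents are invented for the example): the substitution shows up on standard output, the source file stays as it was, and redirection keeps the result.

```shell
#!/bin/sh
work=$(mktemp)
printf 'the cat\nthe dog\n' > "$work"

# Without -i, the edited text only goes to standard output.
printed=$(sed 's/the/THE/' "$work")
# The source file is still unchanged at this point.
after_plain=$(cat "$work")

# Redirecting into a new file is one way to keep the result.
sed 's/the/THE/' "$work" > "$work.new"
saved=$(cat "$work.new")

echo "$printed"
rm -f "$work" "$work.new"
```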

2. Sed basic syntax

  • sed command format
sed [options] 'operation' file...
sed [options] -f scriptfile file...

Common sed command options

Option                     Explanation
-e or --expression=        Processes the input text with the specified command or script
-f or --file=              Processes the input text with the specified script file
-h or --help               Shows help
-i                         Edits the text file in place
-n, --quiet or --silent    Suppresses automatic printing, so only explicitly printed results are displayed

The "operation" specifies the action to perform on the file, that is, the sed command. It generally takes the form "[n1[,n2]]operation parameters". n1 and n2 are optional line numbers that select the lines to operate on. For example, to operate on lines 5 to 20, write "5,20operation". Common operations include the following.

a append: adds a line of the specified content below the current line

c change: replaces the selected lines with the specified content

d delete: deletes the selected lines

i insert: inserts a line of the specified content above the selected line

p print: if a line is specified, prints that line; if no line is specified, prints all content. It is usually used with the "-n" option. (To display non-printing characters as escape sequences, use the l command instead.)

s substitute: replaces the specified text

y transliterate: converts characters one for one
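
A short sketch of each operation on a four-line sample file follows. The data is invented, and the one-line forms of a, i and c used here assume GNU sed, as in the rest of this article; other sed implementations may require the text on a separate line after a backslash.

```shell
#!/bin/sh
f=$(mktemp)
printf 'one\ntwo\nthree\nfour\n' > "$f"

del=$(sed '2,3d' "$f")           # d: delete lines 2 to 3
pr=$(sed -n '2p' "$f")           # p with -n: print only line 2
sub=$(sed '1s/one/ONE/' "$f")    # s: substitute on line 1 only
tr=$(sed 'y/ot/OT/' "$f")        # y: map o to O and t to T, character by character
app=$(sed '2a NEW' "$f")         # a: append a line below line 2 (GNU one-line form)
ins=$(sed '2i ABOVE' "$f")       # i: insert a line above line 2 (GNU one-line form)
chg=$(sed '2c REPLACED' "$f")    # c: replace line 2 entirely (GNU one-line form)

echo "$app"
rm -f "$f"
```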

3. Usage examples

Output all content

[root@localhost /home]#sed -n 'p' test.txt 
he was short and fat.
He was wearing a blue polo shirt with black pants.
The home of Football on BBC Sport online.
......
#Equivalent to using cat to view
[root@localhost /home]#cat test.txt 
he was short and fat.
He was wearing a blue polo shirt with black pants.
The home of Football on BBC Sport online.
......

Output the contents of n lines

[root@localhost /home]#sed -n '3p' test.txt      #Output the contents of the third line
The home of Football on BBC Sport online.
[root@localhost /home]#sed -n '3,5p' test.txt    #Output three to five lines
The home of Football on BBC Sport online.
the tongue is boneless but it breaks bones.12!
google is the best tools for search keyword.

Output odd and even lines

[root@localhost /home]#sed -n '1,5{p;n}' test.txt    #Output the odd-numbered lines among lines 1 to 5
he was short and fat.
The home of Football on BBC Sport online.
google is the best tools for search keyword.
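
A minimal sketch of printing odd and even lines over a whole file with the p and n commands. The five-line numbered file is made up for the example.

```shell
#!/bin/sh
f=$(mktemp)
printf '1\n2\n3\n4\n5\n' > "$f"

# p prints the current line, then n loads the next one,
# which is not printed because of -n. Result: odd lines.
odd=$(sed -n 'p;n' "$f")

# n first skips to the next line, then p prints it. Result: even lines.
even=$(sed -n 'n;p' "$f")

echo "odd:" $odd
echo "even:" $even
rm -f "$f"
```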

Output lines that contain a search string

[root@localhost /home]#sed -n '4,/the/p' test.txt         #Output from line 4 through the next line that contains the
the tongue is boneless but it breaks bones.12!
google is the best tools for search keyword.

[root@localhost /home]#sed -n '/the/=' test.txt           #Output the line number of the line containing the
4
5
6

delete

[root@localhost /home]#nl test.txt                   #nl numbers the lines of the file
     1	he was short and fat.
     2	He was wearing a blue polo shirt with black pants.
     3	The home of Football on BBC Sport online.
     4	the tongue is boneless but it breaks bones.12!
     5	google is the best tools for search keyword.
......
       
[root@localhost /home]#nl test.txt |sed '4d'         #Delete the fourth line
     1	he was short and fat.
     2	He was wearing a blue polo shirt with black pants.
     3	The home of Football on BBC Sport online.
     5	google is the best tools for search keyword.
...... 

[root@localhost /home]#nl test.txt |sed '4,7d'       #Delete lines 4-7
     1	he was short and fat.
     2	He was wearing a blue polo shirt with black pants.
     3	The home of Football on BBC Sport online.
     8	a wood cross!
......

[root@localhost /home]#sed '/^[a-z]/d' test.txt     #Delete lines starting with a through z
He was wearing a blue polo shirt with black pants.
The home of Football on BBC Sport online.
The year ahead will test our political establishment to the limit.
PI=3.141592653589793238462643383249901429
......

[root@localhost /home]#sed '/^$/d' test.txt         #Delete empty lines

replace

[root@localhost /home]#sed 's/the/THE/' test.txt     #Replace the first lowercase the on each line with THE
he was short and fat.
He was wearing a blue polo shirt with black pants.
The home of Football on BBC Sport online.
THE tongue is boneless but it breaks bones.12!

[root@localhost /home]#sed 's/the/THE/g' test.txt    #Replace every the with THE in all lines

[root@localhost /home]#sed 's/o//g' test.txt         #Delete every o in all lines (replace it with nothing)
he was shrt and fat.
He was wearing a blue pl shirt with black pants.

[root@localhost /home]#sed 's/^/#/' test.txt         #Insert a # at the beginning of each line
#he was short and fat.
#He was wearing a blue polo shirt with black pants.
#The home of Football on BBC Sport online.
#the tongue is boneless but it breaks bones.12!

[root@localhost /home]#sed '/the/s/^/#/' test.txt    #Add a # at the beginning of each line that contains the
#the tongue is boneless but it breaks bones.12!
#The year ahead will test our political establishment to the limit.

[root@localhost /home]#sed '3,5s/the/THE/g' test.txt  #Replace the with THE in lines 3 to 5
he was short and fat.
He was wearing a blue polo shirt with black pants.
The home of Football on BBC Sport online.
THE tongue is boneless but it breaks bones.12!
google is THE best tools for search keyword.

[root@localhost /home]#sed '/the/s/o/O/g' test.txt    #In lines containing the, replace every lowercase o with uppercase O
he was short and fat.
He was wearing a blue polo shirt with black pants.
The home of Football on BBC Sport online.
the tOngue is bOneless but it breaks bOnes.12!

Move matching text
Common commands are as follows:

Command    Explanation
H          Appends the current line to the hold space (a clipboard-like buffer)
g, G       Copy the hold space back to the selected line: g overwrites the line, G appends after it
w          Saves to a file
r          Reads the specified file
a          Appends the specified content
[root@localhost /home]#sed '/the/{H;d};$G' test.txt     #Move all lines containing the to the end of the file
he was short and fat.
a wood cross!

the tongue is boneless but it breaks bones.12!
google is the best tools for search keyword.
The year ahead will test our political establishment to the limit.
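
The move works because H collects the matching lines in the hold space, d removes them from the output, and G on the last line ($) appends the collected text. Here is a small sketch on invented data; it assumes GNU sed for the brace syntax used in this article.

```shell
#!/bin/sh
f=$(mktemp)
printf 'keep 1\nmove A\nkeep 2\nmove B\nkeep 3\n' > "$f"

# H appends "newline + current line" to the hold space, d drops the line.
# On the last line, G appends the hold space to the pattern space.
moved=$(sed '/move/{H;d};$G' "$f")

printf '%s\n' "$moved"
rm -f "$f"
```

The empty line before the moved block appears because H always adds a newline first, even when the hold space is still empty. Note also that if the last line itself matched the pattern, d would end the cycle before $G ran and the collected lines would be lost, so this form relies on the final line not matching.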

[root@localhost /home]#sed '1,5{H;d};17G' test.txt      #Move lines 1 to 5 to after line 17

[root@localhost /home]#sed '/the/w out.file' test.txt   #Save all lines containing the to another file
He was wearing a blue polo shirt with black pants.
The home of Football on BBC Sport online.
the tongue is boneless but it breaks bones.12!
......

[root@localhost /home]#cat out.file 
the tongue is boneless but it breaks bones.12!
google is the best tools for search keyword.
The year ahead will test our political establishment to the limit.

[root@localhost /home]#sed '/the/r /etc/hostname' test.txt    #Insert the contents of /etc/hostname after each line containing the
the tongue is boneless but it breaks bones.12!
localhost.localdomain
google is the best tools for search keyword.
localhost.localdomain
The year ahead will test our political establishment to the limit.
localhost.localdomain
PI=3.141592653589793238462643383249901429
a wood cross!

[root@localhost /home]#hostname            #Show the hostname, for comparison with the inserted lines
localhost.localdomain

[root@localhost /home]#sed '3aNEW' test.txt                 #Append the line NEW after the third line
he was short and fat.
He was wearing a blue polo shirt with black pants.
The home of Football on BBC Sport online.
NEW

[root@localhost /home]#sed '3aNEW1\nNEW2\nNEW3' test.txt    #Append multiple lines after the third line
he was short and fat.
He was wearing a blue polo shirt with black pants.
The home of Football on BBC Sport online.
NEW1
NEW2
NEW3

Call script

[root@localhost /home]#sed '1,5{H;d};18G' test.txt 
I bet this place is really spooky late at night!
Misfortunes never come alone/single.
tast


he was short and fat.
He was wearing a blue polo shirt with black pants.
The home of Football on BBC Sport online.
the tongue is boneless but it breaks bones.12!
google is the best tools for search keyword.


[root@localhost /home]#vim open.list

1,5H
1,5d
18G

[root@localhost /home]#sed -f open.list test.txt 
I bet this place is really spooky late at night!
Misfortunes never come alone/single.
tast


he was short and fat.
He was wearing a blue polo shirt with black pants.
The home of Football on BBC Sport online.
the tongue is boneless but it breaks bones.12!
google is the best tools for search keyword.
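
The same idea as a self-contained sketch: the commands are written to a script file with a here-document instead of an editor, and the result is compared with the equivalent one-liner. The four-line input and the temporary file names are assumptions made for the example, and $G is used in place of a fixed last line number.

```shell
#!/bin/sh
f=$(mktemp)
script=$(mktemp)
printf 'a\nb\nc\nd\n' > "$f"

# One sed command per line, exactly as they would appear in open.list.
cat > "$script" <<'EOF'
1,2H
1,2d
$G
EOF

from_file=$(sed -f "$script" "$f")
inline=$(sed '1,2{H;d};$G' "$f")

printf '%s\n' "$from_file"
rm -f "$f" "$script"
```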

3, awk tool

1.awk overview

awk is a programming language designed for text processing and a powerful text analysis tool. It works line by line and is typically used for scanning, filtering and statistical summaries. Its input can come from standard input, a pipeline or files

2. Working principle

awk reads the text line by line and splits each line into fields, using spaces or tabs as the separator by default. It stores the fields in built-in variables and executes the editing commands according to the pattern or condition. sed usually processes a line as a whole, whereas awk tends to split a line into multiple "fields" and then process them. Results can be displayed with the print function. Within an awk command you can use the logical operators "&&" (and), "||" (or) and "!" (not), and perform simple arithmetic with +, -, *, /, % and ^, which mean addition, subtraction, multiplication, division, remainder and power respectively.
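
A brief sketch of field splitting, logical operators and arithmetic. The three records (name, quantity, unit price) are invented for illustration.

```shell
#!/bin/sh
# Made-up records: name, quantity, unit price.
data='apple 3 2
pear 10 1
plum 4 5'

# Fields are split on blanks; $2 * $3 is ordinary arithmetic.
totals=$(printf '%s\n' "$data" | awk '{print $1, $2 * $3}')

# && combines two conditions: quantity above 3 and price above 1.
picked=$(printf '%s\n' "$data" | awk '$2 > 3 && $3 > 1 {print $1}')

# % is the remainder and ^ is the power operator.
math=$(awk 'BEGIN {print 7 % 3, 2 ^ 10}')

printf '%s\n' "$totals" "$picked" "$math"
```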

3.Awk basic grammar

awk [options] 'pattern or condition {editing instructions}' file1 file2

awk -f scriptfile file1 file2

In an awk statement, the pattern part determines when the action is applied to the data. If it is omitted, the action is performed on every line. The pattern can be a conditional expression, a compound expression or a regular expression.

Each set of editing instructions can contain multiple statements separated by semicolons, and a program can contain multiple {} blocks separated by semicolons or spaces

The common option -F defines the field separator. By default, spaces or tabs are used as the separator

The common built-in variables of awk are as follows

Variable    Description
FS          The field separator for each line of text; defaults to spaces or tabs
NF          Number of fields in the line currently being processed
NR          The line number (ordinal number) of the line currently being processed
$0          The entire content of the line currently being processed
$n          The nth field (column n) of the line currently being processed
FILENAME    Name of the file being processed
RS          The record separator. The default is \n, that is, one record per line
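
To see several of these variables at work, here is a small sketch on invented colon-separated data. It prints NR, NF, the first field and the last field ($NF) for each line, and then changes RS so that commas separate records.

```shell
#!/bin/sh
f=$(mktemp)
printf 'root:x:0\nbin:x:1:extra\n' > "$f"

# NR is the line number, NF the field count, $NF the last field.
report=$(awk -F ':' '{print NR, NF, $1, $NF}' "$f")

# With RS set to a comma, each comma-separated item becomes one record.
records=$(printf 'a,b,c' | awk 'BEGIN { RS = "," } { print NR ": " $0 }')

printf '%s\n' "$report" "$records"
rm -f "$f"
```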

Print text content

[root@localhost /home]#awk '/^the/{print}' test.txt      #Output lines starting with the
the tongue is boneless but it breaks bones.12!

[root@localhost /home]#awk '/night!$/{print}' test.txt   #Output lines ending with night!
I bet this place is really spooky late at night!

[root@localhost /home]#awk -F ':' '/bash$/{print|"wc -l"}' /etc/passwd
5                                                          #Count the number of users who can log in to the system

[root@localhost /home]#awk 'NR==1,NR==4{print}' test.txt   #Output lines 1 to 4
he was short and fat.
He was wearing a blue polo shirt with black pants.
The home of Football on BBC Sport online.
the tongue is boneless but it breaks bones.12!

[root@localhost /home]#awk 'NR==1||NR==4{print}' test.txt  
he was short and fat.                                       #Output lines 1 and 4
the tongue is boneless but it breaks bones.12!

[root@localhost /home]#awk '(NR>=1)&&(NR<=4){print}' test.txt 
he was short and fat.                                      #Output lines 1 to 4
He was wearing a blue polo shirt with black pants.
The home of Football on BBC Sport online.
the tongue is boneless but it breaks bones.12!

[root@localhost /home]#awk '(NR%2)==1{print}' test.txt     #Output the contents of all odd rows

[root@localhost /home]#awk '(NR%2)==0{print}' test.txt     #Output the contents of all even lines

[root@localhost /etc]#awk -F : '!($3<900)' /etc/passwd     #Output lines whose 3rd field is not less than 900
polkitd:x:999:997:User for polkitd:/:/sbin/nologin         #“!” Indicates negation
libstoragemgmt:x:998:995:daemon account for libstoragemgmt:/var/run/lsm:/sbin/nologin
colord:x:997:994:User for colord:/var/lib/colord:/sbin/nologin
......

#awk also supports conditional expressions, which use two symbols: the question mark and the colon
#They are essentially a shorthand for if...else and give the same result as the equivalent if...else statement
[root@localhost /etc]#awk -F : '{if($3>200) {print $0}}' /etc/passwd
polkitd:x:999:997:User for polkitd:/:/sbin/nologin         #Output the row with the third field greater than 200
libstoragemgmt:x:998:995:daemon account for libstoragemgmt:/var/run/lsm:/sbin/nologin
colord:x:997:994:User for colord:/var/lib/colord:/sbin/nologin

[root@localhost /etc]#awk -F : '{max=($3 > $4) ? $3:$4;print max}' /etc/passwd
0  
1
2
4
7
#If the value of the third field is greater than the value of the fourth field, assign the value after the question mark ($3) to max, otherwise assign the value after the colon ($4) to max

[root@localhost /etc]#awk -F : '{max=($3 > 200) ? $3:$1;print max}' /etc/passwd
root
bin
daemon
adm
#If the value of the third field is greater than 200, assign the value of the third field to max, otherwise assign the value of the first field to max
.......

Output text by field

#Output the line number of the processed data. After each record is processed, the NR value is increased by 1
[root@localhost /etc]#awk -F : '{print NR,$0}' /etc/passwd
1 root:x:0:0:root:/root:/bin/bash
2 bin:x:1:1:bin:/bin:/sbin/nologin
3 daemon:x:2:2:daemon:/sbin:/sbin/nologin
......

#Output column 1 and column 3 data with column 3 less than 5
[root@localhost /etc]#awk -F ":" '$3 < 5 {print $1 $3 }' /etc/passwd
root0
bin1
daemon2
adm3
lp4

#Output the first and third fields of lines that have seven fields and whose first field contains root
[root@localhost /etc]#awk -F ":" '($1~"root")&&(NF==7){print$1,$3}' /etc/passwd
root 0

#Output the data of columns 1 and 7 separated by colon in rows 3 to 7
[root@localhost /etc]#awk -F ":" 'NR==3,NR==7 {print $1,$7 }' /etc/passwd
daemon /sbin/nologin
adm /sbin/nologin
lp /sbin/nologin
sync /bin/sync
shutdown /sbin/shutdown

#Output column 1 and column 3 data separated by colon
[root@localhost /etc]#awk -F ":" '{print $1,$3}' /etc/passwd
root 0
bin 1
daemon 2

perhaps

[root@localhost /etc]#awk 'BEGIN {FS=":"}{print$1,$3}' /etc/passwd
root 0
bin 1
daemon 2

#Count the number of lines ending in /bin/bash
[root@localhost /etc]#awk 'BEGIN{x=0};/\/bin\/bash$/ {x++};END{print x}' /etc/passwd
2

#$0 displays the entire line
[root@localhost /etc]#awk 'BEGIN{x=0};/\/bin\/bash$/{x++;{print x,$0}};END {print x}' /etc/passwd
1 root:x:0:0:root:/root:/bin/bash
2 gulei:x:1000:1000:GuLei:/home/gulei:/bin/bash
  • Awk execution sequence:
    ① First perform the operation in BEGIN {}
    ② Then read the data line by line from the specified file, automatically update the values of built-in variables such as NF, NR, $0 and $1, and execute 'mode or condition {editing instruction}'
    ③ Finally, perform the subsequent operations in END {}
  • Awk can also use pipe symbols to process the results of commands
[root@localhost /etc]#date |awk '{print "Month: "$2 "\nYear: ",$6}'
Month: 09 month
Year:  CST
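
The execution order can be observed with a small sketch that prints a marker in BEGIN, sums a column in the main block and reports in END. The three input records are invented for the example.

```shell
#!/bin/sh
data='a 1
b 2
c 3'

result=$(printf '%s\n' "$data" | awk '
    BEGIN { print "start"; sum = 0 }          # runs once, before any input is read
    { sum += $2 }                             # runs for every input line
    END { print "lines:", NR, "sum:", sum }   # runs once, after the last line
')

printf '%s\n' "$result"
```

In END, NR still holds the number of the last line read, which is why it can serve as a line count.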

summary

  • Sed and Awk are excellent text processing tools that rely on regular expressions and can perform specific operations on specified text data
  • Awk is better suited to extracting text, while Sed is better suited to editing it

Topics: Front-end Back-end regex