Contents
1. Regular expression definition
2. Basic regular expression (BRE)
3. Extended regular expression (ERE)
Introduction
Once you have learned loop statements and functions, regular expressions let you apply them more quickly and efficiently, which makes an administrator's work easier.
1, Regular expression
1. Regular expression definition
(1) A regular expression is also called a regex or regexp.
(2) It is usually used to search for and replace text that conforms to a certain pattern (rule).
(3) There is more than one flavor of regular expression; different programs on Linux, such as grep, sed, awk and egrep, may support different flavors.
(4) A regular expression is a string that describes and matches a set of strings conforming to a rule.
(5) Regular expression composition:
① Ordinary characters: upper and lower case letters, numbers, punctuation marks and some other characters;
② Metacharacters: characters with a special meaning in a regular expression, which specify how the preceding character (the character before the metacharacter) may occur in the target text.
2. Basic regular expression (BRE)
(1) Basic regular expressions are the most commonly used part of regular expression syntax.
(2) In addition to ordinary characters, the following metacharacters are commonly used:
① \: escape character; removes the special meaning of the character that follows. Example: a\.b matches a.b but not ajb, because the escaped . is a literal dot.
② ^: matches the beginning of a line. Example: ^tux matches lines that begin with tux.
③ $: matches the end of a line. Example: tux$ matches lines that end with tux.
④ .: matches any single character except the newline \n. Example: ab. matches abc or abd; because . stands for exactly one character, as a whole-string pattern it does not match abcd or abde.
⑤ []: matches any one character in the list. Examples: go[ola]d, [abc], [a-z], [a-z0-9].
⑥ [^]: matches any one character that is not in the list. Examples: [^a-z], [^0-9], [^A-Z0-9].
⑦ *: matches the preceding subexpression zero or more times. Examples: goo*d, go.*d.
⑧ \{n\}: matches the preceding subexpression exactly n times. Examples: go\{2\}d; '[0-9]\{2\}' matches two digits.
⑨ \{n,\}: matches the preceding subexpression at least n times. Examples: go\{2,\}d; '[0-9]\{2,\}' matches two or more digits.
⑩ \{n,m\}: matches the preceding subexpression n to m times. Examples: go\{2,3\}d; '[0-9]\{2,3\}' matches two to three digits.
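The metacharacters above can be tried against a small throwaway file. The following is only a sketch: the file contents and the variable names (sample, hits_dot and so on) are made up for illustration, and a POSIX-compatible grep is assumed.

```shell
# Build a small sample file (contents are hypothetical, for illustration only).
sample=$(mktemp)
printf '%s\n' 'a.b' 'ajb' 'tux is here' 'here is tux' 'god' 'good' 'goood' 'x42y' > "$sample"

# \. matches a literal dot, so "a.b" is selected but "ajb" is not.
hits_dot=$(grep 'a\.b' "$sample")

# ^tux anchors at the start of the line, tux$ at the end.
hits_start=$(grep '^tux' "$sample")
hits_end=$(grep 'tux$' "$sample")

# go\{2\}d needs exactly two o characters between g and d.
hits_two=$(grep 'go\{2\}d' "$sample")

# go\{2,\}d needs two or more; -c counts the matching lines.
hits_two_plus=$(grep -c 'go\{2,\}d' "$sample")

# [0-9]\{2\} needs two consecutive digits.
hits_digits=$(grep '[0-9]\{2\}' "$sample")

echo "$hits_dot | $hits_start | $hits_end | $hits_two | $hits_two_plus | $hits_digits"
rm -f "$sample"
```

Anchoring and interval expressions are easiest to check this way, one pattern at a time, before using them on real files.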
3. Extended regular expression (ERE)
(1) Extended regular expressions extend and build on basic regular expressions.
(2) Extended metacharacters:
① +: matches the preceding character one or more times. Example: the command "egrep -n 'wo+d' test.txt" finds strings such as "wood", "woood" and "woooooood".
② ?: matches the preceding character zero or one time. Example: the command "egrep -n 'bes?t' test.txt" finds the two strings "bet" and "best".
③ |: alternation (or), used to search for any of several strings. Example: the command "egrep -n 'of|is|on' test.txt" finds "of", "is" or "on".
④ (): groups part of a pattern. Example: "egrep -n 't(a|e)st' test.txt". The words "tast" and "test" share the leading "t" and the trailing "st", so the differing "a" and "e" are listed inside "()" and separated by "|"; the command finds "tast" or "test".
⑤ ()+: matches one or more repetitions of a group. Example: "egrep -n 'A(xyz)+C' test.txt" finds strings that begin with "A", end with "C" and have one or more "xyz" in between.
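As a quick check of the extended metacharacters, the sketch below counts matching lines in a made-up sample file. It uses grep -E, which accepts the same extended syntax as egrep; the file contents and variable names are assumptions for the demo.

```shell
# Hypothetical sample data, one candidate string per line.
sample=$(mktemp)
printf '%s\n' 'wd' 'wod' 'wood' 'bet' 'best' 'beast' \
    'tast' 'test' 'toast' 'AC' 'AxyzC' 'AxyzxyzC' > "$sample"

# +  : one or more o between w and d (wod, wood; not wd).
plus=$(grep -E -c 'wo+d' "$sample")

# ?  : zero or one s (bet, best; not beast).
opt=$(grep -E -c '^bes?t$' "$sample")

# |  inside (): either a or e between t and st (tast, test; not toast).
alt=$(grep -E -c '^t(a|e)st$' "$sample")

# ()+ : one or more repetitions of the group xyz (AxyzC, AxyzxyzC; not AC).
grp=$(grep -E -c 'A(xyz)+C' "$sample")

echo "plus=$plus opt=$opt alt=$alt grp=$grp"
rm -f "$sample"
```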
2, File processor
1,grep
(1) The grep command is a text search command. It can search text with regular expressions, or take its search patterns from the contents of a file.
(2) grep works as follows: it searches for a pattern in one or more files. If the pattern contains spaces, it must be quoted; every argument after the pattern is treated as a file name. The results are sent to standard output, and the original files are not modified.
(3) Common options:
-a: search a binary file as if it were a text file
-c: count the lines that match the search string
-i: ignore case, so upper and lower case are treated as the same
-n: output line numbers
-v: invert the selection, i.e. show the lines that do not contain the search string
--color=auto: highlight the matched part in color
[root@localhost ~]# grep -n 'the' test.txt // Show the lines containing the
[root@localhost ~]# grep -vn 'the' test.txt // Show the lines that do not contain the
[root@localhost ~]# grep -in 'the' test.txt // Show the lines containing the, case insensitive
[root@localhost ~]# grep -n 'sh[io]rt' test.txt // Show the lines containing shirt or short
[root@localhost ~]# grep -n 'oo' test.txt // Show the lines containing two consecutive o
[root@localhost ~]# grep -n '[^w]oo' test.txt // Show the lines where oo is preceded by a character other than w
[root@localhost ~]# grep -n 'ooo' test.txt // Show the lines containing three consecutive o
[root@localhost ~]# grep -n 'ooo*' test.txt // Show the lines containing two or more consecutive o
[root@localhost ~]# grep -n 'woo*d' test.txt // Show the lines containing w, at least one o, then d
[root@localhost ~]# grep -n '[0-9]*' test.txt // Matches every line, because * also allows zero digits; use '[0-9][0-9]*' to select only lines containing a digit
[root@localhost ~]# grep -n 'o\{2\}' test.txt // Show the lines containing two consecutive o
[root@localhost ~]# grep -n 'wo\{2,5\}d' test.txt // Show the lines containing w, two to five o, then d
[root@localhost ~]# grep -n 'wo\{2,\}d' test.txt // Show the lines containing w, two or more o, then d
[root@localhost ~]# egrep -n 'wo+d' test.txt // Show the lines containing w, one or more o, then d
[root@localhost ~]# egrep -n 'bes?t' test.txt // Show the lines containing bet or best
[root@localhost ~]# egrep -n 'of|is|on' test.txt // Show the lines containing of, is or on
[root@localhost ~]# egrep -n 't(a|e)st' test.txt // Show the lines containing tast or test
[root@localhost ~]# egrep -n 'A(xyz)+C' test.txt // Show the lines containing A, one or more xyz, then C
2,sed
(1) sed (Stream EDitor) is a powerful yet simple text parsing and conversion tool. It reads text and edits its content (delete, replace, add, move, etc.) according to specified conditions. sed can also carry out quite complex text processing without interaction, and is widely used in Shell scripts to automate processing tasks.
(2) The workflow of sed mainly includes three processes: reading, executing and displaying.
Read: sed reads one line from the input stream (file, pipe, standard input) and stores it in a temporary buffer, known as the pattern space.
Execute: by default, all sed commands are executed sequentially in the pattern space. Unless a line address is specified, the commands are applied to every line in turn.
Display: the modified content is sent to the output stream. After the data is sent, the pattern space is cleared.
These three steps repeat until all input has been processed.
Note: by default, all sed commands operate on the pattern space, so the input file does not change unless the output is saved by redirection or the -i option is used.
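The note above can be checked with a short experiment. This is a minimal sketch with made-up file contents and variable names; it only shows that a plain sed call leaves its input untouched and that the edited text exists solely in the redirected output.

```shell
# Two temporary files: the input and a place to store sed's output.
src=$(mktemp)
out=$(mktemp)
printf '%s\n' 'the cat' 'a dog' > "$src"

# The substitution is applied in the pattern space; the result is written
# to standard output, which is redirected into $out here.
sed 's/the/THE/' "$src" > "$out"

original=$(cat "$src")   # the input file, unchanged
edited=$(cat "$out")     # the edited copy

printf 'input:\n%s\noutput:\n%s\n' "$original" "$edited"
rm -f "$src" "$out"
```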
(3) Common usage of sed command
There are usually two formats for calling the sed command, as shown below. "parameter" is the target file of the operation; when there are several target files, they are separated by spaces. scriptfile is a script file, which must be specified with the "-f" option; sed then processes the target files using the commands in that script file.
In short, sed ① reads the file line by line into the pattern space; ② processes the data in the pattern space according to the matching conditions; ③ writes the processed data from the pattern space to the screen.
sed [option] 'operation' parameter
sed [option] -f scriptfile parameter
(4) Common options
-e: used when executing multiple commands; can be omitted when there is only one
-n: suppress the automatic printing of every line, so only explicitly printed lines are output
-r: use extended regular expressions
-i: edit the file in place instead of writing the result to the output
-f: process the input text with the specified script file
Operations applied to matching lines:
a: append, insert content after the matching line
c: change, replace the whole content of the matching line
i: insert, insert content before the matching line (written as a flag at the end of an s command, it means ignore case)
d: delete the matching lines
s: substitute, replace the matching content
p: print the matching content, usually used together with the -n option
=: print the line number of the matching line
n: read the next line; when n is encountered, sed moves on to the next line
r, w: read and write commands; r reads the contents of another file into the output, w writes the matching lines to another file
g: global operation
q: quit
[root@localhost ~]# sed -n '3p' test.txt // Display line 3
[root@localhost ~]# sed -n 'p;n' test.txt // Display the odd lines
[root@localhost ~]# sed -n 'n;p' test.txt // Display the even lines
[root@localhost ~]# sed -n '1,5{p;n}' test.txt // Display the odd lines among lines 1-5
[root@localhost ~]# sed -n '5,${p;n}' test.txt // Display the odd lines from line 5 to the end
[root@localhost ~]# sed -n '6,${p;n}' test.txt // Display the even lines from line 6 to the end
[root@localhost ~]# sed -n '/the/p' test.txt // Display the lines containing the
[root@localhost ~]# sed -n '4,/the/p' test.txt // Display from line 4 through the next line containing the
[root@localhost ~]# sed -n '/the/=' test.txt // Print the line numbers of the lines containing the
[root@localhost ~]# sed -n '/^PI/p' test.txt // Display the lines beginning with PI
[root@localhost ~]# sed -n '/\<wood\>/p' test.txt // Display the lines containing the whole word wood
[root@localhost ~]# sed -n -e 'p' 1.txt // Print all content
[root@localhost ~]# sed -n -e '=' 1.txt // Print the line numbers
[root@localhost ~]# sed -n -e 'l' 1.txt // Print all content, showing non-printing characters as ASCII escapes
[root@localhost ~]# sed -n -e '=;p' 1.txt // Print the line number and the content of every line
[root@localhost ~]# sed -n -e '=' -e 'p' 1.txt // Same as above, using two -e expressions
[root@localhost ~]# nl test.txt | sed '3d' // Delete line 3
[root@localhost ~]# nl test.txt | sed '3,5d' // Delete lines 3-5
[root@localhost ~]# nl test.txt | sed '/cross/d' // Delete the lines containing cross
[root@localhost ~]# nl test.txt | sed '/cross/!d' // Delete the lines that do not contain cross
[root@localhost ~]# sed '/^[a-z]/d' test.txt // Delete the lines beginning with a lowercase letter
[root@localhost ~]# sed '/\.$/d' test.txt // Delete the lines ending in a period
[root@localhost ~]# sed '/^$/d' test.txt // Delete empty lines
[root@localhost ~]# sed 's/the/THE/' test.txt // Replace the first the in each line with THE
[root@localhost ~]# sed 's/l/L/2' test.txt // Replace the second l in each line with L
[root@localhost ~]# sed 's/the/THE/g' test.txt // Replace every the in the file with THE
[root@localhost ~]# sed 's/o//g' test.txt // Delete every o in the file (replace it with an empty string)
[root@localhost ~]# sed 's/^/#/' test.txt // Insert # at the beginning of each line
[root@localhost ~]# sed '/the/s/^/#/' test.txt // Insert # at the beginning of each line containing the
[root@localhost ~]# sed 's/$/EOF/' test.txt // Append the string EOF to the end of each line
[root@localhost ~]# sed '3,5s/the/THE/g' test.txt // Replace every the in lines 3-5 with THE
[root@localhost ~]# sed '/the/s/o/O/g' test.txt // Replace o with O in every line containing the
[root@localhost ~]# sed '/the/{H;d};$G' test.txt // Move the lines containing the to the end of the file; {;} groups multiple operations
[root@localhost ~]# sed '1,5{H;d};17G' test.txt // Move lines 1-5 to after line 17
[root@localhost ~]# sed '/the/w out.file' test.txt // Save the lines containing the to the file out.file
[root@localhost ~]# sed '/the/r /etc/hostname' test.txt // Add the contents of /etc/hostname after each line containing the
[root@localhost ~]# sed '3aNew' test.txt // Insert a new line with the content New after line 3
[root@localhost ~]# sed '/the/aNew' test.txt // Insert a new line with the content New after each line containing the
[root@localhost ~]# sed '3aNew\nNew2' test.txt // Insert multiple lines after line 3; the \n in the middle is a line break
DNS script
[root@localhost ~]# vim BIND.sh
#!/bin/bash
rpm -q bind > /dev/null
if [ $? -ne 0 ];then
  yum install bind -y &> /dev/null
  systemctl start named
else
  systemctl start named
fi
sed -i '13s/127.0.0.1/192.168.32.128/' /etc/named.conf
sed -i '21s/localhost/any/' /etc/named.conf
sed -i '24azone "qaz.com" IN {\n type master;\n file "qaz.com.zone";\n allow-update { none; };\n };' /etc/named.rfc1912.zones
cd /var/named
cp -p named.localhost qaz.com.zone
sed -i '2s/@ rname.invalid./qaz.com. admin.qaz.com./' qaz.com.zone
sed -i '8s/@/qaz.com./' qaz.com.zone
sed -i '9s/127.0.0.1/192.168.32.128/' qaz.com.zone
sed -i '10d' qaz.com.zone
sed -i '9aIN MX 10 mail.qaz.com.\nwww IN A 192.168.32.128\nftp IN A 192.168.32.128' qaz.com.zone
systemctl stop firewalld.service
setenforce 0 &> /dev/null
systemctl restart named
echo "nameserver 192.168.32.128" > /etc/resolv.conf
:wq
[root@localhost ~]# nslookup www.qaz.com
Server: 192.168.32.128
Address: 192.168.32.128#53
Name: www.qaz.com
Address: 192.168.32.128
[root@localhost ~]# nslookup ftp.qaz.com
Server: 192.168.32.128
Address: 192.168.32.128#53
Name: ftp.qaz.com
Address: 192.168.32.128
3,awk
(1) Overview
AWK is a programming language designed specifically for text processing and a powerful text analysis tool. It works line by line and is usually used for scanning, filtering and statistical summaries. The data can come from standard input, a pipe or files. It was created at Bell Labs in the 1970s; CentOS 7 ships gawk. The name AWK comes from the initials of the family names of its three creators: Alfred Aho, Peter Weinberger and Brian Kernighan.
(2) Working principle
awk reads the first line, checks it against the condition and, if it matches, executes the specified action; it then reads and processes the second line, and so on. Unlike sed, awk does not print every line automatically: output comes from the action (when a condition is given without an action, the default action is to print the line). If no condition is defined, every line matches. awk therefore contains an implicit loop: the action is executed once for each line that matches the condition.
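The implicit loop can be seen with a three-line input. The sketch below is illustrative only; the sample data and variable names are assumptions.

```shell
# Hypothetical three-line input file.
data=$(mktemp)
printf '%s\n' 'alpha 1' 'beta 2' 'alpha 3' > "$data"

# No condition: the action runs once for every line.
all=$(awk '{print $1}' "$data" | paste -sd' ' -)

# Condition /alpha/: the action runs once per matching line;
# END runs a single time after the last line has been read.
matched=$(awk '/alpha/{n++} END{print n}' "$data")

# Condition without an action: the matching line is printed by default.
default_print=$(awk '/beta/' "$data")

echo "all=[$all] matched=$matched default=[$default_print]"
rm -f "$data"
```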
(3) Command format
awk option 'pattern or condition {action}' file1 file2 ...
awk -f scriptfile file1 file2 ...
(4) Built in variable
FS: the field separator for each line of text; defaults to spaces or tabs.
NF: the number of fields in the line being processed.
NR: the line number (ordinal) of the line being processed.
$0: the entire content of the line being processed.
$n: the n-th field (column n) of the line being processed.
FILENAME: the name of the file being processed.
[root@localhost ~]# awk '{print}' test.txt // Output all content; equivalent to cat test.txt
[root@localhost ~]# awk '{print $0}' test.txt // Output all content
[root@localhost ~]# awk 'NR==1,NR==3{print}' test.txt // Output lines 1 to 3
[root@localhost ~]# awk '(NR>=1)&&(NR<=3){print}' test.txt // Output lines 1 to 3
[root@localhost ~]# awk 'NR==1||NR==3{print}' test.txt // Output line 1 and line 3
[root@localhost ~]# awk '(NR%2)==1{print}' test.txt // Output the odd lines
[root@localhost ~]# awk '(NR%2)==0{print}' test.txt // Output the even lines
[root@localhost ~]# awk '/^root/{print}' /etc/passwd // Output the lines starting with root
[root@localhost ~]# awk '/nologin$/{print}' /etc/passwd // Output the lines ending with nologin
[root@localhost ~]# awk 'BEGIN{x=0};/\/bin\/bash$/{x++};END{print x}' /etc/passwd // Count the lines ending in /bin/bash
[root@localhost ~]# awk 'BEGIN{RS=""};END{print NR}' test.txt // Count the paragraphs separated by blank lines
[root@localhost ~]# awk '{print $3}' test.txt // Output the third field of each line
[root@localhost ~]# awk '{print $1,$3}' test.txt // Output the first and third fields of each line
[root@localhost ~]# awk -F: '$2==""{print}' /etc/shadow // Output the users whose password is empty
[root@localhost ~]# awk 'BEGIN{FS=":"};$2==""{print}' /etc/shadow // Output the users whose password is empty
[root@localhost ~]# awk -F: '$7~"/bash"{print $1}' /etc/passwd // With colon as the separator, output the first field of the lines whose seventh field contains /bash
[root@localhost ~]# awk -F: '($7!="/bin/bash")&&($7!="/sbin/nologin"){print}' /etc/passwd // Output the lines whose seventh field is neither /bin/bash nor /sbin/nologin
[root@localhost ~]# awk '($1~"nfs")&&(NF==8){print $1,$2}' /etc/services // Output the first and second fields of the lines that have 8 fields and contain nfs in the first field
[root@localhost ~]# awk -F: '/bash$/{print | "wc -l"}' /etc/passwd // Call wc -l to count the users whose shell is bash; equivalent to grep -c "bash$" /etc/passwd
[root@localhost ~]# awk 'BEGIN{while("w" | getline) n++; {print n-2}}' // Call the w command to count the online users
[root@localhost ~]# awk 'BEGIN{"hostname" | getline; print $0}' // Call hostname and output the current host name
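To see the built-in variables side by side, here is a small sketch on colon-separated sample data. The data and the variable names are hypothetical; $NF (the last field) is standard awk, although it is not listed above.

```shell
# Hypothetical colon-separated input.
data=$(mktemp)
printf '%s\n' 'root:x:0:0' 'bin:x:1:1:extra' > "$data"

# -F: sets FS to a colon. NR is the line number, NF the field count,
# $1 the first field and $NF the last field of the current line.
report=$(awk -F: '{print NR, NF, $1, $NF}' "$data")

# $0 is the whole line; FILENAME is the file currently being read.
first_line=$(awk 'NR==1{print $0}' "$data")
seen_name=$(awk 'NR==1{print FILENAME}' "$data")

printf '%s\n' "$report"
```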
[root@localhost /]# vim FW1.sh
#!/bin/bash
# Count failed SSH logins per source IP (field 11 of each "Failed password" line)
x=`awk '/Failed password/{ip[$11]++}END{for(i in ip){print i","ip[i]}}' /var/log/secure`
for j in $x
do
  ip=`echo $j | awk -F "," '{print $1}'`
  num=`echo $j | awk -F "," '{print $2}'`
  # Warn about addresses with three or more failures
  if [ $num -ge 3 ] ; then
    echo "Attention! $ip failed to access this computer $num times, please deal with it as soon as possible!"
  fi
done
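The counting step of this script can be tried without touching /var/log/secure. The sketch below runs on a few made-up log lines and, as a variation that is not part of the original script, takes the word after "from" as the address instead of a fixed field number; this is an assumption intended to cope with "invalid user" lines, where the address is not the 11th field. The log format and all names here are hypothetical.

```shell
# Hypothetical sshd-style log lines; real logs may differ.
log=$(mktemp)
cat > "$log" <<'EOF'
Jan  1 10:00:01 host sshd[100]: Failed password for root from 10.0.0.5 port 50000 ssh2
Jan  1 10:00:02 host sshd[100]: Failed password for root from 10.0.0.5 port 50001 ssh2
Jan  1 10:00:03 host sshd[101]: Failed password for invalid user bob from 10.0.0.5 port 50002 ssh2
Jan  1 10:00:04 host sshd[102]: Failed password for root from 10.0.0.9 port 50003 ssh2
EOF

# For each failed login, find the word "from" and count the field after it.
counts=$(awk '/Failed password/ {
    for (i = 1; i < NF; i++)
        if ($i == "from") { ip[$(i + 1)]++; break }
}
END { for (a in ip) print a "," ip[a] }' "$log" | sort)

printf '%s\n' "$counts"
rm -f "$log"
```

The output keeps the same "address,count" format, so the for loop of the script could consume it unchanged.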
3, Common file tools
1,cut
(1) Instructions for use: the cut command cuts bytes, characters or fields from each line of a file and writes them to standard output. If no file is given, cut reads standard input. One of the -b, -c or -f flags must be specified. cut is best suited to text separated by a single-character delimiter.
(2) Options
-b: select by byte
-c: select by character; useful for multibyte text such as Chinese
-d: specify the delimiter used to split fields; the default is tab
-f: select fields; usually used together with -d
[root@localhost ~]# cat /etc/passwd | cut -d':' -f 1
root
bin
daemon
adm
lp
[root@localhost ~]# cat /etc/passwd | cut -d':' -f 3
0
1
2
3
4
[root@localhost ~]# cat /etc/passwd | cut -d':' -f1,3
root:0
bin:1
daemon:2
adm:3
lp:4
[root@localhost ~]# who | cut -b 3
o
[root@localhost ~]# who | cut -c 3
o
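The remark that cut only handles single-character delimiters well can be illustrated as follows. This is a sketch with a made-up input line: each space counts as its own delimiter for cut, whereas awk treats a run of spaces as one separator.

```shell
line='a   b'   # three spaces between a and b

# cut sees four fields: "a", "", "" and "b".
cut_field=$(printf '%s\n' "$line" | cut -d' ' -f2)   # empty
cut_last=$(printf '%s\n' "$line" | cut -d' ' -f4)    # b

# awk collapses the run of spaces, so b is the second field.
awk_field=$(printf '%s\n' "$line" | awk '{print $2}')

echo "cut f2=[$cut_field] cut f4=[$cut_last] awk f2=[$awk_field]"
```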
2,sort
(1) sort is a tool that sorts file contents line by line. It can also sort according to data type; for example, numbers and text are ordered differently.
(2) Syntax: sort [options] parameter
(3) Common options
-t: specify the field separator; by default fields are separated by tabs or spaces
-k: specify the field (key) to sort on
-n: sort numerically; by default, sort as text
-u: like uniq, show only one line for each set of identical lines. Note: if a line has trailing spaces, it will not be treated as a duplicate
-r: reverse the order; the default is ascending, -r is descending
-o: write the sorted result to the specified file
[root@localhost etc]# sort passwd // Without any options, sorts in ascending order on the first column; letters run from a to z from top to bottom
abrt:x:173:173::/etc/abrt:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
apache:x:48:48:Apache:/usr/share/httpd:/sbin/nologin
avahi:x:70:70:Avahi mDNS/DNS-SD Stack:/var/run/avahi-daemon:/sbin/nologin
bin:x:1:1:bin:/bin:/sbin/nologin
chrony:x:993:988::/var/lib/chrony:/sbin/nologin
colord:x:997:994:User for colord:/var/lib/colord:/sbin/nologin
[root@localhost etc]# sort -n -t: -k3 passwd // With colon as the separator, sort on the third column numerically (ascending)
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
[root@localhost etc]# sort -nr -t: -k3 passwd // With colon as the separator, sort on the third column numerically (descending)
nfsnobody:x:65534:65534:Anonymous NFS User:/var/lib/nfs:/sbin/nologin
polkitd:x:999:998:User for polkitd:/:/sbin/nologin
libstoragemgmt:x:998:995:daemon account for libstoragemgmt:/var/run/lsm:/sbin/nologin
colord:x:997:994:User for colord:/var/lib/colord:/sbin/nologin
gluster:x:996:993:GlusterFS daemons:/run/gluster:/sbin/nologin
[root@localhost etc]# sort -u passwd // Remove duplicate lines from the file (the duplicates need not be adjacent)
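The difference between text and numeric ordering, and the -o option, can be seen on a tiny list of numbers. The sketch below uses made-up data and variable names; LC_ALL=C is set for the text sort only so that the byte-wise order is predictable.

```shell
nums=$(mktemp)
sorted=$(mktemp)
printf '%s\n' 10 9 100 9 > "$nums"

# Text order compares character by character, so "100" sorts before "9".
text_order=$(LC_ALL=C sort "$nums" | paste -sd' ' -)

# -n compares by numeric value.
num_order=$(sort -n "$nums" | paste -sd' ' -)

# -u drops the duplicate 9; -o writes the result to a file instead of stdout.
sort -n -u -o "$sorted" "$nums"
unique_sorted=$(paste -sd' ' "$sorted")

echo "text: $text_order | numeric: $num_order | unique: $unique_sorted"
rm -f "$nums" "$sorted"
```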
3,uniq
(1) uniq is mainly used to remove consecutive duplicate lines.
Note: only adjacent lines are compared, so uniq is usually combined with sort: sort first to bring the duplicates together, then remove them. Otherwise duplicate lines that are not adjacent are not removed.
(2) Syntax: uniq [options] parameter
(3) Common options
-c: count how many times each line is repeated
-d: show only the duplicated lines
-u: show only the lines that appear once
[root@localhost /]# cat fruit | uniq -c // Count the duplicate lines; duplicates that are not adjacent are not counted together
2 apple
1 peache
1 pear
1 banana
2 cherry
1 banana
1 orange
[root@localhost /]# cat fruit | sort | uniq -c // Combined with sort, this gives the result we want
2 apple
2 banana
2 cherry
1 orange
1 peache
1 pear
[root@localhost /]# cat fruit | sort | uniq -d // Combined with sort, show only the duplicated lines
apple
banana
cherry
[root@localhost /]# cat fruit | sort | uniq -u // Combined with sort, show only the lines that are not duplicated
orange
peache
pear
[root@localhost /]# cat fruit | sort | uniq // Combined with sort, remove the duplicates
apple
banana
cherry
orange
peache
pear
[root@localhost /]# cat fruit | sort -u // sort -u can also be used directly
apple
banana
cherry
orange
peache
pear
4,tr
(1) tr can replace one character with another, delete characters entirely, or squeeze repeated characters into one.
(2) Syntax: tr [options]... SET1 [SET2]
tr translates, squeezes and/or deletes characters from standard input and writes the result to standard output.
(3) Common options
-d: delete characters
-s: squeeze each run of a repeated character, keeping only the first one
[root@localhost /]# cat fruit | tr 'apple' 'APPLE' // Replacement is one-to-one, letter by letter
APPLE
APPLE
PEAchE
PEAr
bAnAnA
chErry
chErry
bAnAnA
orAngE
[root@localhost /]# cat fruit | tr 'a' ' ' // Enclose the characters in single quotes, including special characters such as the space
 pple
 pple
pe che
pe r
b n n 
cherry
cherry
b n n 
or nge
[root@localhost /]# cat fruit | tr 'a' '/'
/pple
/pple
pe/che
pe/r
b/n/n/
cherry
cherry
b/n/n/
or/nge
[root@localhost /]# cat fruit | tr 'ap' '/' // Replace multiple characters with one
///le
///le
/e/che
/e/r
b/n/n/
cherry
cherry
b/n/n/
or/nge
[root@localhost /]# cat fruit | tr 'apple' 'star' // a is replaced with s, p with a, l and e with r
saarr
saarr
arschr
arsr
bsnsns
chrrry
chrrry
bsnsns
orsngr
[root@localhost /]# cat fruit | tr "'" '/' // To replace a single quote, enclose it in double quotes; inside single quotes a backslash cannot escape it
apple
apple
peache
pear
banana
cherry
cherry
banana
orange
[root@localhost /]# cat fruit | tr -d 'a' // Delete every a
pple
pple
peche
per
bnn
cherry
cherry
bnn
ornge
[root@localhost /]# cat fruit | tr -d 'apple' // Delete every a, p, l and e


ch
r
bnn
chrry
chrry
bnn
orng
[root@localhost /]# cat fruit | tr -d '\n' // Delete the line breaks
appleapplepeachepearbananacherrycherrybananaorange[root@localhost /]#
[root@localhost /]# cat fruit | tr -s 'p' // Squeeze repeated p characters, keeping only the first
aple
aple
peache
pear
banana
cherry
cherry
banana
orange
[root@localhost /]# cat fruit | tr -s '\n' // Squeeze runs of line breaks into one, which is equivalent to removing empty lines
apple
apple
peache
pear
banana
cherry
cherry
banana
orange
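Besides explicit character lists, tr also accepts ranges such as a-z. The following sketch, with made-up input strings and variable names, shows one translate, one squeeze and one delete in that style.

```shell
# Translate: map every lowercase letter to its uppercase counterpart.
upper=$(printf 'apple\n' | tr 'a-z' 'A-Z')

# Squeeze: collapse the run of spaces into a single space.
squeezed=$(printf 'a   b\n' | tr -s ' ')

# Delete: remove every digit.
no_digits=$(printf 'a1b22c\n' | tr -d '0-9')

echo "$upper | $squeezed | $no_digits"
```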
Summary
1. A regular expression, also called a regex or regexp, is used to search for and replace text that conforms to a certain pattern (rule). It comes in two flavors: basic regular expressions (BRE) and extended regular expressions (ERE).
2. The grep command is a text search command. It can search text with regular expressions, or take its search patterns from the contents of a file.
3. sed is a text processing tool that reads text and deletes, replaces, adds and moves data according to specified conditions. It is widely used in Shell scripts.
4. awk is a powerful text analysis tool for processing text and data under Linux.
5. Common file sorting tools: cut, sort, uniq, tr.