Linux text three swordsmen – grep, sed, awk
The three functions are all text processing, but the emphasis is different, among which awk is the most powerful and complex. grep is more suitable for finding or matching text in a single flush, sed is more suitable for editing the matched text, and awk is more suitable for formatting text and processing text in a more complex format.
1,grep
1.1 what is grep
grep command in Linux system is a powerful text search tool. It can use regular expressions to search text and print the matching lines (the matching lines are marked in red). grep is a Global regular expression print, which represents the global regular expression version.
Grep can be used in shell scripts because grep describes the search status by returning a status value. If the template search is successful, it returns 0, if the search is unsuccessful, it returns 1, and if the searched file does not exist, it returns 2
1.2 using grep
Command format
grep [option] pattern file
Command parameters
- -A < display rows >: displays the contents after the row in addition to the column conforming to the template style.
- -B < number of display lines >: in addition to the line that conforms to the style, the content before the line is displayed.
- -C < display lines >: displays the contents before and after the line except the line that conforms to the style.
- -c: Count the number of matched rows
- -e: Implement the logical or relationship between multiple options
- -E: Extended regular expression
- -Get match from: PATTERN FILE
- -F: Equivalent to fgrep
- -I -- ignore case # ignores the case difference of characters.
- -n: Show matching line numbers
- -o: Show only matching strings
- -q: Silent mode, no information output
- -s: No error message is displayed.
- -v: Display rows that are not matched by pattern, which is equivalent to [^] reverse matching
- -w: Match entire word
Basic regular table expression
Match character
- . match any single character, not blank lines
- [] matches any single character within the specified range
- [^] negate
- [: alnum:] or [0-9a-zA-Z]
- [: alpha:] or [a-zA-Z]
- [: upper:] or [A-Z]
- [: lower:] or [a-z]
- [: blank:] white space characters (spaces and tabs)
- [: Space:] horizontal and vertical white space characters (wider range than [: blank:])
- [: cntrl:] non printable control characters (backspace, delete, alarm...)
- [: digit:] decimal number or [0-9]
- [: xdigit:] hexadecimal digit
- [: graph:] printable non white space characters
- [: print:] printable characters
- [: punct:] punctuation
Matching times
- *****Match the previous characters any time, including 0 times. Greedy mode: match as long as possible
- . * any character of any preceding length, excluding 0 times
- ? Matches the character before it 0 or 1 times
- +Match the character before it at least once
- {n} Match the previous character n times
- {m,n} matches the preceding character at least m times and at most N times
- {, n} matches the preceding character at most N times
- {n,} matches the preceding character at least N times
Location anchor: locate where it appears
- ^Row head anchor for the leftmost side of the pattern
- $end of line anchor for the rightmost side of the pattern
- ^PATTERN $, for PATTERN matching the entire line
- ^$blank line
- ^[[:space:]].*$ Blank line
- < or \ b initial anchor for the left side of the word pattern
- >Or \ b suffix anchor; For the right side of word mode
- <PATTERN>
Extended regular expression
(1) Character matching:
- . any single character
- [] characters in the specified range
- [^] characters outside the specified range
- Number of matches:
- *: matches the preceding character any number of times
- ? : 0 or 1 times
- +: 1 or more times
- {m} : match m times
- {m,n}: at least m, at most N times
(2) Position anchoring:
- ^: line beginning
- $: end of line
- < \ B: initial
- >, \ B: ending
- Group: ()
- Backward References: \ 1, \ 2
2.sed
2.1 what is sed
sed is a stream editor that processes one line at a time. During processing, the currently processed lines are stored in a temporary buffer, called a pattern space, and then the contents of the buffer are processed with the sed command. After processing, the contents of the buffer are sent to the screen. Then read the downlink and execute the next cycle. If a special command such as'D 'is not used, the mode space will be emptied between two loops, but the reserved space will not be emptied. This is repeated until the end of the file. The contents of the file do not change unless you use redirection to store input or - i
Function: it is mainly used to automatically edit one or more files to simplify the repeated operation of files.
2.2 using sed
Command format
sed [options] '[Address delimitation] command' file(s)
Common options
- -n: The mode space content is not output to the screen, that is, it is not automatically printed, and only the matched lines are printed
- **-e: * * multi point editing. When processing each line, there can be multiple scripts
- -f: Write the Script to the file. When executing sed, - f specifies the file path. If there are multiple scripts, write on a new line
- -r: Support extended regular expressions
- -i: Write the processing results directly to the file
- -i.bak: back up a copy before writing the processing results to the file
Address delimitation
- Do not give address: process the full text
- Single address:
- #: specified row
- /Pattern /: each line that can be matched by the pattern here
- Address range:
- #,#
- #,+#
- /pat1/,/pat2/
- #,/pat1/
- ~: step
- sed -n '1~2p' print only odd lines (1 ~ 2 from the first line, add 2 lines at a time)
- sed -n '2~2p' print only even lines
Edit command
-
d: Delete rows that match the pattern space and immediately enable the next cycle
-
p: Print the current mode space content and append it to the default output
-
a: Append text after the specified line. Multi line append is supported using \ n
-
i: Insert text in front of the line, and support multi line append using \ n
-
c: The replacement line is single line or multiple lines of text, and supports \ n multi line append
-
w: Save pattern matching lines to the specified file
-
r: After reading the text of the specified file to the matching line in the pattern space
-
=: print line numbers for lines in pattern space
-
!: Inverse processing of matching rows in pattern space
-
s///
-
Plus g indicates global replacement within the line;
-
When replacing, you can add a command to realize case conversion
-
\l: Converts the next character to lowercase.
-
\50: Convert the replacement letter to lowercase until \ U or \ E appears.
-
\u: Convert the next character to uppercase.
-
\U: Convert the replacement letter to uppercase until \ L or \ E appears.
-
\E: Stop case conversion starting with \ L or \ U
sed usage demonstration
[root@along ~]# cat demo aaa bbbb AABBCCDD [root@along ~]# sed "/aaa/p" demo #The matched lines will be printed again, and the unmatched lines will also be printed aaa aaa bbbb AABBCCDD [root@along ~]# sed -n "/aaa/p" demo #-n do not display rows that do not match aaa [root@along ~]# sed -e "s/a/A/" -e "s/b/B/" demo #-e multipoint editing Aaa Bbbb AABBCCDD [root@along ~]# cat sedscript.txt s/A/a/g [root@along ~]# sed -f sedscript.txt demo #-f use file processing aaa bbbb aaBBCCDD [root@along ~]# sed -i.bak "s/a/A/g" demo #-i process files directly [root@along ~]# cat demo AAA bbbb AABBCCDD [root@along ~]# cat demo.bak aaa bbbb AABBCCDD
Address definition demonstration
[root@along ~]# cat demo aaa bbbb AABBCCDD [root@along ~]# sed -n "p" demo #Print full text without specifying rows aaa bbbb AABBCCDD [root@along ~]# sed "2s/b/B/g" demo #Replace B - > b in line 2 aaa BBBB AABBCCDD [root@along ~]# sed -n "/aaa/p" demo aaa [root@along ~]# sed -n "1,2p" demo #Print 1-2 lines aaa bbbb [root@along ~]# sed -n "/aaa/,/DD/p" demo aaa bbbb AABBCCDD [root@along ~]# sed -n "2,/DD/p" demo bbbb AABBCCDD [root@along ~]# sed "1~2s/[aA]/E/g" demo #Replace a or a of odd rows with E EEE bbbb EEBBCCDD
Edit command demo
[root@along ~]# cat demo aaa bbbb AABBCCDD [root@along ~]# sed "2d" demo #Delete line 2 aaa AABBCCDD [root@along ~]# sed -n "2p" demo #Print line 2 bbbb [root@along ~]# sed "2a123" demo #Add 123 after line 2 aaa bbbb 123 AABBCCDD [root@along ~]# sed "1i123" demo #Add 123 Before line 1 123 aaa bbbb AABBCCDD [root@along ~]# sed "3c123\n456" demo #Replace line 3 aaa bbbb 123 456 [root@along ~]# sed -n "3w/root/demo3" demo #Save the contents of line 3 to the demo3 file [root@along ~]# cat demo3 AABBCCDD [root@along ~]# sed "1r/root/demo3" demo #Read the contents of demo3 to line 1 aaa AABBCCDD bbbb AABBCCDD [root@along ~]# sed -n "=" demo #=Print line number 1 2 3 [root@along ~]# sed -n '2!p' demo #Print except line 2 aaa AABBCCDD [root@along ~]# sed 's@[a-z]@\u&@g' demo #Replace lowercase letters with uppercase letters for the full text AAA BBBB AABBCCDD
sed advanced editing commands
- h: Overwrite the contents of the pattern space into the holding space
- H: Append the contents of the mode space to the holding space
- g: Take the data out of the holding space and overwrite it into the mode space
- G: Take out the contents from the holding space and append them to the mode space
- x: Interchange the content in the pattern space with the content in the hold space
- n: Read the next row of the matched row and overwrite it into the pattern space
- N: The next row of the read matched row is appended to the pattern space
- d: Delete rows in schema space
- D: Delete the contents from the beginning of the current mode space to \ n (no longer transmitted to standard output), abandon the subsequent commands, but re execute sed for the remaining mode space
① Case: output text content in reverse order
[root@along ~]# cat num.txt One Two Three [root@along ~]# sed '1!G;h;$!d' num.txt Three Two One 1!G The first line is not executed G Command, starting from the second line $!d The last line is not deleted
③ Summarize the relationship between mode space and maintain space:
The holding space is a buffer for temporarily storing data in the mode space to assist in data processing in the mode space
(3) Demonstration
① Show even rows
[root@along ~]# seq 9 |sed -n 'n;p' 2 4 6 8
② Reverse order display
[root@along ~]# seq 9 |sed '1!G;h;$!d' 9 8 7 6 5 4 3 2 1
③ Show odd rows
[root@along ~]# seq 9 |sed 'H;n;d' 1 3 5 7 9
④ Show last line
[root@along ~]# seq 9| sed 'N;D' 9
⑤ Add a blank line between each line
[root@along ~]# seq 9 |sed 'G' 1 2 3 4 5 6 7 8 9 ---
⑥ Replace each line with a blank line
[root@along ~]# seq 9 |sed "g" ---
⑦ Make sure there is a blank line below each line
[root@along ~]# seq 9 |sed '/^$/d;G' 1 2 3 4 5 6 7 8 9
3.awk
What is awk
awk is a programming language used to process text and data under Linux. Data can come from standard input, one or more files, and the output of other commands. It supports advanced functions such as user-defined functions and dynamic regular expressions. awk has many built-in functions, such as arrays, functions and so on.
Using awk
grammar
awk` `[options] ``'program'` `var=value ``file``...``awk` `[options] -f programfile var=value ``file``...``awk` `[options] ``'BEGIN{ action;... } pattern{ action;... } END{ action;... }'` `file` `...
Common command options
- -F fs: fs specifies the input delimiter. fs can be a string or regular expression.
- -v var=value: assign a user-defined variable and pass the external variable to awk
- -F scriptfile: read the awk command from the script file
awk variable
Variables: built in and user-defined variables. Add the - v command option before each variable
(1) Format
- Field separator: FS, the default is blank
- OFS: output field separator; blank character by default
- RS: enter the record separator, specify the line feed character when entering, and the original line feed character is still valid
- ORS: output the record separator, and use the specified symbol instead of the newline character during output
- NF: number of fields, total number of fields, N F lead use most after one column , NF refers to the last column, NF refers to the last column, (NF-1) refers to the penultimate column
- NR: line number, which can be followed by multiple documents. The line number of the second document continues to start from the last line number of the first document
- FNR: each document is counted separately, and the line number followed by a document is the same as NR. it is the same as multiple documents, and the line number of the second document starts from 1
- FILENAME: current file name
- ARGC: number of command line parameters
- ARGV: array, which saves the parameters given by the command line. View the parameters
[root@along ~]# cat awkdemo hello:world linux:redhat:lalala:hahaha along:love:youou [root@along ~]# awk -v FS=':' '{print $1,$2}' awkdemo #FS specifies the input separator hello world linux redhat along love [root@along ~]# awk -v FS=':' -v OFS='---' '{print $1,$2}' awkdemo #OFS specifies the output separator hello---world linux---redhat along---love [root@along ~]# awk -v RS=':' '{print $1,$2}' awkdemo hello world linux redhat lalala hahaha along love you [root@along ~]# awk -v FS=':' -v ORS='---' '{print $1,$2}' awkdemo hello world---linux redhat---along love--- [root@along ~]# awk -F: '{print NF}' awkdemo 2 4 3 [root@along ~]# awk -F: '{print $(NF-1)}' awkdemo #Displays the penultimate column hello lalala love [root@along ~]# awk '{print NR}' awkdemo awkdemo1 1 2 3 4 5 [root@along ~]# awk END'{print NR}' awkdemo awkdemo1 5 [root@along ~]# awk '{print FNR}' awkdemo awkdemo1 1 2 3 1 2 [root@along ~]# awk '{print FILENAME}' awkdemo awkdemo awkdemo awkdemo [root@along ~]# awk 'BEGIN {print ARGC}' awkdemo awkdemo1 3 [root@along ~]# awk 'BEGIN {print ARGV[0]}' awkdemo awkdemo1 awk [root@along ~]# awk 'BEGIN {print ARGV[1]}' awkdemo awkdemo1 awkdemo [root@along ~]# awk 'BEGIN {print ARGV[2]}' awkdemo awkdemo1 awkdemo1
Custom variable
(1)-v var=value
① Define the variable first, and then execute the action print
[root@along ~]# awk -v name="along" -F: '{print name":"$0}' awkdemo along:hello:world along:linux:redhat:lalala:hahaha along:along:love:you
② Variables are defined after the action print is executed
[root@along ~]# awk -F: '{print name":"$0;name="along"}' awkdemo :hello:world along:linux:redhat:lalala:hahaha along:along:love:you
(2) Directly defined in program
You can put the executed actions in the script and directly call the script - f
[root@along ~]# cat awk.txt {name="along";print name,$1} [root@along ~]# awk -F: -f awk.txt awkdemo along hello along linux along along
printf command
(1) Format output
printf` `"FORMAT"``, item1,item2, ...
① FORMAT must be specified
② No automatic line feed. You need to explicitly give the line feed control character, \ n
③ FORMAT needs to specify FORMAT characters for each subsequent item
(2) Format character: one-to-one correspondence with item
-
%c: Displays the ASCII code of the character
-
%d. % I: display decimal integers
-
%e. % e: display scientific count values
-
%f: Show
Floating point number, decimal% 5.1f, with 5 digits of integer, decimal point and integer, and 1 decimal place. If it is not enough, fill in the space
-
%g. % G: display values in scientific counting or floating point form
-
%s: Display string; Example:% 5s at least 5 characters, not enough space, more than 5 characters will continue to be displayed
-
%u: Unsigned integer
-
%%: Show% itself
(3) Modifier: placed between% c[/d/e/f...]
- #[. #]: the first digit controls the width of the display; The second # represents the precision after the decimal point,% 5.1f
- -: align left (default align right)% - 15s
- +% + d: displays the positive and negative sign of the value
Operator**
format
-
Arithmetic operators:
- x+y, x-y, x*y, x/y, x^y, x%y
- -x: Convert to negative
- +x: Convert to numeric
-
String operator: unsigned operator, string concatenation
-
Assignment operator:
- =, +=, -=, *=, /=, %=, ^=
- ++, –
-
Comparison operator:
- ==, !=, >, >=, <, <=
-
Pattern matching character: ~: whether the left matches the right, including! ~: Mismatch
-
Logical operators: and & &, or |, not!
-
Function call: function_name(argu1, argu2, …)
-
Conditional expression (ternary expression):
selector
?
if-true-expression
:
if-false-expression
- Note: judge the selector first, and execute if it meets the requirements? Operation after; Otherwise, perform the following operation:
demonstration
(1) Pattern matching character
---Query to/dev Disk information at the beginning [root@along ~]# df -h |awk -F: '$0 ~ /^\/dev/' /dev/mapper/cl-root 17G 7.3G 9.7G 43% / /dev/sda1 1014M 121M 894M 12% /boot ---Only disk usage and disk names are displayed [root@along ~]# df -h |awk '$0 ~ /^\/dev/{print $(NF-1)"---"$1}' 43%---/dev/mapper/cl-root 12%---/dev/sda1 ---Find disks larger than 40%of [root@along ~]# df -h |awk '$0 ~ /^\/dev/{print $(NF-1)"---"$1}' |awk -F% '$1 > 40'
(2) Logical operator
[root@along ~]# awk -F: '$3>=0 && $3<=1000 {print $1,$3}' /etc/passwd root 0 bin 1 [root@along ~]# awk -F: '$3==0 || $3>=1000 {print $1}' /etc/passwd root [root@along ~]# awk -F: '!($3==0) {print $1}' /etc/passwd bin [root@along ~]# awk -F: '!($0 ~ /bash$/) {print $1,$3}' /etc/passwd bin 1 daemon 2
(3) Conditional expression (ternary expression)
[root@along ~]# awk -F: '{$3 >= 1000?usertype="common user":usertype="sysadmin user";print usertype,$1,$3}' /etc/passwd sysadmin user root 0 common user along 1000
grep sed awk comparison
- Grep: text filter. If you only filter text, you can use grep, which is much more efficient than others
- sed: stream editor. By default, it only processes the mode space and does not process the original data
- Awk: report generator, displayed after formatting. If you need to generate information such as reports for the processed data, or if the data you process is processed by column, it is best to use awk