Linux text three swordsmen -- grep, sed, awk

Posted by aesir5 on Sun, 16 Jan 2022 01:16:41 +0100

All three tools process text, but each has a different emphasis, and awk is the most powerful and the most complex of the three. grep is best suited to simply finding or matching text, sed to editing the text it matches, and awk to formatting text and handling more complex, structured input.

1.grep

1.1 what is grep

The grep command on Linux is a powerful text search tool. It searches text using regular expressions and prints the lines that match (when color output is enabled, for example with --color=auto, the matched text is highlighted, typically in red). The name grep stands for "global regular expression print".
grep works well in shell scripts because it reports the result of the search through its exit status: 0 if the pattern was found, 1 if it was not found, and 2 if an error occurred, for example because the file being searched does not exist.
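
Because the exit status distinguishes the three outcomes, a script can branch on it directly. The following is a minimal sketch, assuming GNU grep; the sample file and the variable names (found, missing, err) are made up for this illustration.

```shell
# Hypothetical sample file, created only for this sketch.
tmp=$(mktemp)
printf 'alpha\nbeta\n' > "$tmp"

# Capture each exit status; "|| var=$?" keeps the script alive under "set -e".
found=0;   grep -q 'alpha' "$tmp" || found=$?                            # pattern present
missing=0; grep -q 'gamma' "$tmp" || missing=$?                          # pattern absent
err=0;     grep -q 'alpha' "$tmp.does-not-exist" 2>/dev/null || err=$?   # file does not exist

echo "found=$found missing=$missing err=$err"
rm -f "$tmp"
```

In a real script the same idea usually appears as `if grep -q PATTERN FILE; then ...; fi`.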

1.2 using grep

Command format

grep [option] pattern file

Command parameters

  • -A <number of lines>: in addition to each matching line, display the given number of lines after it.
  • -B <number of lines>: in addition to each matching line, display the given number of lines before it.
  • -C <number of lines>: in addition to each matching line, display the given number of lines both before and after it.
  • -c: Count the matching lines
  • -e PATTERN: Specify a pattern; repeating -e gives a logical OR between several patterns
  • -E: Use extended regular expressions
  • -f FILE: Read the patterns from FILE, one per line
  • -F: Treat the pattern as a fixed string; equivalent to fgrep
  • -i, --ignore-case: Ignore case differences between characters
  • -n: Show the line number of each matching line
  • -o: Show only the matched part of each line
  • -q: Quiet mode, no output
  • -s: Suppress error messages about nonexistent or unreadable files
  • -v: Display the lines that are not matched by the pattern (inverted match)
  • -w: Match whole words only
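
A few of these options can be tried on a small file. This is only a sketch: the file contents and variable names are invented for the example, and the options used are the ones listed above.

```shell
# Hypothetical log file, created only for this sketch.
tmp=$(mktemp)
printf 'Error: disk full\nok\nerror: cat\nconcatenate\n' > "$tmp"

count=$(grep -ci 'error' "$tmp")       # -c count, -i ignore case
numbered=$(grep -n 'ok' "$tmp")        # -n prefix the line number
whole=$(grep -w 'cat' "$tmp")          # -w whole word, so "concatenate" is skipped
inverted=$(grep -vc 'error' "$tmp")    # -v invert: count lines without lowercase "error"
context=$(grep -A 1 'disk' "$tmp")     # -A 1: the match plus one line after it

printf '%s\n' "$count" "$numbered" "$whole" "$inverted" "$context"
rm -f "$tmp"
```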

Basic regular expressions

Match character

  • . matches any single character (so it never matches an empty line)
  • [] matches any single character within the specified set or range
  • [^] matches any single character not in the specified set or range
  • [[:alnum:]] or [0-9a-zA-Z]
  • [[:alpha:]] or [a-zA-Z]
  • [[:upper:]] or [A-Z]
  • [[:lower:]] or [a-z]
  • [[:blank:]] white space characters (spaces and tabs)
  • [[:space:]] horizontal and vertical white space characters (wider range than [[:blank:]])
  • [[:cntrl:]] non-printable control characters (backspace, delete, alarm...)
  • [[:digit:]] decimal digit, or [0-9]
  • [[:xdigit:]] hexadecimal digit
  • [[:graph:]] printable non-white-space characters
  • [[:print:]] printable characters
  • [[:punct:]] punctuation

Matching times

  • * matches the preceding character any number of times, including 0 times. Greedy mode: it matches as much as possible
  • .* matches any sequence of characters of any length, including the empty string
  • \? matches the preceding character 0 or 1 times
  • \+ matches the preceding character at least once
  • \{n\} matches the preceding character exactly n times
  • \{m,n\} matches the preceding character at least m times and at most n times
  • \{,n\} matches the preceding character at most n times
  • \{n,\} matches the preceding character at least n times
  • In basic regular expressions the backslashes are required (\? and \+ are GNU extensions); with grep -E the same operators are written ?, +, {n} and so on

Location anchor: locate where it appears

  • ^ start-of-line anchor, placed at the leftmost side of the pattern
  • $ end-of-line anchor, placed at the rightmost side of the pattern
  • ^PATTERN$ PATTERN must match the entire line
  • ^$ empty line
  • ^[[:space:]]*$ blank line (empty, or containing only white space)
  • \< or \b start-of-word anchor, placed at the left side of the word pattern
  • \> or \b end-of-word anchor, placed at the right side of the word pattern
  • \<PATTERN\> matches PATTERN as a whole word
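
The anchors are easiest to compare by counting matches on a tiny file. The sketch below assumes GNU grep (the \< and \> word anchors are a GNU extension); the file contents and variable names are hypothetical.

```shell
# Hypothetical input: line 3 is empty, line 4 holds two spaces and a tab.
tmp=$(mktemp)
printf 'cat\nconcat cats\n\n  \t\nthe cat sat\n' > "$tmp"

empty=$(grep -c '^$' "$tmp")                # truly empty lines only
blank=$(grep -c '^[[:space:]]*$' "$tmp")    # empty or whitespace-only lines
word=$(grep -c '\<cat\>' "$tmp")            # "cat" as a whole word
exact=$(grep -c '^cat$' "$tmp")             # the whole line is exactly "cat"

echo "empty=$empty blank=$blank word=$word exact=$exact"
rm -f "$tmp"
```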

Extended regular expression

(1) Character matching:

  • . any single character
  • [] characters in the specified range
  • [^] characters outside the specified range

Number of matches:

  • *: matches the preceding character any number of times
  • ?: 0 or 1 times
  • +: 1 or more times
  • {m}: exactly m times
  • {m,n}: at least m, at most n times

(2) Position anchoring:

  • ^: beginning of line
  • $: end of line
  • \<, \b: beginning of word
  • \>, \b: end of word

Grouping and back-references:

  • (): group
  • \1, \2, ...: back-references to the text matched by the first, second, ... group
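
The difference between the two syntaxes is mostly about backslashes. The sketch below assumes GNU grep, where back-references also work with -E; the sample lines and variable names are invented for the example.

```shell
# Hypothetical input, created only for this sketch.
tmp=$(mktemp)
printf 'foo foo\nfoo bar\nab12\ncolor colour\n' > "$tmp"

doubled=$(grep -E '^([a-z]+) \1$' "$tmp")      # group plus back-reference: a repeated word
digits=$(grep -Eo '[[:digit:]]{2}' "$tmp")     # ERE interval, written without backslashes
same=$(grep -o '[[:digit:]]\{2\}' "$tmp")      # the same interval in basic syntax
optional=$(grep -Eo 'colou?r' "$tmp")          # ? makes the "u" optional

printf '%s\n' "$doubled" "$digits" "$same" "$optional"
rm -f "$tmp"
```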

2.sed

2.1 what is sed

sed is a stream editor that processes its input one line at a time. The line currently being processed is stored in a temporary buffer called the pattern space, and the sed commands are applied to the contents of that buffer. When the commands have finished, the contents of the buffer are sent to the screen; sed then reads the next line and starts the next cycle. Unless a special command such as 'D' is used, the pattern space is emptied between two cycles, but the hold space is not. This repeats until the end of the file. The file itself is not changed unless you redirect the output to a file or use -i.

Function: it is mainly used to automatically edit one or more files to simplify the repeated operation of files.

2.2 using sed

Command format

sed [options] '[Address delimitation] command' file(s)

Common options

  • -n: Do not automatically print the pattern space; only lines printed explicitly (for example with p) are output
  • -e: Multi-point editing. Several scripts can be applied to each line
  • -f: Read the script from a file; -f gives the file path. If the file holds several commands, put each on its own line
  • -r: Support extended regular expressions
  • -i: Write the processing results directly to the file
  • -i.bak: Back up the original as FILE.bak before writing the processing results to the file

Address delimitation

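  • No address: process every line
  • Single address:
    • #: the line with that number
    • /pattern/: every line matched by the pattern
  • Address range:
    • #,#: from one line number to another
    • #,+#: from a line number through that many following lines
    • /pat1/,/pat2/: from the first line matching pat1 through the next line matching pat2
    • #,/pat1/: from a line number through the next line matching pat1
  • ~: step
    • sed -n '1~2p' prints only odd lines (1~2: start at line 1 and advance 2 lines at a time)
    • sed -n '2~2p' prints only even lines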
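
The range and step forms can be checked quickly against seq. This is a sketch assuming GNU sed, since the #,+# and first~step addresses are GNU extensions; the variable names are only for illustration.

```shell
plus=$(seq 10 | sed -n '2,+2p')        # line 2 and the 2 lines after it
step=$(seq 10 | sed -n '1~3p')         # every 3rd line, starting at line 1
range=$(seq 10 | sed -n '/3/,/5/p')    # from the line matching 3 through the line matching 5
mixed=$(seq 10 | sed -n '8,/0/p')      # from line 8 through the next line containing 0

printf '%s\n' "$plus" "--" "$step" "--" "$range" "--" "$mixed"
```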

Edit command

  • d: Delete the pattern space and immediately start the next cycle
  • p: Print the current contents of the pattern space (in addition to the default output)
  • a: Append text after the specified line. Multi-line text is supported using \n
  • i: Insert text before the specified line. Multi-line text is supported using \n
  • c: Replace the specified line with one or more lines of text. Multi-line text is supported using \n
  • w: Save the matching lines to the specified file
  • r: Read the specified file and insert its text after the matching line
  • =: Print the line number of the line in the pattern space
  • !: Negate the address, so the command applies to the lines that do not match
  • s///: Find and replace
    • Adding g makes the replacement global within the line
    • The replacement can contain escapes that convert case:
    • \l: Converts the next character to lowercase.
    • \L: Converts the replacement to lowercase until \U or \E appears.
    • \u: Converts the next character to uppercase.
    • \U: Converts the replacement to uppercase until \L or \E appears.
    • \E: Stops the case conversion started by \L or \U
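
The case-conversion escapes are a GNU sed extension, so the following sketch assumes GNU sed. The input strings and variable names are made up for the example.

```shell
# \U ... \E upper-cases only the first group; the second group is left alone.
first_upper=$(echo 'hello world' | sed 's/\(hello\) \(world\)/\U\1\E \2/')

# \u upper-cases just the next character: here, the first letter of every word.
capitalized=$(echo 'hello world' | sed 's/\b[a-z]/\u&/g')

# \L lower-cases the rest of the replacement.
lowered=$(echo 'HELLO World' | sed 's/.*/\L&/')

printf '%s\n' "$first_upper" "$capitalized" "$lowered"
```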

sed usage demonstration

[root@along ~]# cat demo
aaa
bbbb
AABBCCDD
[root@along ~]# sed "/aaa/p" demo  #The matched lines will be printed again, and the unmatched lines will also be printed
aaa
aaa
bbbb
AABBCCDD
[root@along ~]# sed -n "/aaa/p" demo  #-n suppresses automatic printing, so only the matched line is shown
aaa
[root@along ~]# sed -e "s/a/A/" -e "s/b/B/" demo  #-e multipoint editing
Aaa
Bbbb
AABBCCDD
[root@along ~]# cat sedscript.txt
s/A/a/g
[root@along ~]# sed -f sedscript.txt demo  #-f use file processing
aaa
bbbb
aaBBCCDD
[root@along ~]# sed -i.bak "s/a/A/g" demo  #-i process files directly
[root@along ~]# cat demo
AAA
bbbb
AABBCCDD
[root@along ~]# cat demo.bak
aaa
bbbb
AABBCCDD

Address definition demonstration

[root@along ~]# cat demo
aaa
bbbb
AABBCCDD
[root@along ~]# sed -n "p" demo  #Print full text without specifying rows
aaa
bbbb
AABBCCDD
[root@along ~]# sed "2s/b/B/g" demo  #Replace b with B on line 2
aaa
BBBB
AABBCCDD
[root@along ~]# sed -n "/aaa/p" demo
aaa
[root@along ~]# sed -n "1,2p" demo  #Print 1-2 lines
aaa
bbbb
[root@along ~]# sed -n "/aaa/,/DD/p" demo
aaa
bbbb
AABBCCDD
[root@along ~]# sed -n "2,/DD/p" demo
bbbb
AABBCCDD
[root@along ~]# sed "1~2s/[aA]/E/g" demo  #Replace a or A with E on odd lines
EEE
bbbb
EEBBCCDD

Edit command demo

[root@along ~]# cat demo
aaa
bbbb
AABBCCDD
[root@along ~]# sed "2d" demo  #Delete line 2
aaa
AABBCCDD
[root@along ~]# sed -n "2p" demo  #Print line 2
bbbb
[root@along ~]# sed "2a123" demo  #Add 123 after line 2
aaa
bbbb
123
AABBCCDD
[root@along ~]# sed "1i123" demo  #Add 123 Before line 1
123
aaa
bbbb
AABBCCDD
[root@along ~]# sed "3c123\n456" demo  #Replace line 3
aaa
bbbb
123
456
[root@along ~]# sed -n "3w/root/demo3" demo  #Save the contents of line 3 to the demo3 file
[root@along ~]# cat demo3
AABBCCDD
[root@along ~]# sed "1r/root/demo3" demo  #Insert the contents of demo3 after line 1
aaa
AABBCCDD
bbbb
AABBCCDD
[root@along ~]# sed -n "=" demo  #=Print line number
1
2
3
[root@along ~]# sed -n '2!p' demo  #Print except line 2
aaa
AABBCCDD
[root@along ~]# sed 's@[a-z]@\u&@g' demo  #Replace lowercase letters with uppercase letters for the full text
AAA
BBBB
AABBCCDD

sed advanced editing commands

  • h: Copy the pattern space into the hold space, overwriting it
  • H: Append the pattern space to the hold space
  • g: Copy the hold space into the pattern space, overwriting it
  • G: Append the hold space to the pattern space
  • x: Exchange the contents of the pattern space and the hold space
  • n: Read the next input line into the pattern space, overwriting it
  • N: Append the next input line to the pattern space
  • d: Delete the pattern space
  • D: Delete the pattern space up to and including the first \n (that text is not sent to standard output), skip the remaining commands, and restart the cycle on what is left of the pattern space without reading new input

① Case: output text content in reverse order

[root@along ~]# cat num.txt
One
Two
Three
[root@along ~]# sed '1!G;h;$!d' num.txt
Three
Two
One

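1!G: G is not executed on the first line; from the second line on, the hold space is appended to the pattern space

h: the pattern space, which now holds the lines read so far in reverse order, is copied to the hold space

$!d: every line except the last is deleted, so only the final, fully reversed pattern space is printed

③ Summary of the relationship between the pattern space and the hold space:

The hold space is a buffer that temporarily stores data from the pattern space to assist the processing done in the pattern space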
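
As one more illustration of the hold space acting as a scratch buffer, the sketch below (not from the original post) joins all input lines with commas. H collects every line in the hold space; on the last line, x swaps the collected text back into the pattern space, where the embedded newlines are replaced. The variable name joined is just for the example.

```shell
# H: append each line to the hold space (which therefore starts with a newline).
# On the last line ($): x swaps hold and pattern space, the newlines become commas,
# the leading comma is dropped, and p prints the result.
joined=$(seq 3 | sed -n 'H;${x;s/\n/,/g;s/^,//;p;}')

echo "$joined"
```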

(3) Demonstration

① Show even rows

[root@along ~]# seq 9 |sed -n 'n;p'
2
4
6
8

② Reverse order display

[root@along ~]# seq 9 |sed  '1!G;h;$!d'
9
8
7
6
5
4
3
2
1

③ Show odd rows

[root@along ~]# seq 9 |sed 'H;n;d'
1
3
5
7
9

④ Show last line

[root@along ~]# seq 9| sed 'N;D'
9

⑤ Add a blank line between each line

[root@along ~]# seq 9 |sed 'G'
1
 
2
 
3
 
4
 
5
 
6
 
7
 
8
 
9
 

⑥ Replace each line with a blank line

[root@along ~]# seq 9 |sed "g"
 
 
 
 
 
 
 
 
 

⑦ Make sure there is a blank line below each line

[root@along ~]# seq 9 |sed '/^$/d;G'
1
 
2
 
3
 
4
 
5
 
6
 
7
 
8
 
9

3.awk

What is awk

awk is a programming language for processing text and data on Linux. Its input can come from standard input, from one or more files, or from the output of other commands. It supports advanced features such as user-defined functions and dynamic regular expressions, and it provides many built-in features such as arrays and functions.

Using awk

Syntax

awk [options] 'program' var=value file...
awk [options] -f programfile var=value file...
awk [options] 'BEGIN{ action;... } pattern{ action;... } END{ action;... }' file ...
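
The third form is the one that shows how an awk program is laid out. Below is a minimal sketch of the BEGIN / pattern / END structure; the input file, its name:score layout, and the variable names are assumptions made for this example.

```shell
# Hypothetical input in "name:score" form, created only for this sketch.
tmp=$(mktemp)
printf 'alice:30\nbob:45\ncarol:12\n' > "$tmp"

report=$(awk -F: '
    BEGIN    { print "name score" }            # runs once, before any input is read
    $2 >= 30 { print $1, $2; total += $2 }     # runs for every record matching the pattern
    END      { print "total", total }          # runs once, after the last record
' "$tmp")

echo "$report"
rm -f "$tmp"
```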

Common command options

  • -F fs: fs specifies the input field separator. fs can be a string or a regular expression.
  • -v var=value: assign a user-defined variable, passing an external value into awk
  • -f scriptfile: read the awk program from the script file

awk variables

Variables are either built in or user-defined. Put the -v option before each variable you set on the command line

(1) Format

  • FS: input field separator; white space by default
  • OFS: output field separator; a single space by default
  • RS: input record separator, the character that ends each record on input. With the default FS, the original newlines still separate fields
  • ORS: output record separator, written in place of the newline after each record on output
  • NF: number of fields in the current record; $NF refers to the last column and $(NF-1) to the penultimate column
  • NR: record (line) number. When several files are given, the count continues across files, so the numbering of the second file carries on from the last line number of the first
  • FNR: record number counted separately for each file. With a single file it equals NR; with several files, the numbering of each later file starts again from 1
  • FILENAME: current file name
  • ARGC: number of command-line arguments
  • ARGV: array holding the arguments given on the command line
[root@along ~]# cat awkdemo
hello:world
linux:redhat:lalala:hahaha
along:love:you
[root@along ~]# awk -v FS=':' '{print $1,$2}' awkdemo  #FS specifies the input separator
hello world
linux redhat
along love
[root@along ~]# awk -v FS=':' -v OFS='---' '{print $1,$2}' awkdemo  #OFS specifies the output separator
hello---world
linux---redhat
along---love
[root@along ~]# awk -v RS=':' '{print $1,$2}' awkdemo
hello
world linux
redhat
lalala
hahaha along
love
you
[root@along ~]# awk -v FS=':' -v ORS='---' '{print $1,$2}' awkdemo
hello world---linux redhat---along love---
[root@along ~]# awk -F: '{print NF}' awkdemo
2
4
3
[root@along ~]# awk -F: '{print $(NF-1)}' awkdemo  #Displays the penultimate column
hello
lalala
love
[root@along ~]# awk '{print NR}' awkdemo awkdemo1
1
2
3
4
5
[root@along ~]# awk END'{print NR}' awkdemo awkdemo1
5
[root@along ~]# awk '{print FNR}' awkdemo awkdemo1
1
2
3
1
2
[root@along ~]# awk '{print FILENAME}' awkdemo
awkdemo
awkdemo
awkdemo
[root@along ~]# awk 'BEGIN {print ARGC}' awkdemo awkdemo1
3
[root@along ~]# awk 'BEGIN {print ARGV[0]}' awkdemo awkdemo1
awk
[root@along ~]# awk 'BEGIN {print ARGV[1]}' awkdemo awkdemo1
awkdemo
[root@along ~]# awk 'BEGIN {print ARGV[2]}' awkdemo awkdemo1
awkdemo1
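
The difference between NR and FNR is easiest to see with both printed side by side. The two input files below are hypothetical, as are the variable names; the condition NR != FNR is a common way to select records from every file after the first.

```shell
# Two hypothetical input files, created only for this sketch.
a=$(mktemp); b=$(mktemp)
printf 'x\ny\nz\n' > "$a"
printf 'p\nq\n' > "$b"

# NR keeps counting across files; FNR restarts at 1 for the second file.
counters=$(awk '{ print NR, FNR, $0 }' "$a" "$b")

# NR equals FNR only while the first file is being read.
second_only=$(awk 'NR != FNR { print FNR ": " $0 }' "$a" "$b")

printf '%s\n' "$counters" "--" "$second_only"
rm -f "$a" "$b"
```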

Custom variable

(1)-v var=value

① Define the variable first, and then execute the action print

[root@along ~]# awk -v name="along" -F: '{print name":"$0}' awkdemo
along:hello:world
along:linux:redhat:lalala:hahaha
along:along:love:you

② Variables are defined after the action print is executed

[root@along ~]# awk -F: '{print name":"$0;name="along"}' awkdemo
:hello:world
along:linux:redhat:lalala:hahaha
along:along:love:you

(2) Directly defined in program

You can put the actions in a script file and call the script with -f

[root@along ~]# cat awk.txt
{name="along";print name,$1}
[root@along ~]# awk -F: -f awk.txt awkdemo
along hello
along linux
along along

printf command

(1) Format output

printf "FORMAT", item1, item2, ...

① FORMAT must be specified

② No automatic line feed. You need to give the newline character \n explicitly

③ FORMAT needs a format character for each subsequent item

(2) Format character: one-to-one correspondence with item

  • %c: Display the character for an ASCII code (or the first character of a string)
  • %d, %i: Display a decimal integer
  • %e, %E: Display a number in scientific notation
  • %f: Display a floating-point number. For example, %5.1f gives a total width of at least 5 characters (integer part, decimal point and fraction together) with 1 decimal place, padded with spaces if shorter
  • %g, %G: Display a number in scientific notation or floating-point form
  • %s: Display a string. For example, %5s uses at least 5 characters, padded with spaces if shorter; longer strings are displayed in full
  • %u: Unsigned integer
  • %%: Display % itself

(3) Modifiers: placed between % and the format character (c, d, e, f, ...)

  • #[.#]: the first number controls the display width; the second number gives the precision after the decimal point, as in %5.1f
  • -: left-align (right-aligned by default), as in %-15s
  • +: display the sign of the number, as in %+d
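
The format characters and modifiers combine as in the sketch below. The input file, its name:value:delta layout, and the variable names are invented for the example.

```shell
# Hypothetical "name:value:delta" data, created only for this sketch.
tmp=$(mktemp)
printf 'disk:43.256:7\nswap:5:-2\n' > "$tmp"

# %-6s  left-aligned string, padded to 6 characters
# %6.1f float, total width 6, one decimal place
# %+d   integer with an explicit sign
table=$(awk -F: '{ printf "%-6s|%6.1f|%+d\n", $1, $2, $3 }' "$tmp")

# %c prints the character for a numeric code, or the first character of a string.
chars=$(awk 'BEGIN { printf "%c%c\n", 65, "hello" }')

printf '%s\n' "$table" "$chars"
rm -f "$tmp"
```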

Operators

format

  • Arithmetic operators:
    • x+y, x-y, x*y, x/y, x^y, x%y
    • -x: Negate
    • +x: Convert to a number
  • String operator: concatenation, written without any operator symbol
  • Assignment operators:
    • =, +=, -=, *=, /=, %=, ^=
    • ++, --
  • Comparison operators:
    • ==, !=, >, >=, <, <=
  • Pattern matching operators: ~ (the left side matches the pattern on the right) and !~ (it does not match)
  • Logical operators: and &&, or ||, not !
  • Function call: function_name(argu1, argu2, ...)
  • Conditional expression (ternary expression): selector ? if-true-expression : if-false-expression
    • Note: the selector is evaluated first. If it is true, the expression after ? is evaluated; otherwise the expression after : is evaluated

demonstration

(1) Pattern matching character

---Show the disk information for lines beginning with /dev
[root@along ~]# df -h |awk -F: '$0 ~ /^\/dev/'
/dev/mapper/cl-root   17G  7.3G  9.7G  43% /
/dev/sda1           1014M  121M  894M  12% /boot
---Only disk usage and disk names are displayed
[root@along ~]# df -h |awk '$0 ~ /^\/dev/{print $(NF-1)"---"$1}'
43%---/dev/mapper/cl-root
12%---/dev/sda1
---Find disks with usage above 40%
[root@along ~]# df -h |awk '$0 ~ /^\/dev/{print $(NF-1)"---"$1}' |awk -F% '$1 > 40'

(2) Logical operator

[root@along ~]# awk -F: '$3>=0 && $3<=1000 {print $1,$3}' /etc/passwd
root 0
bin 1
[root@along ~]# awk -F: '$3==0 || $3>=1000 {print $1}' /etc/passwd
root
[root@along ~]# awk -F: '!($3==0) {print $1}' /etc/passwd
bin
[root@along ~]# awk -F: '!($0 ~ /bash$/) {print $1,$3}' /etc/passwd
bin 1
daemon 2

(3) Conditional expression (ternary expression)

[root@along ~]# awk -F: '{usertype = ($3 >= 1000) ? "common user" : "sysadmin user"; print usertype,$1,$3}' /etc/passwd
sysadmin user root 0
common user along 1000

grep sed awk comparison

  • grep: text filter. If you only need to filter text, use grep; it is typically the most efficient of the three for that job
  • sed: stream editor. By default it works only on the pattern space and does not modify the original file
  • awk: report generator that formats its output. If you need to produce something like a report from the data, or the data is organized in columns, awk is the best choice
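
To make the division of labor concrete, here is a small sketch that gives each tool the kind of job it suits best. The data file and the variable names are hypothetical.

```shell
# Hypothetical "name score" data, created only for this sketch.
tmp=$(mktemp)
printf 'alice 30\nbob 45\ncarol 12\n' > "$tmp"

filtered=$(grep 'o' "$tmp")                              # grep: select the lines containing "o"
edited=$(sed 's/ /=/' "$tmp")                            # sed: edit every line in the stream
summed=$(awk '{ sum += $2 } END { print sum }' "$tmp")   # awk: compute over a column

printf '%s\n' "$filtered" "--" "$edited" "--" "$summed"
rm -f "$tmp"
```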
