sort, uniq, tr, cut, eval and regular expressions

Posted by tsapat on Mon, 03 Jan 2022 02:01:28 +0100

1, sort command

1. The role of sort

	Sorts the contents of a file line by line; lines can be compared as text, as numbers, or as other data types

2. Syntax format

	sort	[option]  parameter
	cat file | sort option

3. Common options

Common options	effect
-f	Ignore case: lowercase letters are treated as uppercase for comparison
-b	Ignore leading blanks on each line
-n	Sort by numeric value
-r	Reverse the sort order
-u	Output only one line for each group of equal lines (like piping the result through uniq)
-t	Specify the field separator; by default, fields are separated by the transition from non-blank to blank characters
-k	Specify the sort field (key)
-o <output file>	Write the sorted result to the specified file

4. Use examples

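1. sort command
By default, sort compares lines character by character, starting with the first character. In the locale used here, leading blanks are ignored and the resulting order is: empty lines, then numbers, then letters, with each lowercase letter ahead of its uppercase form. If the first characters are equal, the second characters are compared, and so on.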

[root@localhost ~]# cat 1.txt 
e
d
c
b
a
A
B
C
D
E
11
1
22
2
33
 44
55

[root@localhost ~]# sort 1.txt 

1
11
2
22
33
 44
55
a
A
b
B
c
C
d
D
e
E
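The order shown above depends on the system locale. As a minimal sketch (the sample data is made up for illustration), setting LC_ALL=C forces plain byte order, in which all uppercase letters come before all lowercase letters:

```shell
# Illustrative data only; LC_ALL=C selects byte-order comparison.
c_order=$(printf 'b\nA\na\nB\n' | LC_ALL=C sort | paste -sd ' ' -)
echo "$c_order"
```

If a script depends on one particular ordering, pinning the locale this way keeps the result the same across machines.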

2. -f option
The -f option ignores case when comparing lines. In the output below, lines that differ only in case are listed with the uppercase letter first.

[root@localhost ~]# sort -f 1.txt 

 44
1
11
2
22
33
55
A
a
B
b
C
c
D
d
E
e

3. -n option
Because sort compares character by character by default, multi-digit numbers are not ordered by value (11 sorts before 2). To sort by numeric value, use the -n option.

[root@localhost ~]# sort -n 1.txt 

a
A
b
B
c
C
d
D
e
E
1
2
11
22
33
 44
55
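The difference between text order and numeric order can be sketched with three made-up numbers. LC_ALL=C is set only to make the text comparison predictable:

```shell
# Text comparison looks at the first character, so "10" sorts before "2".
text_order=$(printf '10\n9\n2\n' | LC_ALL=C sort | paste -sd ' ' -)
# -n compares the numeric value of each line.
num_order=$(printf '10\n9\n2\n' | LC_ALL=C sort -n | paste -sd ' ' -)
echo "text:    $text_order"
echo "numeric: $num_order"
```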

4. -r option
-r reverses the sort order

[root@localhost ~]# sort -r 1.txt 
E
e
D
d
C
c
B
b
A
a
55
 44
33
22
2
11
1

5. -u option
-u outputs each group of duplicate lines only once

[root@localhost ~]# cat 2.txt 
11
11
22
33
33
aa
aa
bb
c
c

[root@localhost ~]# sort -u 2.txt 

11
22
33
aa
bb
c
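As a small check of the "equivalent to uniq" description, the sketch below compares sort -u with sort piped into uniq on some illustrative input. The variable names are arbitrary:

```shell
# Illustrative input with non-adjacent duplicates.
data=$(printf 'bb\naa\nbb\ncc\naa\n')
with_u=$(printf '%s\n' "$data" | LC_ALL=C sort -u)
with_uniq=$(printf '%s\n' "$data" | LC_ALL=C sort | uniq)
# Both pipelines should produce the same three lines.
[ "$with_u" = "$with_uniq" ] && echo "same result"
```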

6. -t, -k options
The -t option specifies the field separator, and the -k option specifies the field to sort on.

Sort /etc/passwd numerically by the third field, using ':' as the separator

[root@localhost ~]# sort -t ':' -k3 -n /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
rpcuser:x:29:29:RPC Service User:/var/lib/nfs:/sbin/nologin
rpc:x:32:32:Rpcbind Daemon:/var/lib/rpcbind:/sbin/nologin
ntp:x:38:38::/etc/ntp:/sbin/nologin
gdm:x:42:42::/var/lib/gdm:/sbin/nologin
tss:x:59:59:Account used by the trousers package to sandbox the tcsd daemon:/dev/null:/sbin/nologin
avahi:x:70:70:Avahi mDNS/DNS-SD Stack:/var/run/avahi-daemon:/sbin/nologin
tcpdump:x:72:72::/:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
radvd:x:75:75:radvd user:/:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin
qemu:x:107:107:qemu user:/:/sbin/nologin
usbmuxd:x:113:113:usbmuxd user:/:/sbin/nologin
pulse:x:171:171:PulseAudio System Daemon:/var/run/pulse:/sbin/nologin
rtkit:x:172:172:RealtimeKit:/proc:/sbin/nologin
abrt:x:173:173::/etc/abrt:/sbin/nologin
dhcpd:x:177:177:DHCP server:/:/sbin/nologin
systemd-network:x:192:192:systemd Network Management:/:/sbin/nologin
gnome-initial-setup:x:991:986::/run/gnome-initial-setup/:/sbin/nologin
sssd:x:992:987:User for sssd:/:/sbin/nologin
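Note that -k3 defines a key that runs from field 3 to the end of the line. Writing -k3,3 restricts the key to field 3 alone, and a trailing n makes only that key numeric. A minimal sketch on hypothetical passwd-style records (name:password:uid):

```shell
# Hypothetical records; the key is limited to field 3 and compared numerically.
sorted=$(printf 'carol:x:10\nalice:x:2\nbob:x:1\n' \
  | sort -t ':' -k3,3n \
  | cut -d ':' -f 1 \
  | paste -sd ' ' -)
echo "$sorted"
```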

7. -o option
The -o option writes the sorted result to the specified file, overwriting its contents.
List the sizes of the files under /var with du, sort them by size in descending order, and write the result to var.txt.

[root@localhost ~]# du -a /var | sort -nr -o var.txt
[root@localhost ~]# vim var.txt 

4953252 /var
4417332 /var/ftp/centos7
4417332 /var/ftp
3919212 /var/ftp/centos7/Packages
359940  /var/ftp/centos7/LiveOS
359936  /var/ftp/centos7/LiveOS/squashfs.img
353064  /var/cache
350440  /var/cache/yum/x86_64/7
350440  /var/cache/yum/x86_64
350440  /var/cache/yum
192592  /var/cache/yum/x86_64/7/updates
173508  /var/lib
153580  /var/cache/yum/x86_64/7/base
109892  /var/cache/yum/x86_64/7/updates/gen
103108  /var/lib/rpm
98576   /var/cache/yum/x86_64/7/base/gen
92036   /var/lib/rpm/Packages
85152   /var/ftp/centos7/Packages/firefox-52.2.0-2.el7.centos.x86_64.rpm
74936   /var/ftp/centos7/Packages/libreoffice-core-5.0.6.2-14.el7.x86_64.rpm
66636   /var/cache/yum/x86_64/7/updates/packages
63972   /var/ftp/centos7/Packages/texlive-cm-super-svn15878.0-38.el7.noarch.rpm
62036   /var/ftp/centos7/images
53428   /var/ftp/centos7/isolinux
53368   /var/cache/yum/x86_64/7/updates/packages/linux-firmware-20200421-80.git78c0348.el7_9.noarch.rpm
53072   /var/lib/tftpboot
53044   /var/ftp/centos7/images/pxeboot
52316   /var/cache/yum/x86_64/7/updates/gen/primary_db.sqlite
49820   /var/cache/yum/x86_64/7/updates/gen/filelists_db.sqlite
48056   /var/cache/yum/x86_64/7/base/gen/filelists_db.sqlite
47300   /var/lib/tftpboot/initrd.img
47300   /var/ftp/centos7/isolinux/initrd.img
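One practical use of -o is sorting a file in place. A plain redirection such as "sort file > file" would empty the file before sort reads it, whereas sort -o reads all of its input before opening the output file. A sketch using a scratch file created with mktemp:

```shell
tmp=$(mktemp)                 # scratch file; its name is generated by mktemp
printf '3\n1\n2\n' > "$tmp"
sort -n -o "$tmp" "$tmp"      # input and output are the same file
in_place=$(paste -sd ' ' "$tmp")
echo "$in_place"
rm -f "$tmp"
```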

2, uniq command

1. Role of uniq

Reports or removes consecutive duplicate lines in a file; often used together with the sort command

2. Syntax format

uniq [option] parameter
cat file | uniq option

3. Common options

Common options	explain
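-c	Collapse consecutive duplicate lines and prefix each output line with the number of times it occurred
-d	Show only lines that are repeated consecutively (one copy of each)
-u	Show only lines that are not repeated consecutively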

4. Use example

1. uniq command
uniq removes only consecutive duplicate lines; duplicates that are not adjacent are kept.

[root@localhost ~]# cat 2.txt 
11
11
22
33
33
11
aa
aa
bb
33
c
c

[root@localhost ~]# uniq 2.txt 
11
22
33
11
aa
bb
33
c

To remove all duplicates, you can sort first and then remove duplicates.

[root@localhost ~]# sort -n 2.txt | uniq
aa
bb
c
11
22
33

2. -c option
Count how many times each line occurs and collapse the duplicates

[root@localhost ~]# sort -n 2.txt | uniq -c
      2 aa
      1 bb
      2 c
      3 11
      1 22
      3 33
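A common follow-up is to rank lines by how often they occur by sorting the uniq -c output numerically in reverse. The sketch below uses made-up input and normalises the leading spaces with awk:

```shell
# Illustrative input: one word per line.
top=$(printf 'a\nb\na\nc\na\nb\n' \
  | sort \
  | uniq -c \
  | sort -rn \
  | head -n 1 \
  | awk '{print $1, $2}')   # count and value of the most frequent line
echo "$top"
```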

3. -d option
Use the -d option to display only lines that are repeated consecutively

[root@localhost ~]# uniq -d 2.txt 
11
33
aa
c
[root@localhost ~]# cat 2.txt 
11
11
22
33
33
11
aa
aa
bb
33
c
c

To display every line that occurs more than once anywhere in the file, sort it first (here with "sort -n") and then pipe the result to "uniq -d".

[root@localhost ~]# cat 2.txt 
11
11
22
33
33
11
aa
bb
aa
bb
33
c
c
[root@localhost ~]# sort -n 2.txt | uniq -d
aa
bb
c
11
33

4. -u option

Use the -u option to display only lines that are not repeated consecutively (lines that have consecutive repetitions are not displayed at all)

[root@localhost ~]# uniq -u 2.txt 
22
11
aa
bb
aa
bb
33

Sort first to show only the lines that occur exactly once in the whole file

[root@localhost ~]# sort -n 2.txt | uniq -u
22

3, tr command

1. The role of tr

Replaces, squeezes or deletes characters read from standard input

2. Syntax format

	tr [option] [parameter]
	The parameter is the character set to be operated. The usage method is as follows:
	Character set 1: Specifies the original character set to be converted or deleted. When performing a conversion operation, you must specify the target character set for the conversion using the parameter "character set 2". However, the parameter "character set 2" is not required for deletion.
	Character set 2: Specifies the target character set to convert to.

3. Common options

Common options	explain
-c	Use the complement of character set 1: characters in set 1 are kept, and all other characters (including the newline character \n) are replaced with character set 2
-d	Delete all characters belonging to character set 1
-s	Squeeze each run of a repeated character into a single character; if character set 2 is given, set 1 is first replaced with set 2 and the result is then squeezed
-t	Truncate character set 1 to the length of character set 2 before replacing

4. Use example

1. tr command

The tr command replaces each character in character set 1 with the character at the same position in character set 2, so the two sets should normally be the same length. (GNU tr extends a shorter character set 2 by repeating its last character.)

[root@localhost ~]# echo 'ABC' | tr "A-Z" "a-z"
abc
[root@localhost ~]# echo 'aBb' | tr "A-Z" "a-z"
abb
[root@localhost ~]# echo 'ABC' | tr "AB" "CD"
CDC
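Besides ranges such as "A-Z", tr also accepts POSIX character classes, which do not depend on how letters are ordered in the current locale. A minimal sketch with an illustrative string:

```shell
# [:lower:] and [:upper:] are POSIX character classes understood by tr.
upper=$(printf 'Hello World\n' | tr '[:lower:]' '[:upper:]')
echo "$upper"
```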

2. -c option

Use the -c option to keep the characters in character set 1; every other character (including the newline character, unless it is listed in set 1) is replaced with a character from character set 2.

[root@localhost ~]# echo  -e "abc\nabc" | tr -c "b\n" "0"
0b0
0b0

3. -d option

Use the -d option to delete all characters belonging to character set 1.

[root@localhost ~]# echo  -e "abc\nabc" | tr -d "a"
bc
bc
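The -c and -d options can be combined to delete everything except a given set. The sketch below keeps only digits and the newline from a made-up input string:

```shell
# -c complements the set (digits and newline), -d deletes everything in the complement.
digits=$(printf 'tel: 555-1234\n' | tr -cd '0-9\n')
echo "$digits"
```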

4. -s option
Squeezes each run of a repeated character from character set 1 into a single character. If character set 2 is also given, the characters of set 1 are first replaced with those of set 2 and the result is then squeezed.

[root@localhost ~]# echo 'aaaaaabbbbc' | tr -s "ab"
abc
[root@localhost ~]# echo 'aaaaaabbbbc' | tr -s "ab" "0"
0c

The command tr -s "\n" removes empty lines by squeezing consecutive newlines into one

[root@localhost ~]# echo -e 'a\n\n\nb' | tr -s "\n"
a
b

5. Delete the "^M" character caused by Windows files
In Linux, a line ends with the line feed character "\n" alone. A carriage return "\r" in a file is shown only as the control character "^M" and does not start a new line. In Windows, a line ends with carriage return + line feed, "\r\n"; if either control character is missing or the order is wrong, the line break is not handled correctly.
A "^M" is normally invisible when a file is printed, but it can be seen with the "cat -v" command.
The "^M" problem can also be fixed with the dos2unix tool.

[root@localhost ~]# cat abc.txt
aa

bb
cc[root@localhost ~]# 
[root@localhost ~]# cat -v abc.txt
aa^M
^M
bb^M
cc[root@localhost ~]#

Method 1:
Use the tr command directly to replace "\r" with " ". After conversion, there will be a space at the end of each line.

[root@localhost ~]# cat abc.txt | tr "\r" " " > ABC.txt
[root@localhost ~]# cat -v ABC.txt 
aa 
 
bb 
cc[root@localhost ~]#

Method 2:
Use the "tr -s" command to replace "\r" with " ", as above.
You can also replace "\r" with "\n". Since consecutive "\n" are then squeezed into a single "\n", any empty lines in the file disappear after conversion.

[root@localhost ~]# cat abc.txt | tr -s "\r" " " > ABC.txt
[root@localhost ~]# cat -v ABC.txt
aa

bb
cc[root@localhost ~]# 
[root@localhost ~]# cat abc.txt | tr -s "\r" "\n" > ABC.txt
[root@localhost ~]# cat -v ABC.txt
aa
bb
cc[root@localhost ~]#
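A third possibility, not used in the methods above, is to delete the carriage returns with tr -d. This leaves no trailing spaces and keeps empty lines. A minimal sketch on an inline sample rather than abc.txt:

```shell
# Sample text with Windows line endings, including one empty line.
converted=$(printf 'aa\r\n\r\nbb\r\ncc' | tr -d '\r')
expected=$(printf 'aa\n\nbb\ncc')
[ "$converted" = "$expected" ] && echo "carriage returns removed, empty line kept"
```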

6. Sort the array with a=(2 4 6 3 1 5) from small to large
Idea: output the array, replace the space with newline, and sort with sort

[root@localhost ~]# a=(2 4 6 3 1 5)
[root@localhost ~]# echo ${a[@]}
2 4 6 3 1 5
[root@localhost ~]# echo ${a[@]} | tr " " "\n" | sort -n
1
2
3
4
5
6

4, cut command

1. Function of cut

Displays the specified part of each line, or removes the specified fields from the output

2. Syntax format

cut option	parameter
cat file | cut option

3. Common options

Common options	explain
-b	Select by byte position
-c	Select by character position
-f	Select the specified fields; cut uses Tab as the default field separator
-d	Use the given character as the field separator instead of the default Tab
--complement	Output everything except the selected bytes, characters or fields
--output-delimiter	Change the separator used in the output

4. Use example

1. -d, -f options
Extract the first field, using ':' as the separator

[root@localhost ~]# cut -d ':' -f 1 /etc/passwd
root
bin
daemon
adm
lp
sync
shutdown
halt
mail
operator
games
ftp
nobody
systemd-network
dbus
polkitd
abrt
libstoragemgmt
rpc
colord
saslauth
setroubleshoot
rtkit
pulse
qemu
ntp
radvd
chrony
tss
usbmuxd
geoclue
sssd
gdm
rpcuser
nfsnobody
gnome-initial-setup
avahi
postfix
sshd
tcpdump
qiao
dhcpd

Extract fields 1-3, 6 and 7 from the lines of /etc/passwd that match a given condition

[root@localhost ~]# grep 'bin/bash' /etc/passwd
root:x:0:0:root:/root:/bin/bash
qiao:x:1000:1000:qiao:/home/qiao:/bin/bash
[root@localhost ~]# grep 'bin/bash' /etc/passwd | cut -d ':' -f 1-3,6,7
root:x:0:/root:/bin/bash
qiao:x:1000:/home/qiao:/bin/bash
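One detail worth knowing when using -f: a line that does not contain the delimiter is printed unchanged unless -s is given. A short sketch with illustrative input:

```shell
# Without -s, the line that has no ':' passes through untouched.
all_lines=$(printf 'a:b:c\nno_delimiter\n' | cut -d ':' -f 2 | paste -sd ' ' -)
# With -s, lines without the delimiter are suppressed.
only_delimited=$(printf 'a:b:c\nno_delimiter\n' | cut -s -d ':' -f 2 | paste -sd ' ' -)
echo "$all_lines"
echo "$only_delimited"
```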

2. --complement -f option
Exclude the specified fields

[root@localhost ~]# grep 'bin/bash' /etc/passwd | cut -d ':' --complement -f 2
root:0:0:root:/root:/bin/bash
qiao:1000:1000:qiao:/home/qiao:/bin/bash

3. --output-delimiter option

Change the delimiter placed between fields in the output

[root@localhost ~]# grep 'bin/bash' /etc/passwd | cut -d ':'  -f 1,7 --output-delimiter=''
root/bin/bash
qiao/bin/bash
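For a visible separator, the same option can be given a non-empty string. This sketch assumes GNU cut (the long option is not part of POSIX) and uses a sample passwd line:

```shell
# Assumes GNU cut; ' -> ' is an arbitrary separator chosen for illustration.
line='root:x:0:0:root:/root:/bin/bash'
shown=$(printf '%s\n' "$line" | cut -d ':' -f 1,7 --output-delimiter=' -> ')
echo "$shown"
```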

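4. -b option
Extract the specified bytes. cut counts positions from 1, so 2-4 selects the second through fourth bytes. For comparison, ${j:2:4} takes 4 characters starting at offset 2 (counted from 0), and expr substr $j 3 4 takes 4 characters starting at position 3 (counted from 1).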

[root@localhost ~]# j=123456789
[root@localhost ~]# echo $j | cut -b 2-4
234
[root@localhost ~]# echo ${j:2:4}
3456
[root@localhost ~]# expr substr $j 3 4
3456

5, eval command

1. Role of eval

	When eval is placed before a command, the shell scans the command line twice before executing it. On the first pass all substitutions on the command line are performed; the resulting text is then parsed again and executed as a command. This is useful when a variable cannot be fully resolved in a single scan, for example when it holds a command or the name of another variable.

2. Use example

[root@localhost ~]# echo 'cutomorry' > file
[root@localhost ~]# myfile="cat file"
[root@localhost ~]# echo $myfile 
cat file
[root@localhost ~]# eval $myfile
cutomorry
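The two scans can be made visible with an indirect variable lookup. In this sketch the variable names are hypothetical and chosen only for illustration:

```shell
var_name=greeting     # holds the NAME of another variable
greeting='hello'
# First scan: the shell expands $var_name, producing the text  value=$greeting
# Second scan: eval runs that text, so $greeting is expanded and assigned.
eval "value=\$$var_name"
echo "$value"
```

Because eval executes whatever text it is given, it should only be used on strings whose contents are trusted.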

6, Regular expression

1. The difference between regular expressions and wildcards

	Regular expressions are usually used to match text within files, while wildcards are usually used to match file names

2. Composition of regular expressions

Regular expressions are composed of ordinary characters and metacharacters.
Ordinary characters include upper and lower case letters, numbers, punctuation marks and some other symbols.
A metacharacter is a character with a special meaning in a regular expression. It can specify how the preceding character or expression (the one in front of the metacharacter) may occur in the target text.

3. Common metacharacters in basic regular expressions

Common metacharacters	explain
\	Escape character, used to cancel the special meaning of a symbol, for example: \n, \$, etc.
^	Match the start of the string, for example: ^a, ^the, ^#, ^[a-z]
$	Match the end of the string, for example: word$; ^$ matches blank lines
.	Match any single character except \n, for example: go.d, g..d
*	Match the preceding subexpression 0 or more times, for example: goo*d, go.*d
[list]	Match one character in the list, for example: go[ola]d, [abc], [A-Z], [a-z0-9]; [0-9] matches any digit
[^list]	Match one character that is not in the list, for example: [^0-9], [^A-Z0-9]; [^a-z] matches any character that is not a lowercase letter
\{n\}	Match the preceding subexpression exactly n times, for example: go\{2\}d; '[0-9]\{2\}' matches two digits
\{n,\}	Match the preceding subexpression at least n times, for example: go\{2,\}d; '[0-9]\{2,\}' matches two or more digits
\{n,m\}	Match the preceding subexpression n to m times, for example: go\{2,3\}d; '[0-9]\{2,3\}' matches two to three digits

 Supported tools include: grep, egrep, sed, awk
 Note: egrep and awk write these as {n}, {n,}, {n,m}, without a "\" before the braces
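The difference in brace syntax can be checked with a small sketch. The word list is made up, and grep -E is used as the extended-regex form of grep:

```shell
words=$(printf 'god\ngood\ngoood\n')
# Basic regular expression: braces must be escaped.
bre=$(printf '%s\n' "$words" | grep 'go\{2\}d')
# Extended regular expression: braces are written without the backslash.
ere=$(printf '%s\n' "$words" | grep -E 'go{2}d')
echo "BRE: $bre  ERE: $ere"
```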

4. Extended regular expression metacharacter

Common metacharacters	explain
+	Match the preceding subexpression 1 or more times, for example: go+d matches at least one o, such as god, good, goood, etc.
?	Match the preceding subexpression 0 or 1 times, for example: go?d matches gd or god
()	Treat the string in parentheses as a whole, for example: g(oo)+d matches the group "oo" one or more times, such as good, gooood, etc.
|	Match either of the alternatives (or), for example: g(oo|la)d matches good or glad

Supported tools include: egrep and awk
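The extended metacharacters can be tried out on a small word list. In this sketch the list and the helper function match are illustrative, and each pattern is anchored with ^ and $ so that it must match the whole line:

```shell
words=$(printf 'gd\ngod\ngood\ngoood\ngooood\nglad\n')
# Hypothetical helper: print the words matching an extended regex, space-separated.
match() { printf '%s\n' "$words" | grep -E "$1" | paste -sd ' ' -; }

plus=$(match '^go+d$')        # one or more o
opt=$(match '^go?d$')         # zero or one o
group=$(match '^g(oo)+d$')    # one or more repetitions of "oo"
alt=$(match '^g(oo|la)d$')    # either "oo" or "la"
echo "$plus / $opt / $group / $alt"
```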

5. Application examples

Match email addresses that meet these requirements:

Username before the @: six or more characters long; it must begin with a letter or _, and the remaining characters may be letters, digits or the symbols . - # _
Subdomain name: upper and lower case letters, digits and the symbols . - _
Top-level domain after the final ".": a string of 2 to 5 letters

[root@localhost ~]# vim mailadd.txt	
	zhangsan1234.@qq.com
	qiao_666@sina.com.cn
	luoxiang@163.com
	zhao liu@wo.cn
	sun@qi.com
[root@localhost ~]# egrep '^([a-zA-Z_][a-zA-Z0-9_#.-]{5,})@([a-zA-Z0-9_.-]+)\.([a-zA-Z]{2,5})$' mailadd.txt
zhangsan1234.@qq.com
qiao_666@sina.com.cn
luoxiang@163.com

Topics: Linux
