Linux learning notes for basic skills training of big data development engineers

Posted by olaf on Mon, 24 Jan 2022 17:55:38 +0100

Pipeline related commands

Commands covered

cut
sort
wc
uniq
tee
tr
split
awk
sed
grep

preparation

zhangsan,68,99,26
lisi,98,66,96
wangwu,38,33,86
zhaoliu,78,44,36
maq,88,22,66
zhouba,98,44,46

The above is the score sheet.
Fields are separated by commas: the first column is the name, the second is the Chinese score, the third is the math score, and the fourth is the English score.

preparation

vim 1.txt

111:aaa:bbb:ccc
222:ddd:eee:fff
333:ggg:hhh
444:iii

1 cut

1.1 objectives

cut extracts selected characters or fields from each line of a file, or of a command's output read through a pipe.

1.2 implementation

Step 1: extract the fifth character of the first two lines of 1.txt

command	meaning
cut [options] file	Extract content from the specified file

parameter

parameter	english	meaning
-c	characters	Select content by character

head -2 1.txt | cut -c 5

Step 2: split the first two lines of 1.txt on ":" and show selected fields

parameter	english	meaning
-d 'separator'	delimiter	Specify separator
-f n1,n2	fields	Fields to display after splitting; separate multiple field numbers with commas

Range syntax

Range	meaning
n	Show only field n
n-	Show from field n to the end of the line
n-m	Show fields n through m (inclusive)

head -2 1.txt | cut -d ':' -f 1,2

head -2 1.txt | cut -d ':' -f 1-2
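The cut examples above can be reproduced with the self-contained sketch below. It assumes a POSIX shell with mktemp, recreates 1.txt from the preparation step in a temporary directory, and stores each result in a variable so it can be compared with the expected output.

```shell
#!/bin/sh
# Work in a throwaway directory that is removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Recreate 1.txt from the preparation step (fields separated by ":").
printf '%s\n' '111:aaa:bbb:ccc' '222:ddd:eee:fff' '333:ggg:hhh' '444:iii' > 1.txt

# -c 5: the 5th character of each of the first two lines
chars=$(head -2 1.txt | cut -c 5)

# -f 1,2 (a list) and -f 1-2 (a range) select the same two fields here
list=$(head -2 1.txt | cut -d ':' -f 1,2)
range=$(head -2 1.txt | cut -d ':' -f 1-2)

# n- means "from field n to the end of the line"
from_second=$(head -1 1.txt | cut -d ':' -f 2-)

# A line with fewer fields than requested keeps the fields it has
short=$(tail -1 1.txt | cut -d ':' -f 1-3)

printf '%s\n' "$chars" "$list" "$from_second" "$short"
```

The last command shows that cut does not complain when a line (444:iii) has fewer fields than the range asks for; it simply prints the fields that exist.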

1.3 summary

With cut [options] file, you can extract specific characters or fields from each line of the target file.

preparation

vim score.txt

zhangsan,68,99,26
lisi,98,66,96
wangwu,38,33,86
zhaoliu,78,44,36
maq,88,22,66
zhouba,98,44,46

2 sort

2.1 objectives

sort sorts the contents of a text file line by line.

2.2 steps

Step 1: sort strings
Step 2: sort and remove duplicates
Step 3: sort values
Step 4: sort the score sheet by a field

2.3 implementation

Step 1: sort strings

[root@node01 tmp]# cat 2.txt
banana
apple
pear
orange
pear

[root@node01 tmp]# sort 2.txt 
apple
banana
orange
pear
pear

Step 2: sort and remove duplicates

parameter	english	meaning
-u	unique	Remove duplicate

It removes duplicate lines from the sorted output, keeping one copy of each.

[root@node01 tmp]# sort -u 2.txt 
apple
banana
orange
pear

Step 3: sort values

parameter	english	meaning
-n	numeric-sort	Sort by value size
-r	reverse	Reverse order

Prepare data

[root@node01 tmp]# cat 3.txt 
1
3
5
7
11
2
4
6
10
8
9

By default, sort compares lines as strings, character by character, so 10 and 11 come before 2

[root@node01 tmp]# sort 3.txt 
1
10
11
2
3
4
5
6
7
8
9

-n sorts numerically, in ascending order

[root@node01 tmp]# sort -n 3.txt
1
2
3
4
5
6
7
8
9
10
11

-r reverses the order

[root@node01 tmp]# sort -n -r 3.txt
11
10
9
8
7
6
5
4
3
2
1

Combined options (-nr is the same as -n -r)

[root@node01 tmp]# sort -nr 3.txt  
11
10
9
8
7
6
5
4
3
2
1

Step 4: sort the score sheet by a field

parameter	english	meaning
-t	field-separator	Specify field separator
-k	key	Sort by which column

# Sort all lines by the second field (Chinese score), numerically, in descending order
sort -t ',' -k2nr score.txt
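The sketch below runs this sort on score.txt in a temporary directory. It uses -k2,2nr instead of -k2nr. With -k2 alone, the key runs from field 2 to the end of the line. With -k2,2, the key is field 2 only. The -s (stable) option does not appear in the original command. It keeps tied lines (lisi and zhouba both scored 98) in their input order, so the output is predictable.

```shell
#!/bin/sh
# Temporary working directory, removed on exit.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# score.txt: name,chinese,math,english
printf '%s\n' zhangsan,68,99,26 lisi,98,66,96 wangwu,38,33,86 \
    zhaoliu,78,44,36 maq,88,22,66 zhouba,98,44,46 > score.txt

# -t ','   : fields are separated by commas
# -k2,2nr  : the key is field 2 only, compared numerically, in descending order
# -s       : stable sort, so lines with equal keys keep their input order
by_chinese=$(sort -s -t ',' -k2,2nr score.txt)
printf '%s\n' "$by_chinese"
```

To sort by the English score instead, change the key to -k4,4nr.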

3 wc command

3.1 objectives

Display the number of lines, words, and bytes in the specified file

3.2 steps

Step 1: display the number of lines, words, and bytes of a file
Step 2: display only the number of lines of a file
Step 3: count the lines, words, and bytes of multiple files
Step 4: count the entries in the /etc directory

3.3 implementation

Step 1: display the number of lines, words, and bytes of a file

command	meaning
wc filename	Display the line, word, and byte counts of the file, in that order

[root@hadoop01 export]# cat 4.txt
111
222 bbb
333 aaa bbb 
444 aaa bbb ccc
555 aaa bbb ccc ddd
666 aaa bbb ccc ddd eee

[root@hadoop01 export]# wc 4.txt 
 6 21 85 4.txt
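You can check these counts with the sketch below. It recreates 4.txt, including the trailing space on the third line, which is why the byte count is 85. It then asks wc for each count separately. Reading the file through < makes wc leave out the file name. Wrapping the result in $(( )) removes the padding spaces that some wc implementations add.

```shell
#!/bin/sh
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Recreate 4.txt; note the trailing space at the end of the third line.
printf '%s\n' '111' '222 bbb' '333 aaa bbb ' '444 aaa bbb ccc' \
    '555 aaa bbb ccc ddd' '666 aaa bbb ccc ddd eee' > 4.txt

lines=$(( $(wc -l < 4.txt) ))   # number of newline characters
words=$(( $(wc -w < 4.txt) ))   # whitespace-separated words
bytes=$(( $(wc -c < 4.txt) ))   # bytes, including spaces and newlines
echo "$lines $words $bytes"
```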

Step 2: display only the number of lines of the file

parameter	english	meaning
-c	bytes	Number of bytes
-w	words	Number of words
-l	lines	Number of lines

[root@hadoop01 export]# wc -l 4.txt 
6 4.txt

Step 3: count the lines, words, and bytes of multiple files

[root@hadoop01 export]# wc 1.txt 3.txt 4.txt 
  4   4  52 1.txt
 11  11  24 3.txt
  6  21  85 4.txt
 21  36 161 total

The output of wc *.txt lists every .txt file in the current directory, so it depends on which files exist there:

[root@hadoop01 export]# wc *.txt
  4   4  52 1.txt
 11  11  24 3.txt
  6  21  85 4.txt
  6   6  95 score.txt
 27  42 256 total

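Step 4: count the entries in the /etc directory

When its output goes to a pipe, ls prints one name per line, so wc -l counts the entries (wc -w would miscount names that contain spaces).

[root@hadoop01 export]# ls /etc | wc -l
240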

3.4 summary

With wc [options] file, you can count the lines, words, and bytes of a file.

4 uniq


4.1 objectives

The uniq command reports or removes repeated lines in a text file. It compares only adjacent lines, so it is usually combined with sort, which places identical lines next to each other.

4.2 steps

Step 1: remove duplicates
Step 2: remove duplicates and also count the occurrences of each line

4.3 implementation

Step 1: remove duplicates

command	english	meaning
uniq [options] file	unique	Remove adjacent duplicate lines

# Preparation content
[root@hadoop01 export]# cat 5.txt 
Zhang San    98
Li Si    100
Wang Wu    90
Zhao Liu    95
Ma Qi    70
Li Si    100
Wang Wu    90
Zhao Liu    95
Ma Qi    70

# sort
[root@hadoop01 export]# cat 5.txt | sort
Li Si    100
Li Si    100
Ma Qi    70
Ma Qi    70
Wang Wu    90
Wang Wu    90
Zhang San    98
Zhao Liu    95
Zhao Liu    95

# remove duplicates
[root@hadoop01 export]# cat 5.txt | sort | uniq
Li Si    100
Ma Qi    70
Wang Wu    90
Zhang San    98
Zhao Liu    95

Step 2: remove duplicates and also count the occurrences of each line

parameter	english	meaning
-c	count	Count the number of occurrences of each line

[root@hadoop01 export]# cat 5.txt | sort | uniq -c
      2 Li Si    100
      2 Ma Qi    70
      2 Wang Wu    90
      1 Zhang San    98
      2 Zhao Liu    95
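The sketch below shows why the pipeline sorts before calling uniq. It recreates 5.txt in a temporary directory. The tab between name and score is an assumption, since the original layout only shows spaces. It then counts the lines left after each variant.

```shell
#!/bin/sh
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# 5.txt from the example (name TAB score; the tab is an assumption)
printf '%s\t%s\n' 'Zhang San' 98 'Li Si' 100 'Wang Wu' 90 'Zhao Liu' 95 'Ma Qi' 70 \
    'Li Si' 100 'Wang Wu' 90 'Zhao Liu' 95 'Ma Qi' 70 > 5.txt

# Unsorted: no two identical lines are adjacent, so uniq removes nothing.
unsorted=$(( $(uniq 5.txt | wc -l) ))
# Sorted: identical lines become neighbours, so uniq keeps one of each.
sorted=$(( $(sort 5.txt | uniq | wc -l) ))
# sort -u does the same job in a single command.
sort_u=$(( $(sort -u 5.txt | wc -l) ))
# uniq -d prints one copy of each line that occurs more than once.
repeated=$(( $(sort 5.txt | uniq -d | wc -l) ))

echo "$unsorted $sorted $sort_u $repeated"
```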

4.4 summary

With uniq [options] file, you can remove duplicate lines and count how many times each line occurs.

5 tee

5.1 objectives

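tee copies its standard input to standard output (the screen) and to one or more files at the same time. This lets you save the results of a pipeline while still seeing them.

5.2 implementation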

command	meaning
command | tee file1 file2 file3	Write the command output to file1, file2, and file3 as well as to the screen

Write the deduplicated counts to a.txt, b.txt, and c.txt:
```
cat 5.txt | sort | uniq -c | tee a.txt b.txt c.txt
```
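The sketch below checks that tee both passes its input through and writes identical copies to every named file. It uses a small hypothetical fruit.txt in a temporary directory instead of 5.txt. It also shows the -a option, which appends to a file instead of overwriting it.

```shell
#!/bin/sh
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical input file, used only for this example
printf '%s\n' apple pear apple > fruit.txt

# What tee passes to standard output is captured in $shown;
# the same text is written to a.txt, b.txt and c.txt.
shown=$(sort fruit.txt | uniq -c | tee a.txt b.txt c.txt)

# -a appends instead of truncating; stdout is discarded here.
echo 'extra line' | tee -a a.txt > /dev/null
```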

5.3 summary

tee saves a pipeline's output to one or more files while still passing it to standard output.

6 tr

6.1 objectives

The tr command replaces (translates) or deletes characters. It reads only from standard input, so you supply the text through a pipe or a < redirection, not as a file argument.

6.2 steps

Step 1: replace characters
Step 2: delete characters
Step 3: complete the word count case

6.3 implementation

Step 1: replace characters

command	english	meaning
command | tr SET1 SET2	translate	Replace each character in SET1 with the character at the same position in SET2

# Replace lowercase i with uppercase I
# Convert itheima to uppercase
# Convert HELLO to lowercase

# Replace lowercase i with uppercase I
echo "itheima" | tr 'i' 'I'

# Convert itheima to uppercase
echo "itheima" | tr '[a-z]' '[A-Z]'

# Convert HELLO to lowercase
echo "HELLO" | tr '[A-Z]' '[a-z]'

Step 2: delete characters

command	english	meaning
command | tr -d SET	delete	Delete every character that appears in SET

Requirement: delete the digits from abc1d4e5f

echo 'abc1d4e5f' | tr -d '[0-9]'
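The sketch below collects the replacement and deletion examples in one place and records each result in a variable. The brackets in '[a-z]' are not needed by tr. They are treated as ordinary characters that map [ to [ and ] to ], so tr 'a-z' 'A-Z' gives the same result. The POSIX classes [:lower:] and [:upper:] are an alternative that does not rely on letter ranges.

```shell
#!/bin/sh
# tr reads standard input only, so every example feeds it through a pipe.
upper_i=$(echo "itheima" | tr 'i' 'I')                      # every i becomes I
upper=$(echo "itheima" | tr 'a-z' 'A-Z')                    # range to range
lower=$(echo "HELLO" | tr 'A-Z' 'a-z')
no_digits=$(echo 'abc1d4e5f' | tr -d '0-9')                 # -d deletes listed chars
upper_class=$(echo "itheima" | tr '[:lower:]' '[:upper:]')  # character classes

printf '%s\n' "$upper_i" "$upper" "$lower" "$no_digits" "$upper_class"
```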

Step 3: word count

preparation

[root@hadoop01 export]# cat words.txt 
hello,world,hadoop
hive,sqoop,flume,hello
kitty,tom,jerry,world
hadoop

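The counting pipeline has four steps:
1. replace each comma with a newline, so every word is on its own line
2. sort, so identical words become adjacent
3. remove duplicates
4. count the occurrences (uniq -c does steps 3 and 4 at once)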

# Count the number of occurrences of each word
[root@hadoop01 export]# cat words.txt | tr ',' '\n' | sort | uniq -c
      1 flume
      2 hadoop
      2 hello
      1 hive
      1 jerry
      1 kitty
      1 sqoop
      1 tom
      2 world
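The sketch below runs the same word-count pipeline on a recreated words.txt in a temporary directory. It then adds one step that is not in the original: a second sort that lists the most frequent words first. In that sort, -k1,1nr orders by count in descending order, and -k2,2 breaks ties alphabetically.

```shell
#!/bin/sh
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

printf '%s\n' 'hello,world,hadoop' 'hive,sqoop,flume,hello' \
    'kitty,tom,jerry,world' 'hadoop' > words.txt

# 1) one word per line  2) group identical words  3)+4) collapse and count
counts=$(tr ',' '\n' < words.txt | sort | uniq -c)

# Extra step (not in the original): the three most frequent words
top=$(printf '%s\n' "$counts" | sort -k1,1nr -k2,2 | head -3)
printf '%s\n' "$top"
```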

preparation

# Show the contents of every file in /etc whose name ends in .conf, with line numbers
cat -n /etc/*.conf

# Append the output to /export/v.txt
cat -n /etc/*.conf >> /export/v.txt

7 split

7.1 objectives

Divide a large file into several small files through the split command

7.2 steps

Step 1: split a large file into several smaller files by size (bytes)
Step 2: split a large file into several smaller files by number of lines

7.3 implementation

Step 1: split a large file into several smaller files by size (bytes)

command	english	meaning
split -b 10k file	byte	The large file is divided into several small files of 10KB

Step 2: split a large file into several smaller files by number of lines

command	english	meaning
split -l 1000 file	lines	The large file is divided into several small files with 1000 lines
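The sketch below tries both forms of split on a hypothetical 2500-line file generated with seq, which stands in for /export/v.txt. It then joins the pieces back together to show that no data is lost. The part_ prefix is optional. Without a prefix, split names the pieces xaa, xab, and so on.

```shell
#!/bin/sh
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical "large" file: the numbers 1 to 2500, one per line (11393 bytes)
seq 1 2500 > big.txt

# At most 1000 lines per piece: xaa and xab get 1000 lines, xac gets 500
split -l 1000 big.txt

# At most 10k bytes per piece, using the prefix part_ instead of x
split -b 10k big.txt part_

# Concatenating the pieces in name order restores the original file
cat xa? > rebuilt.txt
```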

7.4 summary

Use split [options] file to split a large file into several smaller files. By default the pieces are named xaa, xab, xac, and so on.

Preparation 1:

vim score.txt

zhangsan,68,99,26
lisi,98,66,96
wangwu,38,33,86
zhaoliu,78,44,36
maq,88,22,66
zhouba,98,44,46

8 awk

8.1 objectives

With awk you can match lines against a pattern (fuzzy query), extract fields, apply conditions, and do simple arithmetic.

8.2 steps

Step 1: fuzzy query (pattern matching)
Step 2: specify the field separator and print fields by number
Step 3: specify the separator of the output fields
Step 4: call built-in awk functions
Step 5: use an if statement to check whether $4 is a passing score
Step 6: sum a field and compute an average

8.3 implementation

Step 1: search the scores of zhangsan and lisi

command	meaning
awk '/zhangsan|lisi/' score.txt	Fuzzy query: print the lines that contain zhangsan or lisi

Step 2: specify the separator to display the content according to the subscript

command	meaning
awk -F ',' '{print $1, $2, $3}' score.txt	Process score.txt, split each line on commas, and print the first, second, and third fields

option

option	english	meaning
-F ','	field-separator	Split each line into fields on the given character
$n		Field n, for example $1 or $2
$0		The entire current line
NF	number of fields	The number of fields in the current line
$NF		The last field
$(NF-1)		The second-to-last field
NR	number of records	The number of the line currently being processed
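The sketch below uses the built-in variables from the table on a recreated score.txt in a temporary directory. It also runs the Step 1 fuzzy query. Each result is stored in a variable for comparison.

```shell
#!/bin/sh
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

printf '%s\n' zhangsan,68,99,26 lisi,98,66,96 wangwu,38,33,86 \
    zhaoliu,78,44,36 maq,88,22,66 zhouba,98,44,46 > score.txt

# Fields 1-3 of every line; the commas in print insert a space (the default OFS)
first_three=$(awk -F ',' '{print $1, $2, $3}' score.txt)

# On line 2: line number, field count, last field, second-to-last field, whole line
row2=$(awk -F ',' 'NR == 2 {print NR, NF, $NF, $(NF-1), $0}' score.txt)

# Step 1: lines matching either name (| is alternation in awk regular expressions)
matches=$(awk '/zhangsan|lisi/' score.txt)
```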

Step 3: specify the separator of the output fields

command	meaning
awk -F ',' '{OFS="==="}{print $1, $2, $3}' score.txt	Process score.txt, split on commas, and print the first three fields joined by "==="

option

option	english	meaning
OFS = "string"	output field separator	String printed between the comma-separated items of a print statement

Step 4: call the function provided by awk

command	meaning
awk -F ',' '{print toupper($1)}' score.txt	Process score.txt, split on commas, and print the first field (the name) in uppercase

Common functions are as follows:

Function name	meaning	effect
toupper()	upper	Convert characters to uppercase
tolower()	lower	Convert characters to lowercase
length()	length	Return character length
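The sketch below covers OFS and the built-in functions on the first line of a recreated score.txt. It sets OFS once in a BEGIN block. The original sets it again for every line, which gives the same output. The $1 = $1 line shows that OFS affects print $0 only after a field has been assigned, because that assignment makes awk rebuild the line.

```shell
#!/bin/sh
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

printf '%s\n' zhangsan,68,99,26 lisi,98,66,96 wangwu,38,33,86 \
    zhaoliu,78,44,36 maq,88,22,66 zhouba,98,44,46 > score.txt

# OFS goes between the comma-separated items of print; BEGIN sets it once
joined=$(awk -F ',' 'BEGIN {OFS = "==="} NR == 1 {print $1, $2, $3}' score.txt)

# Assigning a field ($1 = $1) makes awk rebuild $0 using OFS
rebuilt=$(awk -F ',' 'BEGIN {OFS = "|"} NR == 1 {$1 = $1; print $0}' score.txt)

# Built-in string functions from the table above
funcs=$(awk -F ',' 'NR == 1 {print toupper($1), tolower("LISI"), length($1)}' score.txt)
```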

Step 5: use an if statement to find the students who passed

command	meaning
awk -F ',' '{if($4>60) print $1, $4 }' score.txt	If $4 (the English score) is greater than 60, print $1 and $4
awk -F ',' '{if ($4 > 60) print $1, $4, "pass"; else print $1, $4, "fail"}' score.txt	Print the name, $4, and pass or fail

option

parameter	meaning
if($0 ~ "aa") print $0	If the current line contains "aa", print it
if($1 ~ "aa") print $0	If the first field contains "aa", print the line
if($1 == "lisi") print $0	If the first field equals "lisi", print the line
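The sketch below runs the pass/fail check and the ~ match on a recreated score.txt. The pass count uses a pattern-action rule ($4 > 60 {...}) with an END block. This is an alternative form and does not appear in the original table.

```shell
#!/bin/sh
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

printf '%s\n' zhangsan,68,99,26 lisi,98,66,96 wangwu,38,33,86 \
    zhaoliu,78,44,36 maq,88,22,66 zhouba,98,44,46 > score.txt

# if/else: name, English score, and the verdict for every student
results=$(awk -F ',' '{ if ($4 > 60) print $1, $4, "pass"; else print $1, $4, "fail" }' score.txt)

# Pattern-action form: count the lines whose 4th field is above 60
passed=$(awk -F ',' '$4 > 60 {n++} END {print n}' score.txt)

# ~ tests one field against a regular expression
has_li=$(awk -F ',' '{ if ($1 ~ "li") print $1 }' score.txt)
```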

Step 6: sum a field and compute the average score of a subject

command	meaning
awk 'BEGIN{setup} {per-line action} END{final action}' file	The BEGIN block runs once before any line is read, the middle block runs for every line, and the END block runs once after all lines are processed

awk -F ',' 'BEGIN{}{total=total+$4}END{print total, NR, (total/NR)}' score.txt
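The sketch below runs the same sum on a recreated score.txt. The empty BEGIN{} in the original is optional, because an awk variable that has not been set acts as 0 in arithmetic. The second command, which is not in the original, uses printf to round the average to two decimal places. By default, print formats non-integer numbers with six significant digits.

```shell
#!/bin/sh
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

printf '%s\n' zhangsan,68,99,26 lisi,98,66,96 wangwu,38,33,86 \
    zhaoliu,78,44,36 maq,88,22,66 zhouba,98,44,46 > score.txt

# total starts empty (treated as 0); END runs after the last line, when NR is the line count
summary=$(awk -F ',' '{total += $4} END {print total, NR, total / NR}' score.txt)

# Same average, rounded with printf
avg2=$(awk -F ',' '{total += $4} END {printf "%.2f\n", total / NR}' score.txt)
```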

preparation

vim 1.txt

aaa java root
bbb hello
ccc rt
ddd root nologin
eee rtt
fff ROOT nologin
ggg rttt

9 sed

9.1 objectives

sed (stream editor) can filter, delete, insert, and replace lines of text.

9.2 steps

Step 1: query (print) lines
Step 2: delete lines
Step 3: insert lines
Step 4: replace text
Step 5: edit the original file in place
Step 6: comprehensive practice

9.3 implementation

Step 1: query (print) lines

command	meaning
sed [options] 'script' file	Filter, query, or replace lines of the target file

Options and commands

Optional	english	meaning
p	print	Print the selected lines
$		The last line
-n	quiet	Print only what the script explicitly prints (suppress automatic output)
-e	expression	Add a script expression; can be given several times

Exercise 1: list lines 1-5 of 1.txt

sed -n -e '1,5p' 1.txt

Exercise 2: list all lines of 1.txt

sed -n -e '1,$p' 1.txt

Exercise 3: list all lines of 1.txt with line numbers

Optional	meaning
=	Print current line number

sed -n -e '1,$=' -e '1,$p' 1.txt 

Simpler alternatives:
cat -n 1.txt    # number every line
cat -b 1.txt    # number non-blank lines only
nl 1.txt        # like cat -b by default

Exercise 4: find the lines of 1.txt that contain root

answer:

sed -n -e '/root/p' 1.txt

Exercise 5: list the lines of 1.txt that contain root, ignoring case, with line numbers

Optional	english	meaning
I	ignore	Ignore case (GNU sed extension)

answer:

nl 1.txt | sed -n -e '/root/Ip'

nl 1.txt | grep -i root

cat -n 1.txt | grep -i root

Exercise 6: find the lines of 1.txt in which r is followed by one or more t, with line numbers

Optional	english	meaning
-r	regexp-extended	Use extended regular expressions, so + works without a backslash

answer:

nl 1.txt | sed -nr -e '/rt+/p'

or

sed -nr -e '/rt+/=' -e '/rt+/p' 1.txt
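The sketch below runs the query exercises on a recreated 1.txt in a temporary directory, leaving out nl so that the results are easy to compare. It assumes GNU sed, because the I flag and -r are GNU options.

```shell
#!/bin/sh
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

printf '%s\n' 'aaa java root' 'bbb hello' 'ccc rt' 'ddd root nologin' \
    'eee rtt' 'fff ROOT nologin' 'ggg rttt' > 1.txt

first5=$(sed -n -e '1,5p' 1.txt)          # lines 1-5
last=$(sed -n -e '$p' 1.txt)              # $ = the last line
root=$(sed -n -e '/root/p' 1.txt)         # lines matching /root/
any_case=$(sed -n -e '/root/Ip' 1.txt)    # I = ignore case (GNU)
rt=$(sed -n -r -e '/rt+/p' 1.txt)         # -r: + means "one or more"
numbers=$(sed -n -e '/root/=' 1.txt)      # = prints only the line numbers
```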

Step 2: delete lines

Exercise 1: delete lines 1-3 of 1.txt and display line numbers

Optional	english	meaning
d	delete	Delete the selected lines

answer:

nl 1.txt | sed -e '1,3d'

Exercise 2: keep only lines 1-4 of 1.txt and display line numbers

answer:

nl 1.txt | sed -e '5,$d'

nl 1.txt | sed -n -e '1,4p'

Step 3: insert lines

Exercise 1: add aaaaa after line 2 of 1.txt and display line numbers

parameter	english	meaning
i	insert	Insert text before the selected line
a	append	Append text after the selected line

answer:

nl 1.txt | sed -e '2a aaaaa'

Exercise 2: add bbbbb before line 1 of 1.txt and display line numbers

answer:

nl 1.txt | sed -e '1i bbbbb'

Step 4: replace text

Exercise 1: replace nologin with huawei in 1.txt and display line numbers

command	english	meaning
s/oldString/newString/	substitute	Replace the first match of oldString on each line with newString (append g to replace every match)

answer:

nl 1.txt | sed -e 's/nologin/huawei/'

Exercise 2: replace lines 1 and 2 of 1.txt with aaa and display line numbers

command	english	meaning
n1,n2c newString	change	Replace the selected lines with newString

answer:

nl 1.txt | sed -e '1,2c aaa'
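The sketch below runs the delete, insert, and replace exercises without nl, so the results can be compared line by line. It assumes GNU sed, which accepts the one-line forms 2a text and 1i text. With a range such as 1,2c, sed prints the new text once, at the end of the range.

```shell
#!/bin/sh
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

printf '%s\n' 'aaa java root' 'bbb hello' 'ccc rt' 'ddd root nologin' \
    'eee rtt' 'fff ROOT nologin' 'ggg rttt' > 1.txt

deleted=$(sed -e '1,3d' 1.txt)                 # d: lines 4-7 remain
kept=$(sed -e '5,$d' 1.txt)                    # lines 1-4 remain
appended=$(sed -e '2a aaaaa' 1.txt)            # a: new line after line 2
inserted=$(sed -e '1i bbbbb' 1.txt)            # i: new line before line 1
replaced=$(sed -e 's/nologin/huawei/' 1.txt)   # s: first match on each line
changed=$(sed -e '1,2c aaa' 1.txt)             # c: lines 1-2 become one "aaa"

# None of these commands modify 1.txt; sed writes to standard output.
```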

Step 5: edit the original file in place

Exercise 1: replace nologin with huawei in 1.txt itself

parameter	english	meaning
-i	in-place	Edit the file in place, overwriting the original content

answer:

sed -i -e 's/nologin/huawei/' 1.txt

Exercise 2: replace lines 2 and 3 of 1.txt itself with aaa

answer:

sed -i -e '2,3c aaa' 1.txt

Note: back up the file before editing it in place. If the edit is wrong, the original data cannot be recovered!

Exercise 3: delete lines 1 and 2 of 1.txt in the original file

answer:

sed -i -e '1,2d' 1.txt


nl 1.txt    # view the result
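One way to follow the backup advice is to let sed make the backup itself. The sketch below assumes GNU sed, where a suffix attached directly to -i (as in -i.bak) saves the untouched original under that suffix before the file is edited.

```shell
#!/bin/sh
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

printf '%s\n' 'aaa java root' 'bbb hello' 'ccc rt' 'ddd root nologin' \
    'eee rtt' 'fff ROOT nologin' 'ggg rttt' > 1.txt

# Edit 1.txt in place; the original is saved first as 1.txt.bak
sed -i.bak -e 's/nologin/huawei/' 1.txt

# If the edit turns out to be wrong, restore the original with:
#   mv 1.txt.bak 1.txt
```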

Step 6: Comprehensive Practice

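Exercise 1: get the IP address (this answer assumes the older net-tools ifconfig output, which prints "inet addr:... Bcast:..."; newer versions print "inet <address>  netmask ..." and need a different pattern)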

answer:

ifconfig eth0 | grep "inet addr" | sed -e 's/^.*inet addr://' | sed -e 's/Bcast:.*$//'
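Output from ifconfig differs from machine to machine. The sketch below therefore applies the same sed steps to a hypothetical sample line in the older net-tools format. The address 192.0.2.10 is a reserved documentation address and does not come from the original. The sketch also shows that the spaces before "Bcast:" remain in the result, along with a single sed call that removes them.

```shell
#!/bin/sh
# Hypothetical line in the older "ifconfig eth0" format (made-up address)
sample='          inet addr:192.0.2.10  Bcast:192.0.2.255  Mask:255.255.255.0'

# The answer's pipeline: drop everything up to "inet addr:", then "Bcast:" onwards
raw=$(printf '%s\n' "$sample" | grep "inet addr" | sed -e 's/^.*inet addr://' | sed -e 's/Bcast:.*$//')

# raw still ends with the two spaces that preceded "Bcast:".
# One sed call that also removes them; the p flag prints only if the second s succeeded.
ip=$(printf '%s\n' "$sample" | sed -n -e 's/^.*inet addr://' -e 's/ *Bcast:.*$//p')

printf '[%s]\n[%s]\n' "$raw" "$ip"
```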

Exercise 2: from 1.txt, select the lines containing root, then replace nologin with itheima

answer:

nl 1.txt | grep 'root' | sed -e 's/nologin/itheima/'

or

nl 1.txt | sed -n -e '/root/p' | sed -e 's/nologin/itheima/'

or

nl 1.txt | sed -n -e '/root/{s/nologin/itheima/p}' # print only the root lines on which a replacement was made

Exercise 3: from 1.txt, delete the first two lines, replace nologin with itheima, and display line numbers

answer:

nl 1.txt | sed -e '1,2d' | sed -e 's/nologin/itheima/'
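Several -e expressions can run inside one sed process instead of a chain of sed commands, because sed applies them to each line in order. The sketch below shows this for Exercises 2 and 3 on a recreated 1.txt. It assumes GNU sed, which accepts the {s/.../p} block on one line.

```shell
#!/bin/sh
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

printf '%s\n' 'aaa java root' 'bbb hello' 'ccc rt' 'ddd root nologin' \
    'eee rtt' 'fff ROOT nologin' 'ggg rttt' > 1.txt

# Exercise 2: -n suppresses normal output; p prints only if s made a change
ex2=$(sed -n -e '/root/{s/nologin/itheima/p}' 1.txt)

# Exercise 3: both expressions in one sed process, applied in order
ex3=$(sed -e '1,2d' -e 's/nologin/itheima/' 1.txt)

# nl can still go in front to add line numbers, as in the answers above
numbered=$(nl 1.txt | sed -e '1,2d' -e 's/nologin/itheima/')
```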
