Linux learning notes for basic skills training of big data development engineers

Posted by olaf on Mon, 24 Jan 2022 17:55:38 +0100

Pipeline-related commands

objectives

  • cut
  • sort
  • wc
  • uniq
  • tee
  • tr
  • split
  • awk
  • sed
  • grep
  • preparation

    zhangsan,68,99,26
    lisi,98,66,96
    wangwu,38,33,86
    zhaoliu,78,44,36
    maq,88,22,66
    zhouba,98,44,46
    
  • The listing above is a score sheet

  • Fields are separated by commas: the first column is the name, the second the Chinese score, the third the math score, and the fourth the English score

preparation

vim 1.txt

111:aaa:bbb:ccc
222:ddd:eee:fff
333:ggg:hhh
444:iii

1 cut

1.1 objectives

  • cut extracts selected characters or fields from each line of a file or of a command's output

1.2 implementation

Step 1: extract the fifth character of each of the first two lines of 1.txt

command             meaning
cut [options] file  Extract content from the specified file

  • parameter

parameter  english     meaning
-c         characters  Select content by character position

head -2 1.txt | cut -c 5

Step 2: split the first two lines of 1.txt on ":" and show the first two fields

parameter       english    meaning
-d 'separator'  delimiter  Specify the field separator
-f n1,n2        fields     After splitting, display the listed fields; separate several field numbers with ","

Range control

range  meaning
n      Show only field n
n-     Show field n through the end of the line
n-m    Show fields n through m (including m)

head -2 1.txt | cut -d ':' -f 1,2
head -2 1.txt | cut -d ':' -f 1-2
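
The character and range forms above can be tried on the sample file. The following is only a sketch: it recreates 1.txt in a temporary directory (the workdir location and the variable names are assumptions made for illustration, not part of the original notes) and applies -c and the three -f range forms.

```shell
# Recreate the sample 1.txt in a scratch directory (hypothetical location).
workdir=$(mktemp -d)
printf '%s\n' '111:aaa:bbb:ccc' '222:ddd:eee:fff' '333:ggg:hhh' '444:iii' > "$workdir/1.txt"

# -c N: the fifth character of the first two lines.
fifth_chars=$(head -2 "$workdir/1.txt" | cut -c 5)

# -f N: only field 2 of every line.
only_second=$(cut -d ':' -f 2 "$workdir/1.txt")

# -f N-: field 2 through the end of the line.
from_second=$(cut -d ':' -f 2- "$workdir/1.txt")

# -f N-M: fields 1 through 2 of the first two lines.
first_two=$(head -2 "$workdir/1.txt" | cut -d ':' -f 1-2)

printf '%s\n---\n' "$fifth_chars" "$only_second" "$from_second" "$first_two"
```

Lines with fewer fields than requested, such as 444:iii, simply yield the fields that exist.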

1.3 summary

  • cut [options] file extracts the parts of each line that match the given conditions
  • preparation

    vim score.txt

    zhangsan,68,99,26
    lisi,98,66,96
    wangwu,38,33,86
    zhaoliu,78,44,36
    maq,88,22,66
    zhouba,98,44,46
    

2 sort

2.1 objectives

  • sort sorts the contents of a text file line by line.

2.2 steps

  • Step 1: sort strings
  • Step 2: remove duplicates
  • Step 3: sort values
  • Step 4: sort the scores

2.3 implementation

Step 1: sort strings

[root@node01 tmp]# cat 2.txt
banana
apple
pear
orange
pear

[root@node01 tmp]# sort 2.txt 
apple
banana
orange
pear
pear

Step 2: remove duplicates

parameter  english  meaning
-u         unique   Remove duplicate lines

-u simply drops duplicate lines from the sorted output.

[root@node01 tmp]# sort -u 2.txt 
apple
banana
orange
pear

Step 3: sort values

parameter  english       meaning
-n         numeric-sort  Sort by numeric value
-r         reverse       Reverse the order

  • Prepare data

    [root@node01 tmp]# cat 3.txt 
    1
    3
    5
    7
    11
    2
    4
    6
    10
    8
    9
    
  • By default, lines are compared as strings

    [root@node01 tmp]# sort 3.txt 
    1
    10
    11
    2
    3
    4
    5
    6
    7
    8
    9
    
  • Ascending numeric order

    [root@node01 tmp]# sort -n 3.txt
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    
  • Descending numeric order

    [root@node01 tmp]# sort -n -r 3.txt
    11
    10
    9
    8
    7
    6
    5
    4
    3
    2
    1
    
  • Combined options

    [root@node01 tmp]# sort -nr 3.txt  
    11
    10
    9
    8
    7
    6
    5
    4
    3
    2
    1
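
The difference between string order and numeric order can be reproduced with a short script. This is a sketch under assumptions: the numbers are written to a scratch copy of 3.txt (the workdir path and the variable names are illustrative), and LC_ALL=C is set so that the string comparison does not depend on the locale.

```shell
# Scratch copy of the numeric sample (hypothetical location).
workdir=$(mktemp -d)
printf '%s\n' 1 3 5 7 11 2 4 6 10 8 9 > "$workdir/3.txt"

# Default: compare as strings, so "10" and "11" come before "2".
as_strings=$(LC_ALL=C sort "$workdir/3.txt" | paste -sd ' ' -)

# -n compares by numeric value; -r reverses the order.
ascending=$(sort -n "$workdir/3.txt" | paste -sd ' ' -)
descending=$(sort -nr "$workdir/3.txt" | paste -sd ' ' -)

echo "$as_strings"
echo "$ascending"
echo "$descending"
```

paste -sd ' ' - only joins the sorted lines onto one line to make the three results easy to compare.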
    

Step 4: sort the scores

parameter  english          meaning
-t         field-separator  Specify the field separator
-k         key              Choose the column to sort by

# Sort all lines by the second field (the Chinese score), numerically, in descending order
sort -t ',' -k2nr score.txt 
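
A key can also be limited to a single field by writing it as start,end. The sketch below assumes a comma-separated copy of the score sheet in a temporary directory (workdir and the variable names are illustrative) and sorts by the fourth field only.

```shell
# Comma-separated scratch copy of the score sheet (hypothetical location).
workdir=$(mktemp -d)
printf '%s\n' 'zhangsan,68,99,26' 'lisi,98,66,96' 'wangwu,38,33,86' \
              'zhaoliu,78,44,36' 'maq,88,22,66' 'zhouba,98,44,46' > "$workdir/score.txt"

# -k4,4nr: use only field 4 as the key, numerically, highest first.
# Writing the key as 4,4 stops it at the end of field 4; a bare -k4 would
# extend to the end of the line.
by_english=$(LC_ALL=C sort -t ',' -k4,4nr "$workdir/score.txt" | cut -d ',' -f 1 | paste -sd ' ' -)

# Lowest Chinese score (field 2): ascending numeric sort, first line only.
lowest_chinese=$(LC_ALL=C sort -t ',' -k2,2n "$workdir/score.txt" | head -1)

echo "$by_english"
echo "$lowest_chinese"
```

Pinning the key to one field and setting LC_ALL=C keeps the numeric comparison from being influenced by the commas that follow the field.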

3 wc command

3.1 objectives

  • Displays the number of bytes, words and lines of the specified file

3.2 steps

  • Step 1: display the number of lines, words and bytes of the specified file
  • Step 2: display only the number of lines of the file
  • Step 3: count the lines, words and bytes of multiple files
  • Step 4: count the entries in the /etc directory

3.3 implementation

Step 1: display the number of bytes, words and lines of the specified file

command  meaning
wc file  Display the number of lines, words and bytes of the file, in that order

[root@hadoop01 export]# cat 4.txt
111
222 bbb
333 aaa bbb 
444 aaa bbb ccc
555 aaa bbb ccc ddd
666 aaa bbb ccc ddd eee

[root@hadoop01 export]# wc 4.txt 
 6 21 85 4.txt

Step 2: display only the number of lines of the file

parameter  english  meaning
-c         bytes    Number of bytes
-w         words    Number of words
-l         lines    Number of lines

[root@hadoop01 export]# wc -l 4.txt 
6 4.txt
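
Each count can also be captured on its own, which is handy in scripts. The sketch below recreates the 4.txt sample in a temporary directory (the workdir path and the variable names are assumptions) and reads the three counts separately.

```shell
# Scratch copy of the 4.txt sample; the third line ends with a space, as in the listing.
workdir=$(mktemp -d)
printf '%s\n' '111' '222 bbb' '333 aaa bbb ' '444 aaa bbb ccc' \
              '555 aaa bbb ccc ddd' '666 aaa bbb ccc ddd eee' > "$workdir/4.txt"

# Reading from standard input makes wc print the number without a file name;
# $(( ... )) also drops any padding some implementations add.
line_count=$(( $(wc -l < "$workdir/4.txt") ))
word_count=$(( $(wc -w < "$workdir/4.txt") ))
byte_count=$(( $(wc -c < "$workdir/4.txt") ))

echo "lines=$line_count words=$word_count bytes=$byte_count"
```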

Step 3: count the lines, words and bytes of multiple files

[root@hadoop01 export]# wc 1.txt 2.txt 3.txt 
  4   4  52 1.txt
 11  11  24 2.txt
  6  21  85 3.txt
 21  36 161 total
 
[root@hadoop01 export]# wc *.txt
  4   4  52 1.txt
 11  11  24 2.txt
  6  21  85 3.txt
  6   6  95 score.txt
 27  42 256 total

Step 4: count the entries in the /etc directory

[root@hadoop01 export]# ls /etc | wc -w
240

3.4 summary

  • wc file counts the lines, words and bytes of a file

4 uniq

4.1 objectives

  • The uniq command reports or removes repeated lines in a text file. It only compares adjacent lines, so it is generally used together with sort.

4.2 steps

  • Step 1: remove duplicate lines
  • Step 2: remove duplicates and also count the number of occurrences

4.3 implementation

Step 1: remove duplicate lines

command              english  meaning
uniq [options] file  unique   Remove duplicate lines

# Prepare the data
[root@hadoop01 export]# cat 5.txt 
Zhang San    98
 Li Si    100
 Wang Wu    90
 Zhao Liu    95
 Ma Qi    70
 Li Si    100
 Wang Wu    90
 Zhao Liu    95
 Ma Qi    70

# sort
[root@hadoop01 export]# cat 5.txt | sort
 Li Si    100
 Li Si    100
 Ma Qi    70
 Ma Qi    70
 Wang Wu    90
 Wang Wu    90
 Zhang San    98
 Zhao Liu    95
 Zhao Liu    95

# duplicate removal
[root@hadoop01 export]# cat 5.txt | sort | uniq
 Li Si    100
 Ma Qi    70
 Wang Wu    90
 Zhang San    98
 Zhao Liu    95

Step 2: not only remove the duplicate, but also count the number of occurrences

parameter  english  meaning
-c         count    Count the number of occurrences of each line

[root@hadoop01 export]# cat 5.txt | sort | uniq -c
      2 Li Si    100
      2 Ma Qi    70
      2 Wang Wu    90
      1 Zhang San    98
      2 Zhao Liu    95
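
The reason for sorting first is that uniq only compares neighbouring lines. The sketch below illustrates this with a small made-up file (fruit.txt, the workdir path and the variable names are all hypothetical) in which the duplicates are never adjacent.

```shell
workdir=$(mktemp -d)
# Hypothetical sample: duplicates exist, but never on adjacent lines.
printf '%s\n' pear apple pear apple pear > "$workdir/fruit.txt"

# Without sort nothing is removed, because uniq only compares neighbours.
unsorted_lines=$(( $(uniq "$workdir/fruit.txt" | wc -l) ))

# After sort the duplicates are adjacent; -c prefixes each line with its count.
# awk only normalises the padding in front of the count.
counts=$(LC_ALL=C sort "$workdir/fruit.txt" | uniq -c | awk '{print $1, $2}')

echo "$unsorted_lines"
echo "$counts"
```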

4.4 summary

  • uniq [options] file removes duplicate lines and, with -c, counts how often each line occurs

5 tee

5.1 objectives

  • tee writes a command's output to one or more files while still passing it on to standard output

5.2 implementation

command                          meaning
command | tee file1 file2 file3  Write the command's output to several files at once

  • Write the deduplicated counts to a.txt, b.txt and c.txt

    cat 5.txt | sort | uniq -c | tee a.txt b.txt c.txt
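
    # The sketch below is an illustration only; the input lines, the workdir
    # path and the variable names are assumptions. It shows that tee both
    # writes the files and passes the data on, and that -a appends.

```shell
workdir=$(mktemp -d)

# tee writes its input to every named file and also copies it to standard
# output, which is captured here in a variable.
shown=$(printf '%s\n' b a b | LC_ALL=C sort | uniq -c | tee "$workdir/a.txt" "$workdir/b.txt")

# -a appends instead of overwriting; the copy on standard output is discarded.
echo 'extra line' | tee -a "$workdir/a.txt" > /dev/null

echo "$shown"
cat "$workdir/a.txt"
```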
    

5.3 summary

  • tee allows you to pipe command results to multiple files

6 tr

6.1 objectives

  • The tr command replaces or deletes characters in its input.

6.2 steps

  • Step 1: replace characters
  • Step 2: delete characters
  • Step 3: complete the word count case

6.3 implementation

Step 1: replace characters

command                   english    meaning
command | tr 'old' 'new'  translate  Replace each old character with the new one

# Replace lowercase i with uppercase I
echo "itheima" | tr 'i' 'I'

# Convert itheima to uppercase
echo "itheima" |tr '[a-z]' '[A-Z]'

# Convert HELLO to lowercase
echo "HELLO" |tr '[A-Z]' '[a-z]'

Step 2: delete characters

command                  english  meaning
command | tr -d 'chars'  delete   Delete the specified characters

  • Requirement: delete the digits in abc1d4e5f

echo 'abc1d4e5f' | tr -d '[0-9]'
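
The same conversions also work with plain ranges. The sketch below repeats them without the square brackets and adds -s, an option the notes above do not cover; the variable names are illustrative.

```shell
# Ranges work without square brackets. With the brackets, tr additionally
# maps the literal [ and ] to themselves (or deletes them with -d).
upper=$(echo "itheima" | tr 'a-z' 'A-Z')
lower=$(echo "HELLO" | tr 'A-Z' 'a-z')

# -d deletes every character in the set.
no_digits=$(echo 'abc1d4e5f' | tr -d '0-9')

# -s squeezes runs of a repeated character into one.
squeezed=$(echo 'a   b    c' | tr -s ' ')

echo "$upper $lower $no_digits $squeezed"
```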

Step 3: word count

preparation
[root@hadoop01 export]# cat words.txt 
hello,world,hadoop
hive,sqoop,flume,hello
kitty,tom,jerry,world
hadoop

1 replace every comma with a newline

2 sort

3 remove duplicates

4 count

# Count the number of occurrences of each word
[root@hadoop01 export]# cat words.txt | tr ',' '\n' | sort | uniq -c
      1 flume
      2 hadoop
      2 hello
      1 hive
      1 jerry
      1 kitty
      1 sqoop
      1 tom
      2 world
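
A common follow-up is to rank the words by frequency. The sketch below is one possible extension, not part of the original exercise: it adds a second sort to the pipeline above and keeps the three most frequent words. The workdir path and the variable name are assumptions.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'hello,world,hadoop' 'hive,sqoop,flume,hello' \
              'kitty,tom,jerry,world' 'hadoop' > "$workdir/words.txt"

# Same pipeline as above, followed by a second sort:
# -k1,1nr orders by count (highest first), -k2,2 breaks ties alphabetically.
top_three=$(tr ',' '\n' < "$workdir/words.txt" | LC_ALL=C sort | uniq -c \
  | LC_ALL=C sort -k1,1nr -k2,2 | head -3 | awk '{print $2 "=" $1}')

echo "$top_three"
```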
  • preparation

    # View the contents of the files in /etc whose names end in .conf
    cat -n /etc/*.conf
    
    # Append the command output to the file /export/v.txt
    cat -n /etc/*.conf >> /export/v.txt
    

7 split

7.1 objectives

  • Divide a large file into several small files through the split command

7.2 steps

  • Step 1: split a large file into several small files by size in bytes
  • Step 2: split a large file into several small files by number of lines

7.3 implementation

Step 1: split a large file into several small files by size in bytes

command            english  meaning
split -b 10k file  bytes    Split the file into pieces of 10 KB each

Step 2: split a large file into several small files by number of lines

command             english  meaning
split -l 1000 file  lines    Split the file into pieces of 1000 lines each
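
To see what split produces, the sketch below generates a 2500-line file and splits it by lines. The file name big.txt, the part_ prefix and the workdir path are assumptions chosen for the illustration.

```shell
workdir=$(mktemp -d)
seq 1 2500 > "$workdir/big.txt"

# -l 1000: at most 1000 lines per piece. The last argument is an output prefix
# (hypothetical name); split appends aa, ab, ac, ... to it.
split -l 1000 "$workdir/big.txt" "$workdir/part_"

piece_count=$(( $(ls "$workdir"/part_* | wc -l) ))
last_piece_lines=$(( $(wc -l < "$workdir/part_ac") ))

# Concatenating the pieces in name order reproduces the original file.
cat "$workdir"/part_* | cmp -s - "$workdir/big.txt" && restored=yes || restored=no

echo "$piece_count $last_piece_lines $restored"
```

Without a prefix argument, split writes pieces named xaa, xab and so on into the current directory.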

7.4 summary

  • split [options] file splits a large file into several small files

  • Preparation 1:

    vim score.txt

    zhangsan,68,99,26
    lisi,98,66,96
    wangwu,38,33,86
    zhaoliu,78,44,36
    maq,88,22,66
    zhouba,98,44,46
    

8 awk

8.1 objectives

  • awk supports fuzzy (pattern) matching, extracting fields on demand, conditional tests and simple arithmetic

8.2 steps

  • Step 1: fuzzy query

  • Step 2: specify the field separator and print fields by position

  • Step 3: specify the separator of the output fields

  • Step 4: call the functions provided by awk

  • Step 5: use an if statement to test whether $4 is a passing score

  • Step 6: sum a field and compute its average

8.3 implementation

Step 1: search the scores of zhangsan and lisi

command                          meaning
awk '/zhangsan|lisi/' score.txt  Fuzzy query: print the lines that match zhangsan or lisi

Step 2: specify the field separator and print fields by position

command                                meaning
awk -F ',' '{print $1, $2, $3}' 1.txt  Process 1.txt: split each line on commas and print the first, second and third fields

option

option   english            meaning
-F ','   field-separator    Split each line on the specified character
$n                          The n-th field
$0                          The whole current line
NF       number of fields   Number of fields in the current line
$NF                         The last field
$(NF-1)                     The second-to-last field
NR       number of records  Number of the line currently being processed
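
The built-in variables are easiest to understand when printed side by side. The sketch below assumes a comma-separated copy of the score sheet in a temporary directory (workdir and the variable name are illustrative).

```shell
workdir=$(mktemp -d)
printf '%s\n' 'zhangsan,68,99,26' 'lisi,98,66,96' 'wangwu,38,33,86' \
              'zhaoliu,78,44,36' 'maq,88,22,66' 'zhouba,98,44,46' > "$workdir/score.txt"

# For every line: line number, field count, first, last and second-to-last field.
summary=$(awk -F ',' '{print NR, NF, $1, $NF, $(NF-1)}' "$workdir/score.txt")

echo "$summary"
```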

Step 3: specify the separator of the output fields

command                                           meaning
awk -F ' ' '{OFS="==="}{print $1, $2, $3}' 1.txt  Process 1.txt: split each line on spaces and print the first, second and third fields joined by "==="

option

option             english                 meaning
OFS = "character"  output field separator  String placed between the fields on output

Step 4: call the functions provided by awk

command                                 meaning
awk -F ',' '{print toupper($2)}' 1.txt  Process 1.txt: split each line on commas and print the second field in uppercase

Common functions are as follows:

function   english  effect
toupper()  upper    Convert a string to uppercase
tolower()  lower    Convert a string to lowercase
length()   length   Return the length of a string
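
The output separator and the string functions can be combined in one command. The sketch below assumes a comma-separated copy of the score sheet (the workdir path and the variable name are illustrative) and sets OFS in a BEGIN block.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'zhangsan,68,99,26' 'lisi,98,66,96' 'wangwu,38,33,86' \
              'zhaoliu,78,44,36' 'maq,88,22,66' 'zhouba,98,44,46' > "$workdir/score.txt"

# OFS is set in BEGIN so that it is in effect before the first line is printed.
# Each output line: name in uppercase, length of the name, Chinese score.
formatted=$(awk -F ',' 'BEGIN{OFS="==="} {print toupper($1), length($1), $2}' "$workdir/score.txt")

echo "$formatted"
```

OFS only applies to items separated by commas in print; items written next to each other are concatenated without a separator.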

Step 5: query the students who passed with an if statement

command                                                                                meaning
awk -F ',' '{if($4>60) print $1, $4 }' score.txt                                       If $4 is a passing score, print $1 and $4
awk -F ',' '{if ($4 > 60) print $1, $4, "pass"; else print $1, $4, "fail"}' score.txt  Print the name, $4, and whether the student passed

option

expression                 meaning
if($0 ~ "aa") print $0     If this line contains "aa", print the line
if($1 ~ "aa") print $0     If the first field contains "aa", print the line
if($1 == "lisi") print $0  If the first field equals "lisi", print the line
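
A condition can be written either inside the action with if, or in front of the action as a pattern. The sketch below shows both forms on a comma-separated copy of the score sheet; the workdir path and the variable names are assumptions.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'zhangsan,68,99,26' 'lisi,98,66,96' 'wangwu,38,33,86' \
              'zhaoliu,78,44,36' 'maq,88,22,66' 'zhouba,98,44,46' > "$workdir/score.txt"

# if/else inside the action, with the same threshold as above ($4 > 60).
graded=$(awk -F ',' '{ if ($4 > 60) print $1, $4, "pass"; else print $1, $4, "fail" }' "$workdir/score.txt")

# The same test written as a pattern in front of the action.
passed=$(awk -F ',' '$4 > 60 {print $1}' "$workdir/score.txt" | paste -sd ' ' -)

echo "$graded"
echo "$passed"
```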

Step 6: calculate the average score of a subject from a field

command                                                               meaning
awk 'BEGIN{initialization} {per-line action} END{final action}' file  BEGIN {statements executed before any line is read}
                                                                      {statements executed for each line}
                                                                      END {statements executed after all lines have been processed}

awk -F ',' 'BEGIN{}{total=total+$4}END{print total, NR, (total/NR)}' score.txt
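
The same accumulate-then-report pattern extends to several columns at once. The sketch below is an illustration under assumptions (comma-separated scratch copy of the score sheet, hypothetical variable names) that averages all three subjects and formats the result with printf.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'zhangsan,68,99,26' 'lisi,98,66,96' 'wangwu,38,33,86' \
              'zhaoliu,78,44,36' 'maq,88,22,66' 'zhouba,98,44,46' > "$workdir/score.txt"

# Averages of the Chinese, math and English columns, two decimals each.
averages=$(awk -F ',' '
  { chinese += $2; math += $3; english += $4 }   # accumulate on every line
  END {
    if (NR > 0)                                   # avoid dividing by zero on empty input
      printf "%.2f %.2f %.2f\n", chinese / NR, math / NR, english / NR
  }' "$workdir/score.txt")

echo "$averages"
```

Uninitialized awk variables start at zero, so an explicit BEGIN block is optional here.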
  • preparation

    vim 1.txt

    aaa java root
    bbb hello
    ccc rt
    ddd root nologin
    eee rtt
    fff ROOT nologin
    ggg rttt
    

9 sed

9.1 objectives

  • sed can filter and replace text

9.2 steps

  • Step 1: query
  • Step 2: delete
  • Step 3: modify
  • Step 4: replace
  • Step 5: operate on the original file
  • Step 6: comprehensive practice

9.3 implementation

Step 1: query

command             meaning
sed [options] file  Filter, query or replace the contents of the target file

Optional parameters

option  english     meaning
p       print       Print
$                   Represents the last line
-n                  Show only the processed results
-e      expression  Process according to the expression

  • Exercise 1: list lines 1 to 5 of 1.txt

sed -n -e '1,5p' 1.txt 

  • Exercise 2: list all lines of 1.txt

sed -n -e '1,$p' 1.txt 

  • Exercise 3: list all lines of 1.txt and display the line numbers

option  meaning
=       Print the current line number

sed -n -e '1,$=' -e '1,$p' 1.txt 

Simplified version

cat -n 1.txt
cat -b 1.txt
nl 1.txt
  • Exercise 4: find the lines of 1.txt that contain root

answer:

sed -n -e '/root/p' 1.txt

  • Exercise 5: list the lines of 1.txt that contain root, ignoring case, and display the line numbers

option  english  meaning
I       ignore   Ignore case

answer:

nl 1.txt | sed -n -e '/root/Ip'

nl 1.txt | grep -i root

cat -n 1.txt | grep -i root

  • Exercise 6: find the lines of 1.txt in which the letter r is followed by one or more t, and display the line numbers

option  english          meaning
-r      regexp-extended  Use extended regular expressions

answer:

nl 1.txt | sed -nr -e '/rt+/p'

or

sed -nr -e '/rt+/p' -e '/rt+/=' 1.txt
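
The three kinds of address used in these exercises (line range, regular expression, extended regular expression) can be checked against the sample data. The sketch below assumes GNU sed, since the I flag and -r are GNU extensions, and recreates 1.txt in a temporary directory; the workdir path and the variable names are illustrative. The pattern rt+ stands for an r followed by one or more t.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'aaa java root' 'bbb hello' 'ccc rt' 'ddd root nologin' \
              'eee rtt' 'fff ROOT nologin' 'ggg rttt' > "$workdir/1.txt"

# Address range: print lines 2 to 3 only (-n suppresses the default output).
lines_2_3=$(sed -n '2,3p' "$workdir/1.txt")

# Regex address with the GNU I flag: root in any case.
any_root=$(sed -n '/root/Ip' "$workdir/1.txt" | cut -d ' ' -f 1 | paste -sd ' ' -)

# Extended regex (-r): r followed by one or more t.
r_then_t=$(sed -nr '/rt+/p' "$workdir/1.txt" | cut -d ' ' -f 1 | paste -sd ' ' -)

echo "$lines_2_3"
echo "$any_root"
echo "$r_then_t"
```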

Step 2: delete

  • Exercise 1: delete the first 3 lines of 1.txt and display the line numbers

option  english  meaning
d       delete   Delete the specified lines

answer:

nl 1.txt | sed -e '1,3d'

  • Exercise 2: keep only the first 4 lines of 1.txt and display the line numbers

answer:

nl 1.txt | sed -e '5,$d'

nl 1.txt | sed -n -e '1,4p'

Step 3: modify

  • Exercise 1: append aaaaa after the second line of 1.txt and display the line numbers

parameter  english  meaning
i          insert   Insert content before the target line
a          append   Append content after the target line

answer:

nl 1.txt | sed -e '2a aaaaa'

  • Exercise 2: insert bbbbb before the first line of 1.txt and display the line numbers

answer:

nl 1.txt | sed -e '1i bbbbb'

Step 4: replace

  • Exercise 1: replace nologin in 1.txt with huawei and display the line numbers

option                  english  meaning
s/oldString/newString/  replace  Replace the first match of oldString on each line with newString

answer:

nl 1.txt | sed -e 's/nologin/huawei/'

  • Exercise 2: replace lines 1 and 2 of 1.txt with aaa and display the line numbers

option         english  meaning
2c new string  replace  Replace the selected line(s) with a new string

answer:

nl 1.txt | sed -e '1,2c aaa'

Step 5: operate on the original file

  • Exercise 1: in 1.txt, replace nologin with huawei

parameter  english   meaning
-i         in-place  Modify the original file instead of printing the result

answer:

sed -i -e 's/nologin/huawei/' 1.txt

  • Exercise 2: replace lines 2 and 3 of 1.txt with aaa

answer:

sed -i -e '2,3c aaa' 1.txt

Note: back up the data before running these commands. -i overwrites the file, so a mistake cannot be undone.

  • Exercise 3: delete the first 2 lines of 1.txt, changing the original file

answer:

sed -i -e '1,2d' 1.txt

# View the data
nl 1.txt
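
One way to follow the backup advice is to let sed create the copy itself: a suffix attached to -i makes it keep the previous content under that name. The sketch below works on a scratch copy of the sample file; the workdir path, the .bak suffix and the variable names are choices made for this illustration.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'aaa java root' 'bbb hello' 'ccc rt' 'ddd root nologin' \
              'eee rtt' 'fff ROOT nologin' 'ggg rttt' > "$workdir/1.txt"

# -i.bak edits 1.txt in place and keeps the previous content in 1.txt.bak.
sed -i.bak -e 's/nologin/huawei/' "$workdir/1.txt"

# Count the edited lines in the new file and the untouched lines in the backup.
changed=$(grep -c 'huawei' "$workdir/1.txt")
kept=$(grep -c 'nologin' "$workdir/1.txt.bak")

echo "$changed $kept"
```

If the edit turns out to be wrong, copying 1.txt.bak back over 1.txt restores the original data.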

Step 6: comprehensive practice

  • Exercise 1: get the ip address

answer:

ifconfig eth0 | grep "inet addr" | sed -e 's/^.*inet addr://' | sed -e 's/Bcast:.*$//' 

  • Exercise 2: from 1.txt, select the lines containing root, then replace nologin with itheima

answer:

nl 1.txt | grep 'root' | sed -e 's/nologin/itheima/'

or

nl 1.txt | sed -n -e '/root/p' | sed -e 's/nologin/itheima/'

or

nl 1.txt | sed -n -e '/root/{s/nologin/itheima/p}' # Show only the lines where a replacement was made

  • Exercise 3: from 1.txt, delete the first two lines, replace nologin with itheima, and display the line numbers

answer:

nl 1.txt | sed -e '1,2d' | sed -e 's/nologin/itheima/'
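
Two sed processes in a pipe can usually be merged into one by passing several -e expressions. The sketch below is an illustration on a scratch copy of the sample file; the workdir path and the variable names are assumptions, and awk is used only to normalise the spacing that nl adds.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'aaa java root' 'bbb hello' 'ccc rt' 'ddd root nologin' \
              'eee rtt' 'fff ROOT nologin' 'ggg rttt' > "$workdir/1.txt"

# On lines matching /root/, substitute and print only if the substitution
# happened (p flag); -n hides every other line.
replaced=$(nl "$workdir/1.txt" | sed -n -e '/root/{s/nologin/itheima/p;}' \
  | awk '{print $1, $2, $3, $4}')

# Two expressions in one call: drop lines 1-2, then substitute on the rest.
combined=$(sed -e '1,2d' -e 's/nologin/itheima/' "$workdir/1.txt")
remaining_lines=$(( $(echo "$combined" | wc -l) ))
substituted_lines=$(echo "$combined" | grep -c 'itheima')

echo "$replaced"
echo "$remaining_lines $substituted_lines"
```

Expressions given with -e run in order on each line, so the result matches the piped version.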