Linux: File Management: awk; Awk command

Posted by juuuugroid on Sun, 16 Jan 2022 05:24:17 +0100

Process text files for text analysis

REFERENCE:

  • https://www.cnblogs.com/ginvip/p/6352157.html
  • https://www.runoob.com/linux/linux-comm-awk.html

The Syntax of awk

awk [Option parameters] 'script' var=value file(s)
or
awk [Option parameters] -f scriptfile var=value file(s)

Option parameters

  • -F fs | --field-separator fs
    Specify the input field separator. fs is a string or a regular expression, such as -F:.

  • -v var=value | --assign var=value
    Assign a user-defined variable.

    awk -F: -v i=5 '{ print $3,$(i-2) }' /etc/passwd
    0 0
    1 1
    2 2
    
  • -f scriptfile | --file scriptfile
    Read the awk program from the script file.

    awk -f xxx.awk /etc/passwd
    
  • -mf nnn & -mr nnn
    Set internal limits to the value nnn: the -mf option limits the maximum number of fields, and the -mr option limits the maximum record size. These two options are extensions from Bell Labs awk and do not apply to standard awk.

  • -W compat | --compat, -W traditional | --traditional
    Run in compatibility mode: gawk behaves like traditional Unix awk, and none of the GNU extensions are recognized.

  • -W copyleft | --copyleft, -W copyright | --copyright
    Print short copyright information.

  • -W help | --help, -W usage | --usage
    Print all awk options and a brief description of each option.

  • -W lint | --lint
    Print warnings about constructs that are dubious or not portable to other awk implementations.

  • -W lint-old | --lint-old
    Print warnings about constructs that are not portable to the original version of Unix awk.

  • -W posix | --posix
    Turn on compatibility mode with additional restrictions: \x escape sequences are not recognized; the func synonym for the function keyword is not recognized; when FS is a single space, a newline does not act as a field separator; the operators ** and **= cannot be used in place of ^ and ^=; fflush is not available.

  • -W re-interval | --re-interval
    Allow interval expressions such as x{m,n} in regular expressions. This is separate from POSIX bracket expressions such as [[:alpha:]].

  • -W source program-text | --source program-text
    Use program-text as awk source code. This option can be mixed with -f.

  • -W version | --version
    Print version information, which is useful when filing bug reports.
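
As a minimal sketch of how -F and -v combine, the following runs awk on two inline records in /etc/passwd format. The sample data and the names sample, min, and result are illustrative assumptions, not part of the original notes.

```shell
# Hypothetical sample data in /etc/passwd format (not read from a real system).
sample='root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin'

# -F: splits each record on ":"; -v min=1 sets the awk variable min
# before the program starts, so it can be used in the pattern.
result=$(printf '%s\n' "$sample" | awk -F: -v min=1 '$3 >= min { print $1, $3 }')
printf '%s\n' "$result"   # prints: bin 1
```

Only the bin record is printed because its third field (the UID) is at least min.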

AWK principle

Print only lines 20 to 30 of the passwd file

awk '{ if( NR>=20 && NR<=30 ){print $0} }' /etc/passwd
mysql:x:27:27:MySQL Server:/var/lib/mysql:/bin/false
nginx:x:998:996:nginx user:/var/cache/nginx:/sbin/nologin

Given the contents of the passwd file, extract the user name root and its login shell /bin/bash from the first line, and output root /bin/bash

awk -F ':' '{ if( NR==1 )print $1" "$7 }' /etc/passwd
root /bin/bash
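
The same kind of line selection can also be written as a pattern in front of the action instead of an if inside it. A small sketch on inline input (the four sample lines are made up for illustration):

```shell
# awk reads one record at a time; NR is the number of the current record.
# The pattern NR>=2 && NR<=3 selects records 2 and 3, and the action prints them.
result=$(printf 'one\ntwo\nthree\nfour\n' | awk 'NR>=2 && NR<=3 { print NR ": " $0 }')
printf '%s\n' "$result"
# prints:
# 2: two
# 3: three
```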

BEGIN/END module

Count the number of accounts in /etc/passwd

awk '{count++} END{print "[END] The number of users is ",count}' /etc/passwd
[END] The number of users is  21

count is a user-defined variable that is not initialized here. An uninitialized variable defaults to 0, but initializing it explicitly in BEGIN is the safest approach:

awk 'BEGIN{count=0} {count++} END{print "[END] The number of users is ",count}' /etc/passwd
[END] The number of users is  21
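
To make the order of execution visible, this sketch prints a line from the BEGIN and END blocks and counts records in between. The three-line input is invented for the example.

```shell
# BEGIN runs once before any input is read, the middle block runs once
# per record, and END runs once after the last record.
result=$(printf 'a\nb\nc\n' | awk '
BEGIN { count = 0; print "start" }
      { count++ }
END   { print "records:", count }')
printf '%s\n' "$result"
# prints:
# start
# records: 3
```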

AWK operator

Description     Operators
assignment      = += -= *= /= %= ^= **=
logic           || &&
regular         ~ (matches regular expression)   !~ (does not match regular expression)
relationship    < > <= >= != ==
arithmetic      + - * / % (remainder)   ^ ** (exponentiation)   ++ --
other           in (whether a key exists in an array)   $ (field reference)

assignment

awk 'BEGIN{ a=5;a+=5;print a }'
10

logic

awk 'BEGIN{ a=0;print ( a>-1||a<0 , a>-1&&a<0 ) }'
1 0

regular

awk 'BEGIN{ str="192,168,10,222";if( str~10 ){print "true"} }'
true
echo | awk 'BEGIN{ str="192,168,10,222" } str~10 {print "true"}'
true

relationship

awk 'BEGIN{ a=0;print (a<0,a==0,a>0) }'
0 1 0

The < and > operators can compare both strings and numbers.

awk 'BEGIN{ a="11";if(a>=9){print "true"} }' # No output: a is a string, so the comparison is character by character and "1" sorts before "9"
awk 'BEGIN{ a=11;if(a>=9){print "true"} }'
true
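
If a value arrives as a string but should be compared as a number, one common idiom is to add 0 to it. A sketch contrasting the two comparisons (the output labels are made up for the example):

```shell
result=$(awk 'BEGIN {
    a = "11"                         # a string constant
    if (a >= 9)     print "string compare: true"
    else            print "string compare: false"
    if (a + 0 >= 9) print "numeric compare: true"
    else            print "numeric compare: false"
}')
printf '%s\n' "$result"
# prints:
# string compare: false
# numeric compare: true
```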

arithmetic

Arithmetic operators automatically convert their operands to numbers: a string with a leading numeric prefix takes the value of that prefix, and any other non-numeric string becomes 0

awk 'BEGIN{ a="b";b="2b";print a,b,a++,b++ }'
b 2b 0 2

Others: ternary operator

awk 'BEGIN{ a="b";print a=="b"?1:0 }'
1

AWK built-in variables

Variable   Full name                 Description                                              Default
$0                                   Current record
$1~$n                                The nth field of the current record
FS         Field Separator           Input field separator                                    Space
RS         Record Separator          Input record separator                                   \n
NF         Number of Fields          Number of fields in the current record (column count)
NR         Number of Records         Number of records read so far (line number)
OFS        Output Field Separator    Output field separator                                   Space
ORS        Output Record Separator   Output record separator                                  \n

FS field separator

Tab

awk 'BEGIN{ FS="\t+" }{ print $0 }' xxx.md # One or more Tab delimiters

Whitespace

awk -F '[[:space:]]+' '{ print $0 }' xxx.md # One or more whitespace characters

Multiple separators

awk -F '[ :/]' 'BEGIN{ OFS="\t" }{ print $2,$3,$9 }' /etc/passwd
x	0	bin
x	1	sbin
x	2	sbin
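
With the separator '[ :/]' used above, a run such as ":/" produces an empty field between the two characters, which is why the path components land in $7 and $9. A sketch of one alternative, assuming that collapsing such runs is wanted: add + to the bracket expression. The variable names line and result are illustrative.

```shell
line='root:x:0:0:root:/root:/bin/bash'
# The bracket expression matches ":" or "/"; the trailing + lets a run such
# as ":/" count as a single separator, so no empty fields are produced.
result=$(printf '%s\n' "$line" | awk -F '[:/]+' '{ print NF, $1, $NF }')
printf '%s\n' "$result"   # prints: 8 root bash
```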

RS record separator ⭐

awk 'BEGIN{ RS="" }{ print $0 }' /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
·················
mysql:x:27:27:MySQL Server:/var/lib/mysql:/bin/false
nginx:x:998:996:nginx user:/var/cache/nginx:/sbin/nologin
awk 'BEGIN{ RS="" }{ print $1 }' /etc/passwd
root:x:0:0:root:/root:/bin/bash
awk 'BEGIN{ RS="" }{ print $2 }' /etc/passwd
bin:x:1:1:bin:/bin:/sbin/nologin
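
Setting RS to the empty string switches awk to paragraph mode: records are separated by blank lines, and a newline always acts as a field separator. /etc/passwd has no blank lines, so the whole file above is a single record, and $1 and $2 are its first two lines (neither contains a space). A sketch on input that does contain a blank line; the name/size data is invented for the example:

```shell
# Two blocks separated by a blank line; each block becomes one record,
# and FS="\n" makes every line of a block its own field.
result=$(printf 'name: a\nsize: 1\n\nname: b\nsize: 2\n' |
    awk 'BEGIN { RS = ""; FS = "\n" } { print NR ": " $1 " / " $2 }')
printf '%s\n' "$result"
# prints:
# 1: name: a / size: 1
# 2: name: b / size: 2
```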

NF number of fields

awk -F "/" 'NF==5{print $0}' /etc/passwd # Print the lines that have 5 fields when split on /
adm:x:3:4:adm:/var/adm:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
ntp:x:38:38::/etc/ntp:/sbin/nologin

NR record number

awk 'NR==1{print $0}' /etc/passwd # NR==1 selects the first line
root:x:0:0:root:/root:/bin/bash

OFS output field separator

Omitted.

ORS output record separator

Omitted.
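
OFS and ORS have no example in these notes. As a minimal sketch of what they do, the following sets both in BEGIN and prints two fields of each inline record; the input and the chosen separators are assumptions made for illustration.

```shell
# OFS is placed between the comma-separated arguments of print;
# ORS is appended after each print statement.
result=$(printf 'a b c\nd e f\n' | awk 'BEGIN { OFS = "-"; ORS = ";\n" } { print $1, $3 }')
printf '%s\n' "$result"
# prints:
# a-c;
# d-f;
```

Note that OFS only appears where print arguments are separated by commas; writing print $1 $3 would concatenate the fields with nothing in between.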

IGNORECASE ignore case (gawk extension)

awk 'BEGIN{ IGNORECASE=1 } /user/' /etc/passwd
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
polkitd:x:999:998:User for polkitd:/:/sbin/nologin
nginx:x:998:996:nginx user:/var/cache/nginx:/sbin/nologin

AWK regular expression

Character   Function                                         Sample            Interpretation
^           Start-of-line anchor                             /^root/           Match all lines starting with root
$           End-of-line anchor                               /root$/           Match all lines ending with root
.           Match any single character                       /r..t/            Match four-character strings that start with r and end with t
*           Match the preceding character 0 or more times    /roo*t/
+           Match the preceding character 1 or more times    /ro+t/
?           Match the preceding character 0 or 1 times       /r?oot/
[]          Match any one character listed in []             /^[abc]/          Match lines starting with a, b, or c
[^]         Match any one character not listed in [^]        /^[^ab]/          Match lines that do not start with a or b
()          Group a subexpression                            /(root)+/         Match one or more consecutive occurrences of root
|           Alternation (or)                                 /(root)|(user)/   Match lines containing root or user
\           Escape character                                 /a\//             Match a/
~           Matches                                          $1~/root/         Match lines whose first field contains root
!~          Does not match                                   $1!~/root/
x{m}        x repeated exactly m times                       /[rot]{4}/        Match lines containing four consecutive characters, each of them r, o, or t
x{m,}       x repeated m or more times
x{m,n}      x repeated m to n times

Regular expression

awk '/REG/{ACTION}' FILE # /REG/ is a regular expression; every record ($0) that matches it is passed to ACTION
awk '/root/{print $0}' /etc/passwd # Match rows containing root

Boolean expression

awk 'BOOLEAN{ACTION}' FILE # ACTION runs only for records where BOOLEAN evaluates to true
awk -F: '$1=="root"{print $0}' /etc/passwd
root:x:0:0:root:/root:/bin/bash
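
Regular-expression matches and boolean tests can be combined in one pattern. A sketch on three inline records in /etc/passwd format; the sample data and variable names are assumptions for illustration.

```shell
# Hypothetical sample data in /etc/passwd format.
sample='root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin'

# Field-level regex match: keep records whose 7th field ends in "nologin".
nologin=$(printf '%s\n' "$sample" | awk -F: '$7 ~ /nologin$/ { print $1 }')
# The same match combined with a numeric test on the 3rd field.
combined=$(printf '%s\n' "$sample" | awk -F: '$7 ~ /nologin$/ && $3 >= 2 { print $1 }')
printf '%s\n' "$nologin"    # prints bin and daemon, one per line
printf '%s\n' "$combined"   # prints: daemon
```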

Conditions and loops

if

if($1=="root"){
  print $0
}

while
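
The notes give no example for while. A minimal sketch, run entirely inside BEGIN so that it needs no input file (the counter name i is arbitrary):

```shell
# Counts from 1 to 3; the condition is checked before each iteration,
# so the body never runs if the condition is false at the start.
result=$(awk 'BEGIN { i = 1; while (i <= 3) { print i; i++ } }')
printf '%s\n' "$result"
# prints:
# 1
# 2
# 3
```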

do while

count=1
do{
  print $1
} while( count !=1 )

for

for( i=1;i<10;i++){
  print $1
}
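
The fragments above are statements that belong inside an awk action. As one way to run them, here is a sketch that uses for and if together on an inline record; the numbers and the threshold 10 are invented for the example.

```shell
# Walk over every field of the record and print the position and value
# of each field that is greater than 10.
result=$(printf '5 20 7 30\n' | awk '{
    for (i = 1; i <= NF; i++) {
        if ($i > 10) {
            print i, $i
        }
    }
}')
printf '%s\n' "$result"
# prints:
# 2 20
# 4 30
```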

array

Use awk to summarize the server's TCP connection states

netstat -an|awk '/^tcp/{++s[$NF]}END{for(a in s)print a,s[a]}'
ESTABLISHED 1
LISTEN 20

Summarize web log traffic: for each requested page or image, output the number of requests and the total size served, followed by the total traffic overall

awk '{a[$7]+=$10;++b[$7];total+=$10}END{for(x in a)print b[x],x,a[x]|"sort -rn -k1";print "total size is :"total}' /app/log/access_log
total size is :172230
21 /icons/poweredby.png 83076
14 / 70546
8 /icons/apache_pb.gif 18608

a[$7]+=$10 builds an array keyed by column 7 (the requested path) and adds up column 10 (the response size) for each key, so a[x] is the total size served for path x, while b[$7] counts the requests for that path. The trick in the for loop is that a and b have the same subscripts, so a single for statement is enough. The total line appears first because sort can only emit its output after awk has closed the pipe.
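
The counting idiom from the netstat example can be tried without a live server. A sketch on invented netstat-like lines (the addresses and the names sample, s, k are assumptions for illustration):

```shell
# Invented netstat-like lines: protocol in column 1, state in the last column.
sample='tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN
tcp 0 0 10.0.0.5:22 10.0.0.9:51000 ESTABLISHED
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN
udp 0 0 0.0.0.0:68 0.0.0.0:*'

# s[$NF] is created on first use and starts at 0, so ++ counts occurrences.
# for (k in s) visits keys in an unspecified order, hence the final sort.
result=$(printf '%s\n' "$sample" |
    awk '/^tcp/ { ++s[$NF] } END { for (k in s) print k, s[k] }' | sort)
printf '%s\n' "$result"
# prints:
# ESTABLISHED 1
# LISTEN 2
```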

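String function

Function                          Description
gsub( Ere,Repl,[In] )             Replace every match of Ere in In (default $0) with Repl; returns the number of replacements
sub( Ere,Repl,[In] )              Replace only the first match of Ere in In (default $0) with Repl
index( String1,String2 )          Position of the first occurrence of String2 in String1, or 0 if it does not occur
length[( String )]                String length (of $0 if no argument is given)
blength[( String )]               String length in bytes (not available in every awk implementation)
substr( String,M,[N] )            Substring of String starting at position M, at most N characters long
match( String,Ere )               Position of the first match of Ere in String, or 0 if there is none
split( String,A,[Ere] )           Split String into array A using separator Ere (default FS); returns the number of elements
tolower( String )                 Convert String to lower case
toupper( String )                 Convert String to upper case
sprintf( Format,Expr,Expr,... )   Return the expressions formatted according to Format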

gsub replacement

awk 'BEGIN{ str="abc123abc";gsub(/[0-9]+/,"!",str);print str }'
abc!abc

Find every substring of str that matches the regular expression, replace each one with !, and store the result back in str
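
gsub and sub modify their third argument in place and return the number of replacements made, not the new string. A short sketch contrasting the two (the variable names are arbitrary):

```shell
result=$(awk 'BEGIN {
    s = "abc123abc"; n = gsub(/abc/, "X", s)   # replace every match
    t = "abc123abc"; m = sub(/abc/, "X", t)    # replace the first match only
    print n, s
    print m, t
}')
printf '%s\n' "$result"
# prints:
# 2 X123X
# 1 X123abc
```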

index lookup

awk 'BEGIN{ str="abc123abc";print index(str,"abc")?"true":"false" }'
true # index returns a non-zero position when the substring is found

match regex lookup

awk 'BEGIN{ str="abc123abc";print match(str,/[0-9]+/) }'
4

substr extraction

awk 'BEGIN{ str="abc123abc";print substr(str,4,6) }'
123abc
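
The remaining functions from the table can be exercised the same way. A sketch covering split, length, toupper, tolower, and sprintf; the input strings are chosen only for illustration.

```shell
result=$(awk 'BEGIN {
    n = split("2022-01-16", parts, "-")   # parts[1]="2022", parts[2]="01", parts[3]="16"
    print n, parts[1], parts[3]
    print length("awk"), toupper("awk"), tolower("AWK")
    print sprintf("%03d", 7)              # zero-pad to width 3
}')
printf '%s\n' "$result"
# prints:
# 3 2022 16
# 3 AWK awk
# 007
```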

Exercise

Format output

awk -F: '{printf "%-8s %-10s\n",$1,$6 }' /etc/passwd
root     /root     
bin      /bin      
daemon   /sbin     

Operator: filter rows with the third column less than 3

awk -F: '$3<3' /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin

Calculate the total size of the files in the current directory

ls -l | awk '{ sum+=$5 } END{ print sum }'
1535

Find lines longer than 60 characters in the file

awk 'length>60' /etc/passwd
systemd-network:x:192:192:systemd Network Management:/:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin

Print the 9x9 multiplication table

seq 9 | sed 'H;g' | awk -v RS='' '{ for(i=1;i<=NF;i++)printf("%dx%d=%d%s",i,NR,i*NR,i==NR?"\n":"\t") }'
1x1=1
1x2=2	2x2=4
1x3=3	2x3=6	3x3=9
1x4=4	2x4=8	3x4=12	4x4=16
1x5=5	2x5=10	3x5=15	4x5=20	5x5=25
1x6=6	2x6=12	3x6=18	4x6=24	5x6=30	6x6=36
1x7=7	2x7=14	3x7=21	4x7=28	5x7=35	6x7=42	7x7=49
1x8=8	2x8=16	3x8=24	4x8=32	5x8=40	6x8=48	7x8=56	8x8=64
1x9=9	2x9=18	3x9=27	4x9=36	5x9=45	6x9=54	7x9=63	8x9=72	9x9=81
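
The pipeline above relies on sed 'H;g' to emit growing blocks of numbers separated by blank lines, which awk then reads in paragraph mode, so record NR has exactly NR fields. A plainer alternative (not the original author's method) uses two nested loops in BEGIN. The size variable n is an illustrative name and is set to 3 here to keep the output short; passing -v n=9 gives the full table.

```shell
# r is the row (second factor), c the column (first factor).
# Cells in a row are separated by tabs; the last cell ends the line.
result=$(awk -v n=3 'BEGIN {
    for (r = 1; r <= n; r++)
        for (c = 1; c <= r; c++)
            printf "%dx%d=%d%s", c, r, c * r, (c == r ? "\n" : "\t")
}')
printf '%s\n' "$result"
```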
