day 19 regular expression

Posted by khalidorama on Sun, 02 Jan 2022 11:39:58 +0100

1, Regular expression

  • Regular expressions are a tool for simplifying complex string-processing problems

Match symbol

1. Introduction to the re module

The re module is a Python standard-library module that supports regular expression operations

fullmatch(regular expression, string) - checks whether the regular expression matches the entire string. If it does not, the result is None

from re import fullmatch
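
As a minimal sketch of the two possible results described above (the sample strings are illustrative): fullmatch returns a match object when the whole string fits the pattern and None when it does not, so the result can be tested directly.

```python
from re import fullmatch

# Success: the whole string fits the pattern, so a match object is returned.
ok = fullmatch(r'abc', 'abc')
print(ok)

# Failure: one extra character remains, so the result is None.
bad = fullmatch(r'abc', 'abcd')
print(bad)

# A match object is truthy and None is falsy, so the result works in a condition.
if ok and not bad:
    print('only the first string is an exact match')
```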

2. Matching class symbols

Regular expressions are composed of various regular symbols

2.1 Ordinary characters - characters that have no special function or special meaning in a regular expression

An ordinary character represents itself in a regular expression, for example a-z, A-Z, 0-9, Chinese characters
result=fullmatch(r'abc','abc')
print(result)

2.2 special symbols

1) . - matches any character (except a newline by default). Note: one dot matches exactly one character

result=fullmatch(r'a.b','a5b')
print(result)

2) \d - matches any digit character

result=fullmatch(r'a\db','a3b')
print(result)

3) \s - matches any whitespace character
Whitespace characters include: space, \t, \n

result=fullmatch(r'abc\s123','abc 123')
print(result)

4) \D and \S - the opposites of \d and \s
\D - matches any non-digit character
\S - matches any non-whitespace character

result=fullmatch(r'abc\D12\S3','abc 12a3')
print(result)

5) [character set] - matches any character in the character set

Case:
Case 1: all ordinary characters: [xyz12] - match any one of them
Case 2: the set contains a matching symbol that begins with \. The symbol keeps its function inside the set; [mn\d] == [mn0123456789]
Case 3: a minus sign between two characters denotes a range from the first character to the second (ordered by character code value)
[a-z] - matches any lowercase letter
[a-zA-Z] - matches any letter
[\u4e00-\u9fa5] - matches any Chinese character

result=fullmatch(r'a[xyz]c123','axc123')
print(result)

result=fullmatch(r'a[mn\d]b','a1b')
print(result)

result=fullmatch(r'1[a-z]2','1f2')
print(result)

[^character set] - Matches any character that is not in the character set
result=fullmatch(r'1[^a-z]2','122')
print(result)
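
The three cases above and the negated form can be checked side by side. This is a small sketch with illustrative sample strings; each character set still matches exactly one character.

```python
from re import fullmatch

# Case 2: \d keeps its meaning inside a set, so [mn\d] accepts m, n or a digit.
with_m = fullmatch(r'a[mn\d]b', 'amb')
with_x = fullmatch(r'a[mn\d]b', 'axb')
print(with_m, with_x)

# Case 3: a range of code values; \u4e00-\u9fa5 covers common Chinese characters.
chinese = fullmatch(r'[\u4e00-\u9fa5]', '中')
print(chinese)

# A leading ^ inverts the set: a lowercase letter is now rejected.
negated = fullmatch(r'1[^a-z]2', '1f2')
print(negated)
```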

2, Matching times (quantifiers)

Usage: a matching symbol followed by the number of times it repeats

# 1. * - zero or more times (any number)
# a*b   - any number of a before b
# \d*b  - any number of digits before b
result = fullmatch(r'a*b', 'aaab')
print(result)

result = fullmatch(r'\d*b', '48798b')
print(result)

result = fullmatch(r'[xyz]*b', 'xyxzzb')
print(result)

# 2. + - one or more times (at least one time)
result = fullmatch(r'a+b', 'aaab')
print(result)

result = fullmatch(r'\d+b', '48798b')
print(result)

result = fullmatch(r'[xyz]+b', 'xyxzzb')
print(result)

# 3. ? - 0 or 1 times
result=fullmatch(r'\d?abc','8abc')
print(result)

# Exercise: a regular expression that matches any positive integer (0 is excluded); '025' gives None because of its leading zero
result=fullmatch(r'[+]?[1-9]\d*','025')
print(result)
# 4. {}
'''
{M,N}  - M to N times
{M,}   - at least M times
{,N}   - at most N times
{N}    - exactly N times

'''
result=fullmatch(r'a{3}b','aaab')
print(result)
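
Each of the four {} forms listed above can be tried on short strings. A minimal sketch; the helper name matches is hypothetical and only wraps fullmatch to give a True/False answer.

```python
from re import fullmatch

def matches(pattern, text):
    """Hypothetical helper: True when the whole of text fits pattern."""
    return fullmatch(pattern, text) is not None

print(matches(r'a{2,4}b', 'aaab'))    # 3 repetitions, inside the range 2 to 4
print(matches(r'a{2,4}b', 'aaaaab'))  # 5 repetitions, above the upper bound
print(matches(r'a{2,}b', 'ab'))       # 1 repetition, below the minimum of 2
print(matches(r'a{,2}b', 'b'))        # 0 repetitions, allowed by {,2}
print(matches(r'a{3}b', 'aaab'))      # exactly 3 repetitions
```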

3, Greedy and non-greedy matching

from re import fullmatch, search, findall
# 1. Greedy and non-greedy
'''
 When the number of matches is not fixed, matching is either greedy or non-greedy (the default is greedy)
* + {M,N}  {M,} {,N} ?  - greedy: match as many times as possible
*? +? {M,N}?  {M,}? {,N}? ??  - non-greedy: match as few times as possible
Note: fullmatch always has to cover the whole string, so in Python the difference only shows up with the other functions (match, search, findall ...)

'''
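# Greedy: .+ takes as many characters as it can (with fullmatch the whole string is matched either way)
result=fullmatch(r'.+b','ab See the attachment abjfajfbjifajb')
print(result)

# Non-greedy: .+? stops at the first b, so only 'ab' is matched
result=search(r'.+?b','ab See the attachment abjfajfbjifaj')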
print(result)
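
To make the difference visible, the same text can be searched with a greedy and a non-greedy quantifier. This is a sketch with made-up sample strings, not taken from the original notes.

```python
from re import search, findall

text = 'amb-nb-kb'  # illustrative sample string

# Greedy: .+ extends to the last b that still lets the pattern succeed.
greedy = search(r'a.+b', text).group()

# Non-greedy: .+? stops at the first b that completes the match.
lazy = search(r'a.+?b', text).group()

print(greedy)
print(lazy)

# The same effect with findall and a capturing group.
tagged = '[one][two]'  # illustrative sample string
greedy_caps = findall(r'\[(.+)\]', tagged)
lazy_caps = findall(r'\[(.+?)\]', tagged)
print(greedy_caps)
print(lazy_caps)
```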

print('===================================')
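# Read the saved page; the with-block closes the file automatically
with open('./top250.html','r',encoding='utf-8') as f:
    res1=f.read()
# (.+) is greedy: if one line holds several spans, (.+?) keeps each capture separate
result=findall(r'<span class="inq">(.+)</span>',res1)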
print(result)

4, Grouping and branching

from re import  *
# 1. Grouping - ()
# Application scenario 1: enclose part of the regular expression in () so that it is treated as a whole
result=fullmatch(r'([a-z]{3}\d{2}){3}','ffe21hfj67jki89')
print(result)

# Application scenario 2: repeat - in a regular expression that contains groups, '\N' repeats the content matched by the Nth group before it
# '3a3'  '9a9'
result=fullmatch(r'(\d)a\1{3}','3a333')
print(result)

# Application scenario 3: capture - when findall is used with a regular expression that contains groups, only the content matched by the groups is returned
str1='faa=4324=432f hair Joeafah43141'
result=findall(r'[a-z](\d+)',str1)
print(result)

result=findall(r'([a-z]{2})=(\d{2})',str1)
print(result)   #[('aa', '43')]

print('==========================')
# 2. Branch - |
# regular 1|regular 2 - the match succeeds as long as either regular 1 or regular 2 matches
# abc123,abcJKH
# result=fullmatch(r'abc\d{3}|abc[A-Z]{3}','abc123')
result=fullmatch(r'abc(\d{3}|[A-Z]{3})','abc123')
print(result)
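
The group matters here because | splits everything on its left from everything on its right. A short sketch with illustrative strings, showing what changes when the parentheses are left out:

```python
from re import fullmatch

# Without a group the two branches are 'abc\d{3}' and '[A-Z]{3}'.
no_group_long = fullmatch(r'abc\d{3}|[A-Z]{3}', 'abcJKH')
no_group_short = fullmatch(r'abc\d{3}|[A-Z]{3}', 'JKH')
print(no_group_long)   # neither branch fits the whole string
print(no_group_short)  # the second branch fits

# With a group only the part inside () is alternated, and abc is shared.
grouped = fullmatch(r'abc(\d{3}|[A-Z]{3})', 'abcJKH')
print(grouped.group(1))
```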

5, Others

1. Escape character - add \ before a symbol that has a special function or special meaning to cancel that function, turning it into an ordinary symbol
#'a.b'
result=fullmatch(r'a\.b','a.b')
print(result)

result=fullmatch(r'\+?[1-9]\d','+23')
print(result)

result=fullmatch(r'\(\d{3}\)','(345)')
print(result)

Another way to cancel a symbol's function: a single symbol with a special function loses that function when it is placed inside square brackets []
result=fullmatch(r'[+]?[1-9]\d*','23')
print(result)

result=fullmatch(r'[ab^-]123','^123')
print(result)
2. Ignore case: add (?i) at the start of the regular expression
result=fullmatch(r'(?i)abc','ABC')
print(result)
3. Single-line matching and multi-line matching:
# Default: . cannot match '\n'. (The (?m) flag does not change this; it only makes ^ and $ match at the start and end of each line)

result=fullmatch(r'a.b','a\nb')
print(result)

# Single-line matching: . can also match '\n' - add (?s) at the start of the regular expression

result=fullmatch(r'(?s)a.b','a\nb')
print(result)

result=fullmatch(r'(?si)a.b','a\nB')
print(result)

# result.group() returns only the matched text instead of the full <re.Match object; span=(0, 3), match=...> representation
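
A minimal sketch of reading values out of a match object, assuming the match succeeded. The second pattern and its sample string are illustrative.

```python
from re import fullmatch

result = fullmatch(r'(?si)a.b', 'a\nB')
whole = result.group()   # the matched text itself
print(repr(whole))
print(result.span())     # start and end index of the match: (0, 3)

# With groups, group(N) returns the text captured by the Nth pair of ().
parts = fullmatch(r'(\d{3})-([a-z]+)', '345-abc')
print(parts.group(1), parts.group(2))
print(parts.groups())
```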


4. Common functions in the re module

(commonly used) fullmatch(regular, string) - checks whether the whole string matches the regular expression (exact match). Returns a match object on success and None on failure

match(regular, string) - matches at the beginning of the string. Returns a match object on success and None on failure

search(regular, string) - finds the first substring that matches the regular expression. Returns the corresponding match object, or None if nothing is found

(commonly used) findall(regular, string) - gets all substrings that match the regular expression and returns a list. The elements of the list are strings or tuples

finditer(regular, string) - gets all substrings that match the regular expression and returns an iterator. The elements of the iterator are the corresponding match objects

(commonly used) split(regular, string) - cuts the string at every substring that matches the regular expression and returns a list of strings

(commonly used) sub(regular, string 1, string 2) - replaces every substring of string 2 that matches the regular expression with string 1 and returns the new string
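
The functions listed above can be compared on one sample string. This is a sketch; the string and the patterns are illustrative.

```python
from re import match, search, findall, finditer, split, sub

text = 'ab12cd345ef6'  # illustrative sample string

head = match(r'[a-z]+', text).group()    # match at the start: 'ab'
first = search(r'\d+', text).group()     # first run of digits: '12'
numbers = findall(r'\d+', text)          # all runs of digits
spans = [m.span() for m in finditer(r'\d+', text)]  # positions from match objects
pieces = split(r'\d+', text)             # cut at every run of digits
replaced = sub(r'\d+', '+', text)        # replace every run of digits with '+'

print(head, first)
print(numbers)
print(spans)
print(pieces)
print(replaced)
```

Note that split keeps an empty string at the end here, because the text ends with a substring that matches.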
