Day 19: regular expressions

Posted by webwalker00 on Thu, 23 Dec 2021 22:51:42 +0100

Review of inheritance: a subclass directly owns all the attributes and methods of its parent classes.

class ClassName(Parent1, Parent2, ...):
    pass

1. Regular expression

Regular expressions are a tool for simplifying complex string-processing problems.

1) A regular expression is built from various regular-expression symbols

2) Introduction to the re module

The re module is a standard-library module that Python provides for regular-expression operations.

fullmatch(regular expression, string) - checks whether the regular expression matches the entire string. Returns a match object on success and None otherwise.

# Check whether the input is a valid mobile phone number
from re import fullmatch

def is_tel(tel_num: str) -> bool:
    return fullmatch(r'1[3-9]\d{9}', tel_num) is not None
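As a quick check of the rule above, here is a minimal sketch that redefines the function so the block runs on its own and tries a few inputs. The sample numbers are made up for illustration.

```python
from re import fullmatch

def is_tel(tel_num: str) -> bool:
    # 11 characters: a leading 1, a second digit from 3 to 9, then 9 more digits
    return fullmatch(r'1[3-9]\d{9}', tel_num) is not None

print(is_tel('13812345678'))   # True
print(is_tel('12812345678'))   # False: the second digit is 2
print(is_tel('1381234567'))    # False: only 10 digits
```

Comparing with `is not None` turns the match object into a plain True/False answer.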

1.1 Matching symbols

1. Ordinary characters - characters that have no special function or meaning in a regular expression

In a regular expression an ordinary character represents itself, for example a to z, A to Z, 0 to 9, and Chinese characters.

2. Special symbols

1) . - matches any single character except a newline (one dot matches exactly one character)

result=fullmatch(r'a.d','assd')
result1=fullmatch(r'a.d','asd')
print(result)    # None
print(result1)   # <re.Match object; span=(0, 3), match='asd'>

2) \d - matches any digit character

result=fullmatch(r'a\dd','a8d')
print(result)  # <re.Match object; span=(0, 3), match='a8d'>

3) \s - matches any whitespace character

Whitespace characters include the space, \t, and \n

result=fullmatch(r'a\sd','a d')
print(result)  # <re.Match object; span=(0, 3), match='a d'>

4) \D and \S do the opposite of \d and \s

\D - matches any non-digit character

\S - matches any non-whitespace character
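The two negated symbols can be tried the same way as \d and \s. This is a small sketch with made-up test strings.

```python
from re import fullmatch

# \D accepts exactly one character that is not a digit
non_digit_hit = fullmatch(r'a\Dd', 'axd')
non_digit_miss = fullmatch(r'a\Dd', 'a8d')

# \S accepts exactly one character that is not whitespace
non_space_hit = fullmatch(r'a\Sd', 'a8d')
non_space_miss = fullmatch(r'a\Sd', 'a d')

print(non_digit_hit)    # <re.Match object; span=(0, 3), match='axd'>
print(non_digit_miss)   # None
print(non_space_hit)    # <re.Match object; span=(0, 3), match='a8d'>
print(non_space_miss)   # None
```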

5) [character set] - matches any character in the character set

result=fullmatch(r'a[xyz]c','abc')
print(result)  # None

1) The set contains only ordinary characters
# Matches any one of x, y, z, so axc, ayc and azc all match
result1=fullmatch(r'a[xyz]c','ayc')
print(result1)  # <re.Match object; span=(0, 3), match='ayc'>

2) The set contains a matching symbol that starts with \. The matching symbol keeps its function inside the set: [mn\d] == [mn0123456789]
result=fullmatch(r'a[\dxyz]c','a5c')
print(result)  # <re.Match object; span=(0, 3), match='a5c'>

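3) A hyphen between two characters means a range from the first to the second (the first must not have a larger code point than the second)
[a-z] - matches any lowercase English letter
[A-Z] - matches any uppercase English letter
[a-zA-Z] - matches any English letter
[\u4e00-\u9fa5] - matches any Chinese character (code points U+4E00 to U+9FA5)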
result=fullmatch(r'a[a-z]c','azc')
print(result)    # <re.Match object; span=(0, 3), match='azc'>
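A few more range cases, as a sketch with made-up inputs. It also shows one detail that is easy to miss: a hyphen placed first or last in the set is a literal hyphen, not a range.

```python
from re import fullmatch

letter_digit = fullmatch(r'[a-zA-Z]\d', 'K7')
chinese_pair = fullmatch(r'[\u4e00-\u9fa5]{2}', '你好')
wrong_case = fullmatch(r'a[a-z]c', 'aZc')
# The leading hyphen here is an ordinary character
signed_digit = fullmatch(r'[-+]\d', '-5')

print(letter_digit)   # <re.Match object; span=(0, 2), match='K7'>
print(chinese_pair)   # <re.Match object; span=(0, 2), match='你好'>
print(wrong_case)     # None: Z is not in a-z
print(signed_digit)   # <re.Match object; span=(0, 2), match='-5'>
```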

6) [^character set] - matches any one character that is not in the character set
result=fullmatch(r'a[^xyz]c','amc')
print(result)   # <re.Match object; span=(0, 3), match='amc'>
result=fullmatch(r'a[^xyz]c','axc')
print(result)    # None

1.2 Matching times

Usage: matching symbol followed by a count symbol

a*b - b preceded by any number of a's
\d*b - b preceded by any number of digits

1) * - 0 or more times (any number)

result = fullmatch(r'\d*c', 'c')
result1 = fullmatch(r'\d*c', '566c')
print(result)  # <re.Match object; span=(0, 1), match='c'>
print(result1) # <re.Match object; span=(0, 4), match='566c'>

2) + - one or more times (at least one time)

result = fullmatch(r'\d+c', 'c')
result1 = fullmatch(r'\d+c', '566c')
print(result)  # None
print(result1) # <re.Match object; span=(0, 4), match='566c'>

3) ? - 0 or 1 times

# Exercise: write a regular expression that matches any positive integer (0 itself excluded), with an optional leading +
result1 = fullmatch(r'[+]?[1-9]+[0-9]*', '1')
result = fullmatch(r'[+]?[1-9]\d*', '1')
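To see what the optional sign does, the sketch below runs the second exercise pattern over a handful of made-up inputs and records whether each one matches.

```python
from re import fullmatch

pattern = r'[+]?[1-9]\d*'
samples = ['1', '+25', '025', '0', '+']

# Map each sample to True/False depending on whether the whole string matches
results = {text: fullmatch(pattern, text) is not None for text in samples}
print(results)
# {'1': True, '+25': True, '025': False, '0': False, '+': False}
```

The sign may appear zero times or once, but the first digit must be 1 to 9, which is why '025', '0' and a bare '+' are rejected.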

4) {}

1) {M,N} - M to N times (M <= N)
2) {M,} - at least M times
3) {,N} - at most N times
4) {N} - exactly N times

result = fullmatch(r'a{3}', 'aaa')
result1 = fullmatch(r'a{3}', 'aa')
print(result)   # <re.Match object; span=(0, 3), match='aaa'>
print(result1)  # None
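The other three forms can be checked by testing strings of 0 to 6 a's and listing the lengths that match. This is a sketch; the variable names are only for illustration.

```python
from re import fullmatch

lengths = range(7)  # try '', 'a', 'aa', ... up to six a's

# A match object is always truthy, so it can be used directly as the filter
two_to_four = [n for n in lengths if fullmatch(r'a{2,4}', 'a' * n)]
at_least_two = [n for n in lengths if fullmatch(r'a{2,}', 'a' * n)]
at_most_two = [n for n in lengths if fullmatch(r'a{,2}', 'a' * n)]

print(two_to_four)    # [2, 3, 4]
print(at_least_two)   # [2, 3, 4, 5, 6]
print(at_most_two)    # [0, 1, 2]
```

Note that {,2} also accepts zero repetitions, so the empty string matches.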

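1.3 Greedy and non-greedy

Note: with fullmatch the overall match is the same either way; with the other functions (search, findall, and so on) greedy and non-greedy matching can give different results.

When the number of repetitions is not fixed, matching is either greedy or non-greedy (greedy by default): greedy takes as many repetitions as possible, non-greedy as few as possible.

*, +, ?, {M,N}, {M,}, {,N} - greedy
*?, +?, ??, {M,N}?, {M,}?, {,N}? - non-greedy (append an ASCII ? to the quantifier)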

from re import search

result = search(r'.+?b', 'try bshbsbj823')
result1 = search(r'.+b', 'try bshbsbj823')
print(result)   # <re.Match object; span=(0, 5), match='try b'>
print(result1)  # <re.Match object; span=(0, 10), match='try bshbsb'>
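The difference is easiest to see with findall on text that contains the closing pattern more than once. The sketch below uses a made-up tag string; it also shows that fullmatch gives the same overall match in both modes.

```python
from re import findall, fullmatch

text = '<b>one</b><b>two</b>'

# Greedy: .+ runs to the last </b>, so everything is one match
greedy = findall(r'<b>.+</b>', text)
# Non-greedy: .+? stops at the first </b> it can reach
lazy = findall(r'<b>.+?</b>', text)

print(greedy)   # ['<b>one</b><b>two</b>']
print(lazy)     # ['<b>one</b>', '<b>two</b>']

# fullmatch must consume the whole string, so both modes end up with 'abc'
full_greedy = fullmatch(r'a.+', 'abc').group()
full_lazy = fullmatch(r'a.+?', 'abc').group()
print(full_greedy, full_lazy)   # abc abc
```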

1.4 Grouping and branching

1) Grouping - ()

1) Application 1: enclose part of a regular expression in () so that it is treated as a single unit
result=fullmatch(r'([a-z]{3}\d{2}){2}','abc23dfg54')

2) Application 2: repetition - in a regular expression that contains groups, \N repeats the content matched by the Nth group before it
result=fullmatch(r'(\d)a\1','9a9')  # <re.Match object; span=(0, 3), match='9a9'>
result=fullmatch(r'(\d)a\1','9a6')  # None
result1=fullmatch(r'(\d)(a)\2\1','1aa1') # <re.Match object; span=(0, 4), match='1aa1'>

3) Application 3: capturing - when a regular expression passed to findall contains one group, only the content matched by that group is returned
# Get the digits that follow a lowercase letter
from re import findall
str1='sj12ms55MMK15 Standby time 15'
result=findall(r'[a-z](\d+)',str1)
print(result)   # ['12', '55']
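How the shape of findall's result depends on the number of groups can be sketched on the same string. The variable names are only for illustration.

```python
from re import findall

str1 = 'sj12ms55MMK15 Standby time 15'

# No group: each element is the whole matched substring
no_group = findall(r'[a-z]\d+', str1)
# One group: each element is what that group matched
one_group = findall(r'[a-z](\d+)', str1)
# Two groups: each element is a tuple with one entry per group
two_groups = findall(r'([a-z])(\d+)', str1)

print(no_group)     # ['j12', 's55']
print(one_group)    # ['12', '55']
print(two_groups)   # [('j', '12'), ('s', '55')]
```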

2) Branching - |

1) regex1|regex2 - the match succeeds as long as either regex1 or regex2 matches
result=fullmatch(r'abc(\d{3}|[A-Z]{3})','abc123')
result=fullmatch(r'abc(\d{3}|[A-Z]{3})','abcKSN')
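The parentheses matter here because | splits the entire expression it sits in. A small sketch comparing the grouped pattern with a hypothetical ungrouped variant:

```python
from re import fullmatch

grouped = r'abc(\d{3}|[A-Z]{3})'
# Without the group this reads as: 'abc' plus 3 digits, OR 3 uppercase letters
ungrouped = r'abc\d{3}|[A-Z]{3}'

grouped_letters = fullmatch(grouped, 'abcKSN')
ungrouped_letters = fullmatch(ungrouped, 'abcKSN')
ungrouped_bare = fullmatch(ungrouped, 'KSN')

print(grouped_letters)     # <re.Match object; span=(0, 6), match='abcKSN'>
print(ungrouped_letters)   # None
print(ungrouped_bare)      # <re.Match object; span=(0, 3), match='KSN'>
```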

1.5 Others

1) Escape symbol - put \ before a symbol that has a special function or meaning to remove that function and turn it into an ordinary symbol

# Match the literal text a.b
result=fullmatch(r'a\.b','anb') #None
result=fullmatch(r'a\.b','a.b') # <re.Match object; span=(0, 3), match='a.b'>

Another way to remove a symbol's special function: a single symbol that is special outside [] can be placed inside [] to make it ordinary. This works for symbols such as . * + ? but not for \, ], a leading ^, or a - between two characters.
result=fullmatch(r'a[.]b','a.b')
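Both escaping styles can be checked side by side. The sketch below also uses re.escape, a standard-library helper not mentioned in the text above, which escapes a whole literal string at once. The sample strings are made up.

```python
from re import escape, fullmatch

# Inside [] the dot is literal, so an arbitrary character no longer matches
dot_literal = fullmatch(r'a[.]b', 'a.b')
dot_other = fullmatch(r'a[.]b', 'axb')

# Several special symbols can share one set; each is taken literally
star_in_set = fullmatch(r'a[+*?]b', 'a*b')

# re.escape builds a pattern that matches the given text character by character
literal_sum = fullmatch(escape('1+1=2'), '1+1=2')

print(dot_literal)   # <re.Match object; span=(0, 3), match='a.b'>
print(dot_other)     # None
print(star_in_set)   # <re.Match object; span=(0, 3), match='a*b'>
print(literal_sum)   # <re.Match object; span=(0, 5), match='1+1=2'>
```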

2) Ignore case: put (?i) at the front of the regular expression

result=fullmatch(r'(?i)a.b','ACB') #<re.Match object; span=(0, 3), match='ACB'>

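3) Single-line matching and multi-line matching

By default . cannot match '\n'.

Single-line matching: . can also match '\n' - add (?s) to the front of the regular expression

Note that (?m) (multi-line mode) is a separate switch: it makes ^ and $ match at the start and end of every line and does not change what . matches.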

result=fullmatch(r'a.b','a\nb')  # None
result=fullmatch(r'(?s)a.b','a\nb')  # <re.Match object; span=(0, 3), match='a\nb'>

# Ignore case and let . match \n at the same time
result=fullmatch(r'(?si)a.b','A\nb') # <re.Match object; span=(0, 3), match='A\nb'>
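To keep the (?s) and (?m) switches apart, here is a sketch on a made-up two-line string: (?s) changes what . matches, while (?m) changes where ^ and $ match.

```python
from re import findall, fullmatch

text = 'one\ntwo'

# (?s): the dot may match the newline
dot_default = fullmatch(r'one.two', text)
dot_single_line = fullmatch(r'(?s)one.two', text)
# (?m) alone does not affect the dot
dot_multi_line = fullmatch(r'(?m)one.two', text)

# (?m): ^ and $ match at the start and end of every line
lines_default = findall(r'^\w+$', text)
lines_multi = findall(r'(?m)^\w+$', text)

print(dot_default)       # None
print(dot_single_line)   # <re.Match object; span=(0, 7), match='one\ntwo'>
print(dot_multi_line)    # None
print(lines_default)     # []
print(lines_multi)       # ['one', 'two']
```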

1.6 Common functions in the re module

1) (Commonly used) fullmatch(regular expression, string) - checks whether the whole string matches the regular expression (exact match). Returns a match object on success and None on failure

2) match(regular expression, string) - matches at the beginning of the string. Returns a match object on success and None on failure

3) search(regular expression, string) - finds the first substring of the string that matches the regular expression. Returns the corresponding match object if one is found and None otherwise

4) (Commonly used) findall(regular expression, string) - gets all substrings that match the regular expression and returns them as a list. The elements are strings, or tuples when the regular expression contains more than one group

5) finditer(regular expression, string) - gets all substrings that match the regular expression and returns an iterator whose elements are the match objects for those substrings

6) (Commonly used) split(regular expression, string) - uses every substring that matches the regular expression as a cut point, splits the string, and returns a list of strings

7) (Commonly used) sub(regular expression, string 1, string 2) - replaces every substring of string 2 that matches the regular expression with string 1 and returns the new string
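The functions listed above can be compared on one made-up string. This is a minimal sketch; the variable names are only for illustration.

```python
from re import findall, finditer, match, search, split, sub

text = 'abc123def45gh6'

# match only looks at the beginning of the string
start_letters = match(r'[a-z]+', text).group()
start_digits = match(r'\d+', text)

# search returns the first hit anywhere in the string
first_digits = search(r'\d+', text).group()

# findall returns strings, finditer yields match objects
all_digits = findall(r'\d+', text)
spans = [(m.group(), m.span()) for m in finditer(r'\d+', text)]

# split cuts at every hit, sub replaces every hit
pieces = split(r'\d+', text)
masked = sub(r'\d+', '#', text)

print(start_letters)   # abc
print(start_digits)    # None
print(first_digits)    # 123
print(all_digits)      # ['123', '45', '6']
print(spans)           # [('123', (3, 6)), ('45', (9, 11)), ('6', (13, 14))]
print(pieces)          # ['abc', 'def', 'gh', '']
print(masked)          # abc#def#gh#
```

The trailing empty string in the split result appears because the text ends with a cut point.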
