re module
1. re.match function
re.match attempts to match from the starting position of a string. If it cannot match from the starting position, it returns None.
Syntax: re.match(pattern, string, flags=0)
Parameters:
pattern -- the regular expression to match
string -- the string to match against
flags -- optional flags that control how the pattern is matched, such as case-insensitivity or multi-line matching. Multiple flags can be combined with bitwise OR (|); for example, re.I | re.M sets both the I and M flags. Common values are as follows:
Modifier | Description |
---|---|
re.I | Makes matching case-insensitive |
re.L | Performs locale-aware matching |
re.M | Multi-line matching; affects ^ and $ |
re.S | Makes . match any character, including line breaks |
re.U | Interprets characters according to the Unicode character set. This flag affects \w, \W, \b, \B |
re.X | Allows a more flexible layout (whitespace and comments inside the pattern) so that regular expressions are easier to read |
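As a minimal sketch of combining flags with bitwise OR, the snippet below searches the same text with and without re.I | re.M. The sample text and variable names are made up for illustration.

```python
import re

text = "first line\nSecond LINE\nthird line"
pattern = r"^second line$"

# Without flags, ^ and $ only anchor at the ends of the whole string
# and matching is case-sensitive, so nothing is found.
no_flags = re.search(pattern, text)

# re.M lets ^ and $ anchor at every line; re.I ignores case.
with_flags = re.search(pattern, text, flags=re.I | re.M)

print(no_flags)             # None
print(with_flags.group(0))  # Second LINE
```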
Return value: if the match succeeds, a match object is returned; otherwise, None is returned.
You can call group(num) or groups() on the match object to get the matched text:
Match object method | Description |
---|---|
group(num=0) | Returns the text matched by the whole pattern (num=0) or by the given group. group() can take several group numbers at once, in which case it returns a tuple containing the values of those groups. |
groups() | Returns a tuple containing the strings of all groups, from group 1 up to the last group in the pattern. |
Example:

    import re

    strlist = re.match('www', 'www.com.cn', flags=0)
    print(strlist)
    print(type(strlist))
    print(strlist.span())
    print(strlist.group(0))
    print(strlist.groups())

Output:
<re.Match object; span=(0, 3), match='www'>
<class 're.Match'>
(0, 3)
www
()
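The example above has no capturing groups, so groups() is empty. As a small sketch of how group() and groups() behave once the pattern does contain groups (the date string is an invented sample):

```python
import re

m = re.match(r"(\d{4})-(\d{2})-(\d{2})", "2024-01-15 log entry")

whole = m.group()            # same as m.group(0): the entire match
year = m.group(1)            # text of the first group
year_month = m.group(1, 2)   # several group numbers give a tuple
all_groups = m.groups()      # every group, starting from group 1

print(whole)       # 2024-01-15
print(year)        # 2024
print(year_month)  # ('2024', '01')
print(all_groups)  # ('2024', '01', '15')
```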
2. re.search function
re.search scans the whole string and returns a match object for the first location where the pattern matches.
Syntax: re.search(pattern, string, flags=0)
Parameters:
pattern -- the regular expression to match
string -- the string to search
flags -- optional flags, the same as for re.match
Return value: if the match succeeds, a match object is returned; otherwise, None is returned.
Example:

    import re

    # Match an email address
    line = "abc test@baidu.com.cn mm"
    pattern = r'\b[\w.%+-]+@[\w.-]+\.[a-zA-Z]{2,6}\b'
    matchobj = re.search(pattern, line, flags=0)
    print(type(matchobj))
    print(matchobj.group(0))

Output:
<class 're.Match'>
test@baidu.com.cn
3. The difference between re.match and re.search
re.match only matches at the beginning of the string. If the pattern does not match there, it returns None;
re.search scans the entire string until it finds a match.
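A short sketch of this difference, using an invented sample string in which the pattern does not occur at position 0:

```python
import re

text = "visit www.example.com"

at_start = re.match(r"www", text)    # 'www' is not at position 0
anywhere = re.search(r"www", text)   # scans forward until it matches

print(at_start)         # None
print(anywhere.span())  # (6, 9)
```

Because either function can return None, it is usually safer to check the result before calling group() or span() on it.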
4. Retrieval and replacement
re.sub is used to replace matches in the string.
Syntax: re.sub(pattern, repl, string, count=0, flags=0)
Parameters:
pattern -- the regular expression to match
repl -- the replacement string; it can also be a function
string -- the original string to search and replace in
count -- optional; the maximum number of matches to replace. The default 0 replaces all matches
flags -- optional flags
Example:

    import re

    # The sample text contains a phone number followed by a note
    phone = "This is a telephone number: 1234-556-778 #The place of ownership is xx"

    # Match the phone number
    patt1 = r'[0-9]{4}-[0-9]{3}-[0-9]{3}'
    phone_no = re.search(patt1, phone, flags=0)
    print("Telephone number:", phone_no.group(0))

    # Remove every non-digit character
    patt2 = r'\D'
    phone_num = re.sub(patt2, "", phone_no.group(0), count=0, flags=0)
    print(type(phone_num))
    print("Tel:", phone_num)

Output:
Telephone number: 1234-556-778
<class 'str'>
Tel: 1234556778
If repl is a function, for example:

    import re

    # Replace using a function:
    # multiply each matched number by 2
    def double(matched):
        value = int(matched.group('value'))
        return str(value * 2)

    s = 'A23G4HFD567'
    # (?P<value>...) names the group 'value'; the group matches \d+
    print(re.sub(r'(?P<value>\d+)', double, s))

Output:
A46G8HFD1134
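The count parameter described above can be sketched as follows. The sample text is invented, and count is passed by keyword to keep the call unambiguous.

```python
import re

text = "2021-03-04 and 2022-05-06"

# count=1 replaces only the first match.
first_only = re.sub(r"-", "/", text, count=1)

# The default count=0 replaces every match.
every = re.sub(r"-", "/", text)

print(first_only)  # 2021/03-04 and 2022-05-06
print(every)       # 2021/03/04 and 2022/05/06
```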
5. re.compile
The compile function compiles a regular expression into a pattern (Pattern) object, whose match() and search() methods can then be used.
Syntax: re.compile(pattern[, flags])
Parameters:
pattern -- the regular expression, as a string
flags -- optional, indicating the matching mode, such as ignoring case or multi-line mode. The values are:
- re.I: ignore case
- re.L: makes the special character sets \w, \W, \b, \B, \s, \S depend on the current locale
- re.M: multi-line mode
- re.S: makes '.' match any character including the newline character (by default '.' does not match a newline)
- re.U: makes the special character sets \w, \W, \b, \B, \d, \D, \s, \S depend on the Unicode character property database
- re.X: to improve readability, whitespace and comments after '#' in the pattern are ignored
Example:

    >>> import re
    >>> pattern = re.compile(r'\d+')                     # Matches one or more digits
    >>> m = pattern.match('one12twothree34four')         # Try at the start of the string: no match
    >>> print(m)
    None
    >>> m = pattern.match('one12twothree34four', 2, 10)  # Start at the position of 'e': no match
    >>> print(m)
    None
    >>> m = pattern.match('one12twothree34four', 3, 10)  # Start at the position of '1': matches
    >>> print(m)                                         # A match object is returned
    <re.Match object; span=(3, 5), match='12'>
    >>> m.group(0)   # 0 can be omitted
    '12'
    >>> m.start(0)   # 0 can be omitted
    3
    >>> m.end(0)     # 0 can be omitted
    5
    >>> m.span(0)    # 0 can be omitted
    (3, 5)

In the example above, a successful match returns a match object, where:
- The group([group1, ...]) method returns the string matched by one or more groups. To get the whole matched substring, use group() or group(0);
- The start([group]) method returns the starting position, within the whole string, of the substring matched by the group (the index of the first character of the substring). The default value of the parameter is 0;
- The end([group]) method returns the end position, within the whole string, of the substring matched by the group (the index of the last character of the substring + 1). The default value of the parameter is 0;
- The span([group]) method returns (start(group), end(group)).
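The relationship between these methods can be checked with a short sketch: slicing the original string from start() to end() gives back the same text as group(). The sample string is invented for illustration.

```python
import re

text = "order 66 shipped"
m = re.search(r"\d+", text)

start, end = m.start(), m.end()
sliced = text[start:end]   # end() is one past the last matched character

print(m.span())            # (6, 8)
print(sliced)              # 66
print(sliced == m.group()) # True
```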
Another example, this time with capturing groups; asking for a group that does not exist raises an IndexError:

    >>> import re
    >>> pattern = re.compile(r'([a-z]+) ([a-z]+)', re.I)   # re.I means ignore case
    >>> m = pattern.match('Hello World Wide Web')
    >>> print(m)                              # The match succeeds, so a match object is returned
    <re.Match object; span=(0, 11), match='Hello World'>
    >>> m.group(0)                            # The entire matched substring
    'Hello World'
    >>> m.span(0)                             # The indices of the entire matched substring
    (0, 11)
    >>> m.group(1)                            # The substring matched by the first group
    'Hello'
    >>> m.span(1)                             # The indices of the first group's substring
    (0, 5)
    >>> m.group(2)                            # The substring matched by the second group
    'World'
    >>> m.span(2)                             # The indices of the second group's substring
    (6, 11)
    >>> m.groups()                            # Equivalent to (m.group(1), m.group(2), ...)
    ('Hello', 'World')
    >>> m.group(3)                            # There is no third group
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IndexError: no such group

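As a sketch of the re.X flag from the list of flags above, a compiled pattern can be spread over several lines with a comment on each part. The pattern here is a hypothetical one for a simple decimal number, chosen only for illustration.

```python
import re

# With re.X, whitespace in the pattern and text after '#' are ignored.
number = re.compile(r"""
    (\d+)   # integer part
    \.      # literal dot
    (\d+)   # fractional part
""", re.X)

m = number.search("pi is about 3.14159")

print(m.group(0))  # 3.14159
print(m.groups())  # ('3', '14159')
print(m.span(2))   # (14, 19)
```

The compiled object can be reused for any number of match() or search() calls without passing the pattern string again.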
6. findall
Finds all substrings in the string that match the regular expression and returns them as a list. If the pattern contains more than one capturing group, a list of tuples is returned. If nothing matches, an empty list is returned.
Syntax: re.findall(pattern, string, flags=0) or pattern.findall(string[, pos[, endpos]])
Parameters:
pattern -- the regular expression to match
string -- the string to search
flags -- optional flags
pos -- optional; the position in the string at which to start searching. The default is 0.
endpos -- optional; the position in the string at which to stop searching. The default is the length of the string.
Example:

    import re

    result1 = re.findall(r'\d+', 'runoob 123 google 456')

    pattern = re.compile(r'\d+')   # Find numbers
    result2 = pattern.findall('runoob 123 google 456')
    result3 = pattern.findall('run88oob123google456', 0, 10)

    print(result1)
    print(result2)
    print(result3)

Output:
['123', '456']
['123', '456']
['88', '12']
With more than one capturing group in the pattern, a list of tuples is returned:

    import re

    result = re.findall(r'(\w+)=(\d+)', 'set width=20 and height=10')
    print(result)

Output:
[('width', '20'), ('height', '10')]
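How the number of capturing groups changes the shape of the findall result can be sketched as below, reusing the same sample text. The variable names are made up for illustration.

```python
import re

text = "set width=20 and height=10"

no_group = re.findall(r"\w+=\d+", text)        # whole matches
one_group = re.findall(r"\w+=(\d+)", text)     # only the group's text
two_groups = re.findall(r"(\w+)=(\d+)", text)  # one tuple per match

# A list of 2-tuples converts directly into a dict.
as_dict = dict(two_groups)

print(no_group)    # ['width=20', 'height=10']
print(one_group)   # ['20', '10']
print(two_groups)  # [('width', '20'), ('height', '10')]
print(as_dict)     # {'width': '20', 'height': '10'}
```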
7. re.finditer
Similar to findall: it finds all substrings in the string that match the regular expression, but returns them as an iterator of match objects.
Syntax: re.finditer(pattern, string, flags=0)
Parameters:
pattern -- the regular expression to match
string -- the string to search
flags -- optional matching mode
Example:

    import re

    # finditer yields one match object per match
    it = re.finditer(r"\d+", "12a32bc43jf3")
    for match in it:
        print(match.group())

Output:
12
32
43
3
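Because finditer yields match objects rather than plain strings, each result also carries its position. A small sketch on the same input:

```python
import re

text = "12a32bc43jf3"

# Collect the matched text together with its (start, end) span.
positions = [(m.group(), m.span()) for m in re.finditer(r"\d+", text)]

print(positions)
# [('12', (0, 2)), ('32', (3, 5)), ('43', (7, 9)), ('3', (11, 12))]
```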
8. re.split
The split function splits the string at every substring that matches the pattern and returns the pieces as a list.
Syntax: re.split(pattern, string[, maxsplit=0, flags=0])
Parameters:
pattern -- the regular expression to split on
string -- the string to split
maxsplit -- the maximum number of splits; maxsplit=1 splits once. The default 0 means no limit
flags -- optional matching mode
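Example:

    import re

    # Split on runs of non-word characters
    print(re.split(r'\W+', 'runoob, runoob, runoob.'))
    # A capturing group keeps the separators in the result
    print(re.split(r'(\W+)', ' runoob, runoob, runoob.'))
    # Split at most once
    print(re.split(r'\W+', ' runoob, runoob, runoob.', maxsplit=1))
    # A pattern that can match the empty string splits between every character
    print(re.split('a*', 'hello world'))
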
Output:
['runoob', 'runoob', 'runoob', '']
['', ' ', 'runoob', ', ', 'runoob', ', ', 'runoob', '.', '']
['', 'runoob, runoob, runoob.']
['', 'h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '']
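The empty strings in the outputs above appear whenever a separator sits at the very start or end of the string. One common way to deal with them, sketched here under the assumption that empty pieces are not wanted, is to filter the result. The second call shows maxsplit on an invented comma-separated sample.

```python
import re

text = " runoob, runoob, runoob."

# Drop the empty strings produced by leading and trailing separators.
parts = [p for p in re.split(r"\W+", text) if p]

# maxsplit=1 splits once and leaves the remainder untouched.
head, rest = re.split(r",\s*", "a, b, c", maxsplit=1)

print(parts)  # ['runoob', 'runoob', 'runoob']
print(head)   # a
print(rest)   # b, c
```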