Application of re Library
The re library is Python's standard library for working with regular expressions. While introducing the re library, this blog also briefly covers regular expression syntax. Mastering regular expressions in depth takes considerably more study.
Regular expression syntax
Regular expression syntax consists of characters and operators. The operators below are enough to get started.
| Operator | Description | Example |
|---|---|---|
| . | Any single character (by default, everything except a newline) | |
| [] | Character set, giving the range of values for a single character | [abc] matches a, b, or c; [a-z] matches a single character from a to z |
| [^] | Negated character set, giving an exclusion range for a single character | [^abc] matches a single character that is not a, b, or c |
| * | The previous character repeated 0 or more times | abc* matches ab, abc, abcc, abccc, etc. |
| + | The previous character repeated 1 or more times | abc+ matches abc, abcc, abccc, etc. |
| ? | The previous character repeated 0 or 1 times | abc? matches ab and abc |
| \| | Either the left or the right expression | abc\|def matches abc or def |
| {m} | The previous character repeated exactly m times | ab{2}c matches abbc |
| {m,n} | The previous character repeated m to n times (inclusive) | ab{1,2}c matches abc and abbc |
| ^ | Matches the beginning of a string | ^abc matches abc at the beginning of the string |
| $ | Matches the end of a string | abc$ matches abc at the end of the string |
| () | Grouping marker; the \| operator can be used inside it | (abc) matches abc; (a\|b) matches a or b |
| \d | Digit, equivalent to [0-9] for ASCII text | |
| \w | Word character, equivalent to [A-Za-z0-9_] for ASCII text | |
The table above covers only the most basic part of regular expressions. To study them in depth, look for more comprehensive learning material. This post serves only as a primer.
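To make the operators more concrete, here is a small sketch that checks a few of them with re.fullmatch. The patterns and test strings are illustrative values chosen for this example, not taken from any particular source.

```python
import re

# Each tuple is (pattern, text, expected result of a full match).
# All values are illustrative assumptions for this sketch.
cases = [
    (r"abc*", "abccc", True),      # c repeated 0 or more times
    (r"abc+", "ab", False),        # c must appear at least once
    (r"abc?", "ab", True),         # c is optional
    (r"abc|def", "def", True),     # either side of |
    (r"ab{2}c", "abbc", True),     # b exactly twice
    (r"ab{1,2}c", "abbbc", False), # b at most twice
    (r"[^abc]", "d", True),        # any character except a, b, c
    (r"\d\w", "7_", True),         # a digit followed by a word character
]

# fullmatch succeeds only if the whole text matches the pattern.
results = [re.fullmatch(pattern, text) is not None for pattern, text, _ in cases]

for (pattern, text, _), matched in zip(cases, results):
    print(pattern, repr(text), matched)
```

Using fullmatch here keeps each check strict: the whole test string has to fit the pattern, so a partial match does not hide a mistake.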
Basic usage of re Library
The main functions of the re library are as follows:
- Base function: compile;
- Operation functions: search, match, findall, split, finditer, sub.
Before going further, it helps to understand raw strings.
In Python, a raw string is written by prefixing the string literal with r.
For example, my_str = 'i'am xiangpica' is a syntax error. For the ' inside the string to work, add the escape character \ and change it to my_str = 'i\'am xiangpica'.
This becomes a problem when combined with the regular expression operators above, because \ also has a special meaning in regular expressions. To match a literal \ with the re library, you need four backslashes in an ordinary string. Raw strings exist to avoid this.
    "\\\\"   # regular expression without a raw string
    r"\\"    # regular expression using a raw string
A practical application appears later in this post.
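As a quick check of the backslash rule, the sketch below matches a single literal backslash both ways. The path value is a made-up example.

```python
import re

# Hypothetical text containing exactly ONE backslash (written as \\ in source).
path = "C:\\temp"

# Ordinary string: four backslashes in source become the two-character
# pattern \\ , which the regex engine reads as one literal backslash.
plain = re.search("\\\\", path)

# Raw string: the two backslashes reach the regex engine unchanged.
raw = re.search(r"\\", path)

print(plain.span(), raw.span())
print("\\\\" == r"\\")
```

Both patterns are the same two-character string once Python has parsed the source, which is why they find the same match.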
Next, consider the following code:
    my_str = 'C:\number'
    print(my_str)
The output is shown below; the \n has been parsed as a line feed:
    C:
    umber
To prevent this, prefix the string with r:
    my_str = r'C:\number'
    print(my_str)
This outputs C:\number.
Description of re library functions
re.search function
This function searches a string for the first location where the regular expression matches and returns a match object; it returns None if there is no match.
The function prototype is as follows:
re.search(pattern,string,flags=0)
Requirement: match Charlie in the string 'Charlie is not a dog good good'.
    import re

    my_str = 'Charlie is not a dog good good'
    pattern = r'Charlie'
    ret = re.search(pattern, my_str)
    print(ret)
Return result: <re.Match object; span=(0, 7), match='Charlie'>.
The third parameter of the search function, flags, sets control flags that change how the regular expression is applied.
- re.I, re.IGNORECASE: ignore case when matching;
- re.M, re.MULTILINE: the ^ operator in the regular expression can match at the beginning of each line of the given string;
- re.S, re.DOTALL: the . operator in the regular expression can match all characters, including the newline.
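The effect of the three flags can be seen in a short sketch. The two-line text is an invented example.

```python
import re

# Invented two-line sample text.
text = "first line\nSecond LINE"

# re.I: the lowercase pattern still finds "Second".
ignore_case = re.search(r"second", text, flags=re.I).group(0)

# re.M: ^ matches at the start of every line, not only at the start of the text.
line_starts = re.findall(r"^\w+", text, flags=re.M)
first_only = re.findall(r"^\w+", text)

# re.S: . is allowed to match the newline between the two lines.
dot_all = re.search(r"line.Second", text, flags=re.S) is not None
dot_default = re.search(r"line.Second", text) is not None

print(ignore_case, line_starts, first_only, dot_all, dot_default)
```

Flags can also be combined with the | operator, for example flags=re.I | re.M.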
Finally, to output the matched string, use the following code.
    import re

    my_str = 'Charlie is not a dog good good'
    pattern = r'Charlie'
    ret = re.search(pattern, my_str)
    if ret:
        print(ret.group(0))
re.match function
This function is used to match the regular expression at the beginning of the target string, return the match object, and return None if the match is not successful. The function prototype is as follows:
re.match(pattern,string,flags=0)
Note that matching is attempted only at the starting position of the target string.
    import re

    my_str = 'Charlie is not a dog good good'
    pattern = r'Charlie'  # matches: the string starts with Charlie
    # pattern = r'good'   # would not match: good is not at the start
    ret = re.match(pattern, my_str)
    if ret:
        print(ret.group(0))
re.match and re.search each return at most one match object per call. To get several values back, build capturing groups into the pattern and read multiple strings from the single match.
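A minimal sketch of the capturing-group idea, reusing the sample string from above. The pattern itself is an assumption made for illustration.

```python
import re

my_str = 'Charlie is not a dog good good'

# Two capturing groups: the subject and the noun after "is not a".
ret = re.search(r'(\w+) is not a (\w+)', my_str)
if ret:
    whole = ret.group(0)    # the entire matched text
    subject = ret.group(1)  # first group
    noun = ret.group(2)     # second group
    both = ret.groups()     # all groups as a tuple
    print(whole, subject, noun, both)
```

A single call still yields one match object, but the groups let it carry several strings.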
re.findall function
This function is used to search strings and return all matching strings in list format. The function prototype is as follows:
re.findall(pattern,string,flags=0)
The test code is as follows:
    import re

    my_str = 'Charlie is not a dog good good'
    pattern = r'good'
    ret = re.findall(pattern, my_str)
    print(ret)
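One detail worth knowing: when the pattern contains a single capturing group, findall returns the text of that group instead of the whole match. The sketch below shows both cases; the second pattern is an illustrative assumption.

```python
import re

my_str = 'Charlie is not a dog good good'

# No groups: the list holds every full match.
plain_matches = re.findall(r'good', my_str)

# One group: the list holds only what the group captured,
# here the word that comes directly before " good".
grouped = re.findall(r'(\w+) good', my_str)

print(plain_matches)
print(grouped)
```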
re.split function
This function divides a string according to the regular expression matching result and returns a list.
The function prototype is as follows:
re.split(pattern, string, maxsplit=0, flags=0)
When using re.split, if the regular expression matches right at the beginning or end of the string, the returned list has an extra empty string at that end, which needs to be removed manually. For example:
    import re

    my_str = '1 Charlie is not a fan good1good1'
    pattern = r'\d'
    ret = re.split(pattern, my_str)
    print(ret)
Result:
    ['', ' Charlie is not a fan good', 'good', '']
Splitting on content in the middle of the string avoids the empty strings at the ends.
    import re

    my_str = '1 Charlie is not a fan good1good1'
    pattern = r'good'
    ret = re.split(pattern, my_str)
    print(ret)
If the pattern contains capturing parentheses, the text matched inside them is also included in the returned list.
    import re

    my_str = '1 Charlie is not a fan good1good1'
    pattern = r'(good)'
    ret = re.split(pattern, my_str)
    print(ret)
Compare this result with the version without parentheses:
    ['1 Charlie is not a fan ', 'good', '1', 'good', '1']
The maxsplit parameter sets the maximum number of splits; everything that remains is returned as the last element of the list. For example, with the pattern r'good' and maxsplit=1, the result is ['1 Charlie is not a fan ', '1good1'].
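The two points above, removing the empty strings and limiting the number of splits, can be sketched as follows. Filtering with a list comprehension is one possible approach, not the only one.

```python
import re

my_str = '1 Charlie is not a fan good1good1'

# Splitting on digits leaves empty strings at both ends,
# because the string starts and ends with a digit.
parts = re.split(r'\d', my_str)

# One simple way to drop them: keep only non-empty items.
cleaned = [p for p in parts if p]

# maxsplit=1 splits once; the rest of the string stays in the last element.
limited = re.split(r'good', my_str, maxsplit=1)

print(parts)
print(cleaned)
print(limited)
```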
re.finditer function
This function searches the string and returns an iterator over the match results. Each element produced by the iterator is a match object. The function prototype is as follows:
re.finditer(pattern,string,flags=0)
The test code is as follows:
    import re

    my_str = '1 Charlie is not a fan good1good1'
    pattern = r'good'
    ret = re.finditer(pattern, my_str)
    print(ret)  # prints the iterator object itself, not the matches
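To see the matches themselves, loop over the iterator. A minimal sketch using the same sample string:

```python
import re

my_str = '1 Charlie is not a fan good1good1'

# Each item yielded by finditer is a match object.
for m in re.finditer(r'good', my_str):
    print(m.group(0), m.span())

# The iterator is consumed by the loop, so call finditer again to collect spans.
spans = [m.span() for m in re.finditer(r'good', my_str)]
print(spans)
```

Compared with findall, finditer gives access to positions as well as the matched text.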
re.sub function
Replace the string matched by the regular expression in a string and return the replaced string. The function prototype is as follows:
re.sub(pattern,repl,string,count=0,flags=0)
Here the repl parameter is the string that replaces each match, and the count parameter is the maximum number of replacements (0, the default, replaces every match).
    import re

    my_str = '1 Charlie is not a fan good1good1'
    pattern = r'good'
    ret = re.sub(pattern, "nice", my_str)
    print(ret)
After running, get the replaced string:
1 Charlie is not a fan nice1nice1
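The count parameter can be tried with a small sketch on the same string:

```python
import re

my_str = '1 Charlie is not a fan good1good1'

# count=1 replaces only the first match.
once = re.sub(r'good', 'nice', my_str, count=1)

# The default count=0 replaces every match.
every = re.sub(r'good', 'nice', my_str)

print(once)
print(every)
```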
Other functions of re Library
Other common functions include re.fullmatch(), re.subn(), and re.escape(). For more information, see the official documentation.
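As a rough sketch of what these three functions do, with input strings invented for illustration:

```python
import re

# fullmatch: succeeds only if the WHOLE string matches the pattern.
all_digits = re.fullmatch(r'\d+', '2024') is not None
not_all_digits = re.fullmatch(r'\d+', '2024a') is not None

# subn: like sub, but returns (new_string, number_of_replacements).
replaced, n = re.subn(r'good', 'nice', 'good1good1')

# escape: backslash-escapes special characters so that text such as
# '1+1=2' can be searched for literally.
literal = re.escape('1+1=2')
found = re.search(literal, 'so 1+1=2 holds').group(0)

print(all_digits, not_all_digits)
print(replaced, n)
print(found)
```

The exact string returned by re.escape differs slightly between Python versions, so the sketch only relies on it matching the original text literally.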
Object-oriented style of the re library
The examples above use the functional style. The re library also supports an object-oriented style: compile the regular expression once, then reuse it many times. The core function is re.compile.
The prototype of this function is as follows:
regex = re.compile(pattern,flags=0)
Here pattern is the regular expression, written as an ordinary string or a raw string.
The test code is as follows:
    import re

    my_str = '1 Charlie is not a fan good1good1'
    # Compile the pattern into a regular expression object
    regex = re.compile(pattern=r'good')
    ret = regex.sub("nice", my_str)
    print(ret)
The above code compiles the regular expression into a pattern object, so the later call to regex.sub does not need the regular expression as an argument. To use this style, replace re with the compiled regex object and call the corresponding method.
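A short sketch of reusing one compiled object for several operations. The method names mirror the module-level functions described earlier.

```python
import re

my_str = '1 Charlie is not a fan good1good1'

# Compile once, then call several methods on the same pattern object.
regex = re.compile(r'good')

found_all = regex.findall(my_str)
split_once = regex.split(my_str, maxsplit=1)
replaced_once = regex.sub('nice', my_str, count=1)
first_span = regex.search(my_str).span()

print(found_all)
print(split_once)
print(replaced_once)
print(first_span)
```

This style is mainly a convenience when the same pattern is applied many times, since the pattern is written in only one place.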
match object of re Library
After matching a string using the re library, a match object is returned with the following properties and methods.
Properties of the match object
- .string: the text being matched;
- .re: the pattern object used for matching;
- .pos: the start position of the regular expression search in the text;
- .endpos: the end position of the regular expression search in the text.
The test code is as follows:
    import re

    my_str = '1 Charlie is not a fan good1good1'
    regex = re.compile(pattern=r'g\w+d')
    ret = regex.search(my_str)
    print(ret)
    print(ret.string)
    print(ret.re)
    print(ret.pos)
    print(ret.endpos)
Result output:
    <re.Match object; span=(23, 32), match='good1good'>
    1 Charlie is not a fan good1good1
    re.compile('g\\w+d')
    0
    33
Methods of the match object
- .group(0): get the matched string;
- .start(): the start position of the matched string in the original string;
- .end(): the end position of the matched string in the original string;
- .span(): returns (.start(), .end()).
Because the content is relatively simple, the specific code will not be displayed.
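Even so, a brief sketch may be useful for reference. It reuses the sample string and pattern from the properties example.

```python
import re

my_str = '1 Charlie is not a fan good1good1'
ret = re.search(r'g\w+d', my_str)

if ret:
    matched = ret.group(0)  # the matched text
    start = ret.start()     # index where the match begins
    end = ret.end()         # index just past the last matched character
    span = ret.span()       # (start, end) as a tuple
    print(matched, start, end, span)
    # Slicing the original string with start and end gives the match back.
    print(my_str[start:end] == matched)
```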
Summary of this blog
This blog covered the re library in Python, focusing on its functions rather than on regular expression syntax itself. I hope it is helpful to you.