Getting Started with Python Built-in Modules -- the re Module
1. re module
(1) What is a regular expression?
A regular expression describes a character or string by combining symbols that have special meaning. Put another way, a regular expression is a rule that describes a class of strings. In Python, regular expressions are built in and implemented by the re module. A regular expression pattern is compiled into a series of bytecodes, which are then executed by a matching engine written in C.
Metacharacter | Match Content |
\w | Matches a letter (including Chinese characters), digit, or underscore |
\W | Matches any character that is not a letter (including Chinese characters), digit, or underscore |
\s | Match any whitespace |
\S | Match any non-whitespace character |
\d | Matches a digit |
\D | Matches a non-digit |
\A | Matches only at the beginning of the string |
\Z | Matches only at the end of the string |
\n | Match a line break |
\t | Match a tab |
^ | Matches at the beginning of the string |
$ | Matches at the end of the string, or just before a line break at the end of the string |
. | Matches any character except a line break. With the re.DOTALL flag it matches any character, including line breaks. |
[...] | Match characters in character groups |
[^...] | Matches any character not in the character group |
* | Matches 0 or more of the preceding expression, greedy |
+ | Matches 1 or more of the preceding expression, greedy |
? | Matches 0 or 1 of the preceding expression; placed after another quantifier (as in *? or +?) it makes that quantifier non-greedy |
{n} | Matches exactly n repetitions of the preceding expression |
{n,m} | Matches n to m repetitions of the preceding expression, greedy |
a|b | Matches a or b |
() | Matches the expression inside the parentheses and also defines a group |
import re
<1> \w matches letters, digits, underscores, and Chinese characters
print(re.findall(r"\w","Xiao Ming-Marry_dsb123 Xiaotian Eat D breakfast"))  # \w: letters, digits, underscores, Chinese characters
<2> \W matches anything that is not a letter, digit, underscore, or Chinese character
print(re.findall(r"\W","Xiao Ming-Marry_dsb123 Xiaotian Eat D breakfast"))  # \W: everything that \w does not match
<3> \d matches digits
print(re.findall(r"\d","1010⑩"))  # \d matches digits
<4> \D matches non-digits
print(re.findall(r"\D","1010⑩"))  # \D matches non-digits
<5> \A matches at the beginning of a string; ^ is more commonly used
print(re.findall(r"^a","alex"))  # Match a at the beginning of the string
<6> \Z matches at the end of a string; $ is more commonly used
print(re.findall(r"x$","alex"))  # Match x at the end of the string
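The two pairs of anchors only behave differently when the string contains line breaks. The following is a minimal sketch of that difference; the sample string is made up for illustration and is not from the original examples.

```python
import re

text = "alex\nalice\n"  # hypothetical sample: two lines plus a trailing line break

# By default, ^ and \A both anchor at the very start of the string.
start_caret = re.findall(r"^a\w+", text)
start_A = re.findall(r"\Aa\w+", text)

# With re.MULTILINE, ^ also matches right after each line break; \A does not.
multi_caret = re.findall(r"^a\w+", text, re.MULTILINE)
multi_A = re.findall(r"\Aa\w+", text, re.MULTILINE)

# $ may match just before a trailing line break; \Z only at the very end.
end_dollar = re.findall(r"e$", text)
end_Z = re.findall(r"e\Z", text)

print(start_caret, start_A)   # ['alex'] ['alex']
print(multi_caret, multi_A)   # ['alex', 'alice'] ['alex']
print(end_dollar, end_Z)      # ['e'] []
```

For single-line input the two forms are interchangeable, which is probably why ^ and $ are the ones seen most often.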
<7> \n matches a line break
<8> \t matches a tab
<9> A literal string matches itself
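Items <7> to <9> come without examples, so here is a short sketch covering all three. The sample text is a hypothetical two-line, tab-separated string.

```python
import re

# Hypothetical sample: the subject string is a normal (non-raw) literal,
# so \t and \n here are a real tab and a real line break.
text = "name\tage\nalex\t18"

line_breaks = re.findall(r"\n", text)   # the pattern \n matches a line break
tabs = re.findall(r"\t", text)          # the pattern \t matches a tab
literal = re.findall(r"alex", text)     # plain characters match themselves

print(line_breaks)  # ['\n']
print(tabs)         # ['\t', '\t']
print(literal)      # ['alex']
```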
<10> [...] matches any one character in the character group
print(re.findall('[0-9]',"Xiao Ming-Marry_dsb123 Xiaotian Eat D breakfast"))  # Digits
print(re.findall('[a-z]',"Xiao Ming-Marry_dsb123 Xiaotian Eat D breakfast"))  # Lowercase letters
print(re.findall('[A-Z]',"Xiao Ming-Marry_dsb123 Xiaotian Eat D breakfast"))  # Uppercase letters
<11> [^...] matches any one character not in the character group
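A negated character group is written with the ^ inside the brackets, as the first character. The sketch below uses a made-up sample string to show this, and also that ^ is an ordinary character anywhere else in the group.

```python
import re

text = "Marry_dsb123"  # hypothetical sample string

# [^0-9] matches every character that is not a digit.
non_digits = re.findall(r"[^0-9]", text)

# ^ negates the group only when it comes first; here it is a literal caret.
caret_literal = re.findall(r"[0-9^]", "2^10")

print(non_digits)     # ['M', 'a', 'r', 'r', 'y', '_', 'd', 's', 'b']
print(caret_literal)  # ['2', '^', '1', '0']
```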
<12> * matches 0 or more of the character to its left, greedily
print(re.findall("a*","marry,aa,aaaa,bbbbaaa,aaabbbaaa"))  # * matches 0 or more a, as many as possible
<13> + matches 1 or more of the character to its left, greedily
print(re.findall("a+","alex,aa,aaaa,bbbbaaa,aaabbbaaa"))  # + matches 1 or more a, as many as possible
<14> ? matches 0 or 1 of the character to its left; after another quantifier (as in .+?) it makes that quantifier non-greedy
print(re.findall("a?","alex,aa,aaaa,bbbbaaa,aaabbbaaa"))  # ? matches 0 or 1 a
<15> {n} matches exactly n repetitions of the expression to its left
print(re.findall("[0-9]{11}","18612239999,18612239998,136133333323"))  # Match runs of exactly 11 digits
<16> {n,m} matches n to m repetitions of the expression to its left, greedily
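No example accompanies {n,m}, so the following sketch shows the greedy form next to its non-greedy variant {n,m}?. The sample string is hypothetical.

```python
import re

text = "1,22,333,4444,55555"  # hypothetical sample string

# \d{2,3} takes as many digits as it can, up to 3 (greedy).
greedy = re.findall(r"\d{2,3}", text)

# \d{2,3}? stops as soon as the minimum of 2 digits has matched (non-greedy).
lazy = re.findall(r"\d{2,3}?", text)

print(greedy)  # ['22', '333', '444', '555', '55']
print(lazy)    # ['22', '33', '44', '44', '55', '55']
```

Note that the pattern is written without a space after the comma; as far as the re module is concerned, "{2, 3}" is not treated as a repetition count.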
<17> a|b matches a or b
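Alternation applies to everything on either side of the |, which is easy to get wrong. A small sketch with a hypothetical sample string, using a non-capturing group (?:...) to limit the scope:

```python
import re

text = "gray grey"  # hypothetical sample string

# Without grouping, the alternatives are the whole halves: "gra" or "ey".
ungrouped = re.findall(r"gra|ey", text)

# A non-capturing group restricts the choice to the single letter a or e.
grouped = re.findall(r"gr(?:a|e)y", text)

print(ungrouped)  # ['gra', 'ey']
print(grouped)    # ['gray', 'grey']
```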
<18> () matches the expression inside the parentheses and also defines a group
print(re.findall("<a>(.+)</a>","<a>alex</a> <a>wusir</a>"))  # Grouping; the greedy .+ runs to the last </a>
print(re.findall("<a>(.+?)</a>","<a>alex</a> <a>wusir</a>"))  # The non-greedy .+? stops at the first </a>
<19> . matches any character except a line break; with re.DOTALL it also matches line breaks
print(re.findall("a.c","abc,aec,a\nc,a,c"))  # . matches any character except \n
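To see what re.DOTALL changes, the same pattern can be run with and without the flag. This sketch reuses the sample string from the example above.

```python
import re

text = "abc,aec,a\nc,a,c"  # same sample string as the example above

# Default: . does not match the line break, so "a\nc" is skipped.
default = re.findall(r"a.c", text)

# re.DOTALL: . matches the line break as well.
dotall = re.findall(r"a.c", text, re.DOTALL)

print(default)  # ['abc', 'aec', 'a,c']
print(dotall)   # ['abc', 'aec', 'a\nc', 'a,c']
```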
<20> \. An escaped dot loses its special meaning and matches a literal dot
<21> \s matches whitespace
print(re.findall(r"\s","alex\tdsbrimocjb"))  # \s matches whitespace (here, the tab)
<22> \S matches non-whitespace
print(re.findall(r"\S","alex\tdsbrimocjb"))  # \S matches non-whitespace characters
Test question:
Given the string 'alex_sb ale123_sb wu12sir_sb wusir_sb ritian_sb', find every name that is followed by _sb (alex, wusir, and so on).
s = 'alex_sb ale123_sb wu12sir_sb wusir_sb ritian_sb'
print(re.findall("(.+?)_sb",s))
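One detail of this answer is worth checking: because . also matches a space, (.+?)_sb captures the space in front of every name after the first. The sketch below shows that, plus one possible tightening that swaps . for \w. This is an illustrative variant, not necessarily the answer the exercise intends.

```python
import re

s = 'alex_sb ale123_sb wu12sir_sb wusir_sb ritian_sb'

# Original pattern: later names come back with a leading space.
with_spaces = re.findall(r"(.+?)_sb", s)

# \w does not match spaces, so the names come back clean.
names = re.findall(r"(\w+?)_sb", s)

# Without a group, findall returns the whole match, suffix included.
whole = re.findall(r"\w+_sb", s)

print(with_spaces)  # ['alex', ' ale123', ' wu12sir', ' wusir', ' ritian']
print(names)        # ['alex', 'ale123', 'wu12sir', 'wusir', 'ritian']
print(whole)        # ['alex_sb', 'ale123_sb', 'wu12sir_sb', 'wusir_sb', 'ritian_sb']
```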
(1) findall finds all matches and returns them in a list
(2) search stops at the first match anywhere in the string and returns a match object
(3) match only matches at the beginning of the string
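Test question:
What is the difference between search and match?
search looks for a match anywhere in the string.
match only looks at the beginning of the string; if the beginning does not match, it returns None.
With both, call group() on the result to see the matched text.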
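The difference can be checked directly. A minimal sketch, with a sample string made up for illustration:

```python
import re

text = "hello alex"  # hypothetical sample string

found = re.search(r"alex", text)        # search scans the whole string
at_start = re.match(r"alex", text)      # match fails: the string starts with "hello"
at_start_ok = re.match(r"hello", text)  # match succeeds at position 0

print(found.group(), found.span())  # alex (6, 10)
print(at_start)                     # None
print(at_start_ok.group())          # hello
```

Because both functions return None when nothing matches, calling group() on the result without checking it first raises AttributeError.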
(4) split -- splits a string; to split on several different delimiters, list them in a character group []
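As a sketch of splitting on several delimiters at once, the following uses the sample string from the sub example. Wrapping the character group in parentheses is an optional extra that keeps the delimiters in the result.

```python
import re

s = "alex:dsb#wusir.djb"  # sample string borrowed from the sub example

# Split on any one of : # or . (inside [], the dot is a literal dot).
parts = re.split(r"[:#.]", s)

# With a capturing group, the delimiters are kept in the returned list.
parts_with_delims = re.split(r"([:#.])", s)

print(parts)              # ['alex', 'dsb', 'wusir', 'djb']
print(parts_with_delims)  # ['alex', ':', 'dsb', '#', 'wusir', '.', 'djb']
```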
(5) sub -- replaces matches
s = "alex:dsb#wusir.djb"
print(re.sub("d","e",s,count=1))  # Replace only the first d with e
(6) compile -- defines a reusable matching rule
s = re.compile(r"\w")  # Compile the pattern once
print(s.findall("alex:dsb#wusir.djb"))
(7) finditer -- returns an iterator of match objects
s = re.finditer(r"\w","alex:dsb#wusir.djb")  # What is returned is an iterator
print(next(s).group())
print(next(s).group())
for i in s:  # Continues from the third match
    print(i.group())
(8) search -- naming groups with ?P<name>
ret = re.search(r"<(?P<tag_name>\w+)>\w+</\w+>","<h1>hello</h1>")  # One named group
ret = re.search(r"<(?P<tag_name>\w+)>(?P<content>\w+)</\w+>","<h1>hello</h1>")  # Two named groups
print(ret.group("tag_name"))
print(ret.group("content"))