Getting Started with Python Built-in Modules: the re Module
1. re module
(1) What is a regular expression?
A regular expression describes a character or a string by combining symbols that have special meanings. Put another way, a regular expression is a rule that describes a class of strings. Regular expression support ships with Python and is provided by the re module. A pattern is compiled into a series of bytecodes, which are then executed by a matching engine written in C.
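As a small illustration of a rule that describes a class of strings, the sketch below checks a few strings against one pattern. The pattern and the sample strings are made up for this example.

```python
import re

# Hypothetical rule: three digits, a hyphen, then four digits.
rule = r"\d{3}-\d{4}"

samples = ["555-1234", "5551234", "abc-defg"]

# fullmatch succeeds only when the whole string fits the rule.
results = {text: re.fullmatch(rule, text) is not None for text in samples}
print(results)
```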
Metacharacter | Match Content |
---|---|
\w | Match a letter (including Chinese), digit, or underscore |
\W | Match any character that is not a letter (including Chinese), digit, or underscore |
\s | Match any whitespace |
\S | Match any non-whitespace character |
\d | Match a digit |
\D | Match a non-digit |
\A | Match only at the beginning of the string |
\Z | Match only at the very end of the string |
\n | Match a line break |
\t | Match a tab |
^ | Match the beginning of the string |
$ | Match the end of the string, or just before a newline at the end of the string |
. | Match any character except a line break. When the re.DOTALL flag is specified, line breaks are matched as well. |
[...] | Match characters in character groups |
[^...] | Match any character not in the character group |
* | Match 0 or more of the preceding character, greedy |
+ | Match 1 or more of the preceding character, greedy |
? | Match 0 or 1 of the preceding character, greedy; placed after another quantifier (as in +?), it makes that quantifier non-greedy |
{n} | Match exactly n repetitions of the preceding expression |
{n,m} | Match n to m repetitions of the preceding expression, greedy |
a\|b | Match a or b |
() | Match the expression within the parentheses, which also forms a group |
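In Python's re module, `$` and `\Z` differ when the string ends with a newline. The following sketch shows this on a made-up sample string.

```python
import re

text = "alex\n"  # sample string with a trailing newline

# $ also matches just before a newline at the end of the string.
dollar_hits = re.findall(r"x$", text)

# \Z matches only at the very end, which here comes after the newline.
z_hits = re.findall(r"x\Z", text)

print(dollar_hits, z_hits)
```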
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
import re
<1> \w matches letters, digits, underscores, and Chinese characters
print(re.findall(r"\w","Xiao Ming-Marry_dsb123 Xiaotian Eat D breakfast")) # letters, digits, underscores, Chinese characters
<2> \W matches anything that is not a letter, digit, underscore, or Chinese character
print(re.findall(r"\W","Xiao Ming-Marry_dsb123 Xiaotian Eat D breakfast")) # everything that \w does not match
<3> \d matches digits
print(re.findall(r"\d","1010⑩")) # \d matches digits
<4> \D matches non-digits
print(re.findall(r"\D","1010⑩")) # \D matches non-digits
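For str patterns, `\d` is Unicode-aware by default, so it also matches decimal digits from other scripts. A circled number such as ⑩ is not a decimal digit. The sketch below, on an assumed sample string, compares the default behaviour with the re.ASCII flag.

```python
import re

# Sample text (assumed): an ASCII digit, an Arabic-Indic digit three,
# and the circled number ten.
text = "1\u0663\u2469"

# Default for str patterns: any Unicode decimal digit.
unicode_digits = re.findall(r"\d", text)

# With re.ASCII, \d is limited to 0-9.
ascii_digits = re.findall(r"\d", text, re.ASCII)

print(unicode_digits, ascii_digits)
```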
<5> \A matches at the beginning of a string; ^ is more commonly used
print(re.findall(r"\Aa","asfdasdfasdfalex"))
print(re.findall("^a","alex")) # matches an "a" at the beginning
<6> \Z matches at the end of a string; $ is more commonly used
print(re.findall(r"d\Z","asfdasdfasdfalex")) # [] because the string ends with "x", not "d"
print(re.findall("x$","alex")) # matches an "x" at the end
<7> \n Matches line breaks
print(re.findall("\n","alex\nwusir"))
<8> \t Match Tab
print(re.findall("\t","alex\twusir"))
<9> A literal string matches itself
print(re.findall("alex","alex\twusiralex"))
<10> [...] matches the characters in the character group
print(re.findall('[0-9]',"Xiao Ming-Marry_dsb123 Xiaotian Eat D breakfast"))
print(re.findall('[a-z]',"Xiao Ming-Marry_dsb123 Xiaotian Eat D breakfast"))
print(re.findall('[A-Z]',"Xiao Ming-Marry_dsb123 Xiaotian Eat D breakfast"))
<11> [^...] matches characters not in the character group
print(re.findall("[^0-9a-z]","123alex456")) # [] because every character is a digit or a lowercase letter
<12> * matches 0 or more of the preceding character, greedy
print(re.findall("a*","marry,aa,aaaa,bbbbaaa,aaabbbaaa")) # 0 or more "a", greedy
<13> + matches 1 or more of the preceding character, greedy
print(re.findall("a+","alex,aa,aaaa,bbbbaaa,aaabbbaaa")) # 1 or more "a", greedy
<14> ? matches 0 or 1 of the preceding character; on its own it is greedy, and it makes a quantifier non-greedy only when it follows one, as in +?
print(re.findall("a?","alex,aa,aaaa,bbbbaaa,aaabbbaaa")) # 0 or 1 "a"
<15> {n} matches exactly n repetitions of the preceding expression
print(re.findall("[0-9]{11}","18612239999,18612239998,136133333323")) # exactly 11 digits in a row
<16> {n,m} matches n to m repetitions of the preceding expression, greedy
print(re.findall("a{3,8}","alex,aaaabbbaaaaabbbbbbaaa,aaaaaaaaabb,ccccddddaaaaaaaa"))
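The difference between greedy and non-greedy quantifiers is easiest to see side by side. A minimal sketch on a made-up sample string:

```python
import re

text = "caaaaat"  # sample string with five "a" characters

# Greedy: take as many "a" characters as possible.
greedy = re.findall(r"a+", text)

# Non-greedy: take as few as possible, so one "a" per match.
lazy = re.findall(r"a+?", text)

# Non-greedy range: take the minimum of the range, two per match.
lazy_range = re.findall(r"a{2,3}?", text)

print(greedy, lazy, lazy_range)
```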
<17> a|b matches a or b
print(re.findall("a|b","alexdsb"))
<18> () matches the expression in parentheses and also forms a group
print(re.findall("<a>(.+)</a>","<a>alex</a> <a>wusir</a>")) # grouping; .+ is greedy
print(re.findall("<a>(.+?)</a>","<a>alex</a> <a>wusir</a>")) # .+? turns off greedy matching
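What findall returns depends on how many capturing groups the pattern has. The sketch below illustrates this on an assumed sample string; the variable names are only for illustration.

```python
import re

text = "<a>alex</a> <b>wusir</b>"  # sample markup for illustration

# No group: findall returns the whole match.
whole = re.findall(r"<\w+>.+?</\w+>", text)

# One group: findall returns only the group's text.
one_group = re.findall(r"<\w+>(.+?)</\w+>", text)

# Two groups: findall returns one tuple per match.
two_groups = re.findall(r"<(\w+)>(.+?)</\w+>", text)

# (?:...) groups without capturing, so the whole match is returned again.
non_capturing = re.findall(r"<(?:\w+)>.+?</(?:\w+)>", text)

print(whole, one_group, two_groups, non_capturing)
```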
<19> . matches any character except a line break; with re.DOTALL it also matches line breaks
print(re.findall("a.c","abc,aec,a\nc,a,c")) # any character except \n
print(re.findall("a.c","abc,aec,a\nc,a,c",re.DOTALL)) # \n is matched too
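<20> \. an escaped dot loses its special meaning and matches a literal period
s = "1-2*(60+(-40.35/5)-(-4*3))" # sample input (assumed; the original value of s is not shown)
print(re.findall(r"-\d+\.\d+|-[0-9]|\d+",s))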
<21> \s matches whitespace
print(re.findall(r"\s","alex\tdsbrimocjb")) # \s matches whitespace
<22> \S matches non-whitespace
print(re.findall(r"\S","alex\tdsbrimocjb")) # \S matches non-whitespace
Test question:
Given the string 'alex_sb ale123_sb wu12sir_sb wusir_sb ritian_sb', find every name that comes before _sb.
Answer:
s = 'alex_sb ale123_sb wu12sir_sb wusir_sb ritian_sb'
print(re.findall("(.+?)_sb",s)) # every result after the first keeps its leading space
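Because `.` also matches a space, the answer above returns names with a leading space after the first one. One possible variant, shown here as a sketch rather than the intended answer, uses `\w` so that spaces cannot be part of the group.

```python
import re

s = 'alex_sb ale123_sb wu12sir_sb wusir_sb ritian_sb'

# . also matches the space that separates the names.
with_dot = re.findall(r"(.+?)_sb", s)

# \w cannot match a space, so each name comes back clean.
with_word = re.findall(r"(\w+?)_sb", s)

print(with_dot)
print(with_word)
```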
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
(1) findall -- finds every match and returns a list
print(re.findall("alex","alexdsb,alex_sb,alexnb,al_ex"))
(2) search -- stops at the first match found anywhere in the string and returns a match object
print(re.search("a.+","lexaaaa,bssssaaaasa,saaasaasa").group())
(3) match -- matches only from the beginning of the string
print(re.match("a.+","alexalexaaa,bssssaaaasa,saaasaasa").group())
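Test question:
What is the difference between search and match?
search looks for a match anywhere in the string.
match only matches at the beginning of the string; if the beginning does not match, it returns None.
Both return a match object whose text is read with group().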
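Since both functions return None when nothing matches, calling group() directly can raise an AttributeError. A minimal sketch of the difference, with a guard before group(), on an assumed sample string:

```python
import re

text = "lexaaaa,bssssaaaasa"  # sample string that does not start with "a"

found = re.search(r"a.+", text)    # scans forward until something matches
at_start = re.match(r"a.+", text)  # only tries the beginning of the string

# Guard against None before calling group().
search_text = found.group() if found else None
match_text = at_start.group() if at_start else None

print(search_text, match_text)
```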
(4) split -- splits a string; list several delimiters inside []
print(re.split("[:;,.!#]","alex:dsb#wusir.djb"))
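Two options of re.split are often useful alongside the basic call: `maxsplit` limits the number of splits, and a capturing group around the delimiter keeps the delimiters in the result. A short sketch using the same sample string:

```python
import re

text = "alex:dsb#wusir.djb"

# Split on any of the three delimiters.
parts = re.split(r"[:#.]", text)

# Stop after the first split.
first_only = re.split(r"[:#.]", text, maxsplit=1)

# A capturing group keeps the delimiters in the returned list.
with_delims = re.split(r"([:#.])", text)

print(parts, first_only, with_delims)
```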
(5) sub -- replacement
s = "alex:dsb#wusir.djb"
print(re.sub("d","e",s,count=1)) # replace only the first "d"
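The replacement passed to sub can also refer to groups or be a function, and subn reports how many replacements were made. The sketch below shows these three forms; the variable names are only for illustration.

```python
import re

s = "alex:dsb#wusir.djb"

# Backreferences \1 and \2 swap the two words around the colon.
swapped = re.sub(r"(\w+):(\w+)", r"\2:\1", s)

# A function receives each match object and returns the replacement text.
capitalized = re.sub(r"\w+", lambda m: m.group().capitalize(), s)

# subn returns the new string together with the number of replacements.
replaced, count = re.subn("d", "e", s)

print(swapped, capitalized, replaced, count)
```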
(6) compile -- define a matching rule for reuse
s = re.compile(r"\w")
print(s.findall("alex:dsb#wusir.djb"))
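A compiled pattern is mainly useful when the same rule is applied many times, and flags can be fixed at compile time. A minimal sketch, with hypothetical names `word` and `name`:

```python
import re

word = re.compile(r"\w+")                # hypothetical name for the compiled rule
name = re.compile(r"alex", re.IGNORECASE)  # flags are set once, at compile time

lines = ["alex:dsb", "wusir.djb"]  # sample input lines

# The same compiled rule is reused for every line.
words_per_line = [word.findall(line) for line in lines]

# Compiled patterns offer the same methods as the module-level functions.
first_word = word.search("##alex").group()
any_case = name.findall("Alex ALEX alex")

print(words_per_line, first_word, any_case)
```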
(7) finditer -- returns an iterator
s = re.finditer(r"\w","alex:dsb#wusir.djb") # what is returned is an iterator of match objects
print(next(s).group())
print(next(s).group())
for i in s:
    print(i.group())
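Unlike findall, finditer yields match objects, so the position of each match is available as well as its text. A short sketch on the same sample string:

```python
import re

text = "alex:dsb#wusir.djb"

# Each match object exposes the matched text and its start and end index.
spans = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]

print(spans)
```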
(8) search -- naming groups with (?P<name>...)
ret = re.search(r"<(?P<tag_name>\w+)>\w+</\w+>","<h1>hello</h1>")
ret = re.search(r"<(?P<tag_name>\w+)>(?P<content>\w+)</\w+>","<h1>hello</h1>")
print(ret.group("tag_name"))
print(ret.group("content"))
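Named groups can also be read all at once with groupdict(), and `(?P=name)` refers back to an earlier named group, for example to require that the closing tag matches the opening one. This is a sketch of one way to use them, not part of the original example.

```python
import re

# (?P=tag_name) must repeat whatever the tag_name group matched.
pattern = re.compile(r"<(?P<tag_name>\w+)>(?P<content>\w+)</(?P=tag_name)>")

ok = pattern.search("<h1>hello</h1>")
bad = pattern.search("<h1>hello</h2>")  # closing tag differs, so no match

groups = ok.groupdict()  # all named groups as a dictionary

print(groups, bad)
```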