Getting Started with Python Built-in Modules -- the re Module

Posted by JeffK on Wed, 18 Sep 2019 05:06:45 +0200

1. re module

(1) What is a regular expression?

A regular expression is a way of describing a character or string by combining symbols that have special meanings. Put another way, a regular expression is a rule that describes a class of strings. In Python, regular expressions are built into the language and implemented by the re module. A regular expression pattern is compiled into a series of bytecodes and then executed by a matching engine written in C.

Metacharacter Matches
\w A letter (including Chinese characters), digit, or underscore
\W Any character that is not a letter (including Chinese characters), digit, or underscore
\s Any whitespace character
\S Any non-whitespace character
\d A digit
\D Any non-digit character
\A The beginning of the string
\Z The end of the string only (unlike $, it does not match before a trailing newline)
\n A line break
\t A tab
^ The beginning of the string
$ The end of the string, or just before a trailing newline
. Any character except a line break; with the re.DOTALL flag it also matches line breaks
[...] Any one character in the character group
[^...] Any one character that is not in the character group
* 0 or more of the preceding character, greedy
+ 1 or more of the preceding character, greedy
? 0 or 1 of the preceding character, greedy (append ? to a quantifier, as in +?, to make it non-greedy)
{n} Exactly n repetitions of the preceding expression
{n,m} From n to m repetitions of the preceding expression, greedy
a|b a or b
() The expression inside the parentheses, which also forms a group

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------

import re

<1> \w Matches letters, digits, underscores, and Chinese characters

print(re.findall(r"\w","Xiao Ming-Marry_dsb123 Xiaotian Eat D breakfast"))   # \w: letters, digits, underscores, Chinese characters

<2> \W Matches anything that is not a letter, digit, underscore, or Chinese character

print(re.findall(r"\W","Xiao Ming-Marry_dsb123 Xiaotian Eat D breakfast"))   # \W: not a letter, digit, underscore, or Chinese character

<3> \d Matches digits

print(re.findall(r"\d","1010⑩"))                       # \d matches digits

<4> \D Matches non-digits

print(re.findall(r"\D","1010⑩"))                       # \D matches non-digits
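Why ⑩ ends up on the \D side can be sketched as follows. For str patterns in Python 3, \d matches Unicode decimal digits, which is wider than 0-9 but does not include symbols such as ⑩. The sample string and variable names are illustrative and not from the original.

```python
import re

# "３" is FULLWIDTH DIGIT THREE, a Unicode decimal digit.
# "⑩" is a numeric symbol, not a decimal digit.
sample = "1３⑩"

unicode_digits = re.findall(r"\d", sample)            # default: Unicode-aware
ascii_digits = re.findall(r"\d", sample, re.ASCII)    # re.ASCII restricts \d to 0-9

print(unicode_digits)
print(ascii_digits)
```

If only 0-9 should count as digits, either pass re.ASCII or write the character group [0-9] explicitly.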

<5> \A Matches from the beginning of the string; ^ is more commonly used, as in ^a

print(re.findall(r"\Aa","asfdasdfasdfalex"))
print(re.findall("^a","alex"))                        # What the string begins with

<6> \Z Matches at the end of the string; $ is more commonly used, as in x$

print(re.findall(r"d\Z","asfdasdfasdfalex"))          # [] because the string ends with x, not d
print(re.findall("x$","alex"))                        # What the string ends with
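The two end anchors are not quite interchangeable. A minimal sketch of the difference, using an illustrative input with a trailing newline (the variable names are made up for this example):

```python
import re

text = "alex\n"  # note the trailing newline

dollar = re.findall(r"x$", text)    # $ also matches just before a trailing newline
strict = re.findall(r"x\Z", text)   # \Z matches only at the very end of the string

# With re.MULTILINE, ^ and $ match at the start and end of every line.
lines = re.findall(r"^\w+$", "alex\nwusir", re.MULTILINE)

print(dollar)
print(strict)
print(lines)
```

When the input may end with a newline (for example, a line read from a file), \Z is the stricter choice.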

<7> \n Matches line breaks

print(re.findall("\n","alex\nwusir"))

<8> \t Match Tab

print(re.findall("\t","alex\twusir"))

<9> A literal string matches the same string

print(re.findall("alex","alex\twusiralex"))

<10> [...] Matches the characters in the character group

print(re.findall('[0-9]',"Xiao Ming-Marry_dsb123 Xiaotian Eat D breakfast"))                 
print(re.findall('[a-z]',"Xiao Ming-Marry_dsb123 Xiaotian Eat D breakfast"))
print(re.findall('[A-Z]',"Xiao Ming-Marry_dsb123 Xiaotian Eat D breakfast"))

<11> [^...] Matches characters that are not in the character group

print(re.findall("[^0-9a-z]","123alex456"))  
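The call above prints an empty list, because every character of "123alex456" is a digit or a lowercase letter. A small sketch with a second, illustrative input (not from the original) shows the negated group actually picking something up:

```python
import re

# Every character here is a digit or a lowercase letter, so nothing is left over.
no_leftovers = re.findall("[^0-9a-z]", "123alex456")

# Illustrative input containing uppercase letters and punctuation.
leftovers = re.findall("[^0-9a-z]", "123-ALEX_456")

print(no_leftovers)
print(leftovers)
```

Note that ^ only negates when it is the first character inside the brackets; outside the brackets it is the start-of-string anchor.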

<12> * Matches 0 or more of the preceding character, greedily

print(re.findall("a*","marry,aa,aaaa,bbbbaaa,aaabbbaaa"))    # * matches 0 or more of the character to its left, greedily
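Because a* is allowed to match zero characters, findall also returns empty strings at positions where there is no "a". A short sketch on a smaller, illustrative input makes this easier to see (the exact number of empty strings can differ between Python versions, so it is not shown here):

```python
import re

text = "baac"  # illustrative input, shorter than the example above

star = re.findall("a*", text)   # includes empty strings where zero "a" characters matched
plus = re.findall("a+", text)   # needs at least one "a", so no empty strings

# Dropping the empty strings leaves the same result as a+.
non_empty = [m for m in star if m]

print(star)
print(plus)
print(non_empty)
```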

<13> + Matches 1 or more of the preceding character, greedily

print(re.findall("a+","alex,aa,aaaa,bbbbaaa,aaabbbaaa"))    # + matches 1 or more of the character to its left, greedily

<14> ? Matches 0 or 1 of the preceding character, greedily

print(re.findall("a?","alex,aa,aaaa,bbbbaaa,aaabbbaaa"))  # ? matches 0 or 1 of the character to its left and prefers 1 when it is available

<15> {n} Matches the preceding expression exactly n times

print(re.findall("[0-9]{11}","18612239999,18612239998,136133333323")) # Specify the number of elements to find 
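One thing to watch in the example above: the third number has 12 digits, and [0-9]{11} happily returns its first 11. A possible way to accept only runs of exactly 11 digits is to add word boundaries with \b, which is not covered in the table above and is shown here only as a sketch:

```python
import re

numbers = "18612239999,18612239998,136133333323"

# {11} alone also returns the first 11 digits of the 12-digit number.
loose = re.findall(r"[0-9]{11}", numbers)

# \b (word boundary) rejects digit runs that are longer than 11.
exact = re.findall(r"\b[0-9]{11}\b", numbers)

print(loose)
print(exact)
```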

<16> {n,m} Matches the preceding expression from n to m times, greedily

print(re.findall("a{3,8}","alex,aaaabbbaaaaabbbbbbaaa,aaaaaaaaabb,ccccddddaaaaaaaa")) 

<17> a|b Matches a or b

print(re.findall("a|b","alexdsb"))

<18> () Matches the expression inside the parentheses, which also forms a group

print(re.findall("<a>(.+)</a>","<a>alex</a> <a>wusir</a>"))     # Grouping; .+ is greedy
print(re.findall("<a>(.+?)</a>","<a>alex</a> <a>wusir</a>"))   # .+? is non-greedy and stops at the first </a>
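The difference between the two calls is easiest to see from their results. The sketch below repeats them with the results stored in variables (the variable names are illustrative):

```python
import re

html = "<a>alex</a> <a>wusir</a>"

# Greedy: .+ runs to the last </a>, so one long group is returned.
greedy = re.findall("<a>(.+)</a>", html)

# Non-greedy: .+? stops at the first </a>, so each tag is returned separately.
lazy = re.findall("<a>(.+?)</a>", html)

print(greedy)
print(lazy)
```

When a pattern contains one group, findall returns the group contents rather than the whole match, which is why the <a> tags themselves do not appear in the lazy result.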

<19> . Matches any character except a line break; with re.DOTALL it also matches line breaks

print(re.findall("a.c","abc,aec,a\nc,a,c"))           # Matches any character except \n
print(re.findall("a.c","abc,aec,a\nc,a,c",re.DOTALL))
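As a quick check of what the flag changes, the same two calls can be compared side by side (variable names are illustrative):

```python
import re

text = "abc,aec,a\nc,a,c"

default = re.findall("a.c", text)             # "a\nc" is skipped
dotall = re.findall("a.c", text, re.DOTALL)   # "a\nc" is included

print(default)
print(dotall)
```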

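<20> \. An escaped dot loses its special meaning and matches only a literal dot

s = "1-2*(60+(-40.35/5)-(-4*3))"   # sample input; the original does not define s
print(re.findall(r"-\d+\.\d+|-[0-9]|\d+",s))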

<21> \s Matches whitespace

print(re.findall(r"\s","alex\tdsbrimocjb"))            # \s matches whitespace

<22> \S Matches non-whitespace

print(re.findall(r"\S","alex\tdsbrimocjb"))            # \S matches non-whitespace

Test question:

Given the string 'alex_sb ale123_sb wu12sir_sb wusir_sb ritian_sb', find every name that is followed by _sb (alex, wusir, and so on).

Answer:

s = 'alex_sb ale123_sb wu12sir_sb wusir_sb ritian_sb'
print(re.findall("(.+?)_sb",s))
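One detail of this answer is that . also matches spaces, so every name after the first keeps its leading space. Two possible ways around that, sketched here as alternatives rather than as the author's intended answer:

```python
import re

s = 'alex_sb ale123_sb wu12sir_sb wusir_sb ritian_sb'

# The answer above: names after the first keep a leading space.
with_spaces = re.findall("(.+?)_sb", s)

# Alternative 1: \w cannot match a space, so the names come out clean.
names = re.findall(r"(\w+?)_sb", s)

# Alternative 2: keep the original pattern and strip each result.
stripped = [name.strip() for name in with_spaces]

print(with_spaces)
print(names)
```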

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

(1) findall -- finds all matches and returns them as a list

print(re.findall("alex","alexdsb,alex_sb,alexnb,al_ex"))

(2) search -- scans the string, stops at the first match found anywhere, and returns a match object

print(re.search("a.+","lexaaaa,bssssaaaasa,saaasaasa").group())

(3) match -- matches only from the beginning of the string

print(re.match("a.+","alexalexaaa,bssssaaaasa,saaasaasa").group())

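Test question

What is the difference between search and match?

search looks for a match starting anywhere in the string.

match only tries at the beginning of the string; if the pattern does not match there, it returns None.

Both return a match object, and the matched text is read with group().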
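The difference can be sketched by running both functions on the same input, here the string from the search example above, which starts with "l" rather than "a":

```python
import re

text = "lexaaaa,bssssaaaasa,saaasaasa"

found = re.search("a.+", text)     # scans forward until the pattern matches
at_start = re.match("a.+", text)   # tries position 0 only, where the text has "l"

print(found.group())
print(at_start)

# Calling .group() on None raises AttributeError, so check the result first.
if at_start is None:
    print("no match at the beginning")
```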

(4) split -- splits on a pattern; use [] to list several separators

print(re.split("[:;,.!#]","alex:dsb#wusir.djb"))
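A few variations on the call above, as a rough sketch: a single separator needs no brackets, a capturing group keeps the separators in the result, and maxsplit limits the number of splits.

```python
import re

text = "alex:dsb#wusir.djb"

parts = re.split("[:;,.!#]", text)                    # split on any listed separator
single = re.split(":", text)                          # one separator, no [] needed
with_seps = re.split("([:;,.!#])", text)              # a group keeps the separators
first_only = re.split("[:;,.!#]", text, maxsplit=1)   # split at most once

print(parts)
print(single)
print(with_seps)
print(first_only)
```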

(5) sub -- replacement

s = "alex:dsb#wusir.djb"
print(re.sub("d","e",s,count=1))
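For comparison, here is a short sketch of sub with and without count, with a backreference in the replacement, and of subn, which also reports how many replacements were made. The swapped example is illustrative and not from the original.

```python
import re

s = "alex:dsb#wusir.djb"

first = re.sub("d", "e", s, count=1)   # replace only the first "d"
every = re.sub("d", "e", s)            # replace every "d"

# \1 and \2 in the replacement refer to the first and second groups.
swapped = re.sub(r"(\w+):(\w+)", r"\2:\1", s)

# subn returns the new string together with the number of replacements.
result, n = re.subn("d", "e", s)

print(first)
print(every)
print(swapped)
print(result, n)
```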

(6) compile -- defines a matching rule that can be reused

s = re.compile(r"\w")
print(s.findall("alex:dsb#wusir.djb"))
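A compiled pattern offers the same operations as the module-level functions, which is convenient when one rule is used several times. A minimal sketch, using \w+ instead of \w so that whole words are matched (the names word and text are illustrative):

```python
import re

word = re.compile(r"\w+")   # compiled once, reused below

text = "alex:dsb#wusir.djb"

all_words = word.findall(text)
first_word = word.search(text).group()
masked = word.sub("*", text)

print(all_words)
print(first_word)
print(masked)
```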

(7) finditer -- returns an iterator

s = re.finditer(r"\w","alex:dsb#wusir.djb")   # What is returned is an iterator
print(next(s).group())
print(next(s).group())
for i in s:
    print(i.group())
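Each item the iterator yields is a match object, so it carries positions as well as text. The sketch below collects both and also shows that the iterator is used up after one pass (it uses \w+ rather than \w to keep the output short):

```python
import re

text = "alex:dsb#wusir.djb"

# span() gives the (start, end) position of each match.
spans = [(m.group(), m.span()) for m in re.finditer(r"\w+", text)]

# An iterator can only be consumed once.
it = re.finditer(r"\w+", text)
first_pass = [m.group() for m in it]
second_pass = [m.group() for m in it]   # nothing left

print(spans)
print(first_pass, second_pass)
```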

(8) search -- naming groups with (?P<name>...)

ret = re.search(r"<(?P<tag_name>\w+)>\w+</\w+>","<h1>hello</h1>")                # only the tag is named
ret = re.search(r"<(?P<tag_name>\w+)>(?P<content>\w+)</\w+>","<h1>hello</h1>")   # the tag and the content are named
print(ret.group("tag_name"))
print(ret.group("content"))
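Named groups can also be read all at once with groupdict(), and a name can be referred to again inside the pattern with (?P=name). The sketch below uses that backreference to require that the closing tag matches the opening one, which the pattern above does not check; this is an illustrative extension, not part of the original.

```python
import re

# (?P=tag_name) must repeat whatever the tag_name group matched.
pattern = r"<(?P<tag_name>\w+)>(?P<content>\w+)</(?P=tag_name)>"

ok = re.search(pattern, "<h1>hello</h1>")
mismatch = re.search(pattern, "<h1>hello</h2>")   # closing tag differs

print(ok.groupdict())
print(ok.group(1), ok.group("tag_name"))   # named groups are still numbered
print(mismatch)
```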

Topics: Python