When you need to match strings with regular expressions in Python, you can use the re module that ships with Python.
The approximate matching process for regular expressions is:
1. Compare the characters in the expression with those in the text in turn.
2. If every character matches, the match succeeds; as soon as one character fails to match, the match fails.
3. If the expression contains quantifiers or boundaries, the process is slightly different.
The r prefix: backslashes need no special handling in a string literal prefixed with 'r'. So r'\n' denotes a string containing the two characters '\' and 'n', while '\n' denotes a string containing a single line break.
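As a quick check of the difference described above, the following minimal sketch compares the two literals. The variable names raw and plain are only illustrative.

```python
# Compare a raw string literal with an ordinary one.
raw = r"\n"    # two characters: a backslash and the letter n
plain = "\n"   # one character: a line break

print(len(raw))      # 2
print(len(plain))    # 1
print(raw == "\\n")  # True: the raw form equals an escaped backslash plus n
```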
Use of the re module: import re
re.match function
Syntax: re.match(pattern, string, flags=0)
parameter  description
pattern    The regular expression to match
string     The string to match against
flags      Flag bits that control how the regular expression is matched, such as case-insensitive matching, multi-line matching, and so on.
- re.I Ignore case
- re.L Make the special character classes \w, \W, \b, \B, \s, \S depend on the current locale
- re.M Multi-line mode
- re.S Make . match any character, including line breaks (by default . does not match line breaks)
- re.U Make the special character classes \w, \W, \b, \B, \d, \D, \s, \S depend on the Unicode character property database
- re.X Verbose mode: for readability, ignore whitespace and comments after # in the pattern
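The flags listed above can be tried out with a short sketch like the one below. The sample strings and variable names are made up for illustration; only the flags themselves come from the re module.

```python
import re

# re.I: ignore case
m_case = re.match(r"hello", "HELLO world", re.I)

# re.S: let "." match a line break as well
m_dot_default = re.match(r"a.b", "a\nb")
m_dot_all = re.match(r"a.b", "a\nb", re.S)

# re.M: let "^" match at the start of every line
lines = re.findall(r"^\w+", "one two\nthree four", re.M)

# re.X: allow whitespace and "#" comments inside the pattern
m_verbose = re.match(r"""
    \d{3}   # area code
    -       # separator
    \d{8}   # number
""", "010-12345678", re.X)

print(m_case.group())     # HELLO
print(m_dot_default)      # None
print(m_dot_all is not None)  # True
print(lines)              # ['one', 'three']
print(m_verbose.group())  # 010-12345678
```

Several flags can be combined with the | operator, for example re.I | re.M.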
re.match attempts to match the pattern from the start of the string. If the match is not successful, match() returns None; if the match succeeds, re.match returns a match object.
If the previous step matched, you can use the group method to extract the data: call group(num) or groups() on the match object to get the matched text.
Parentheses () in the pattern are used for grouping. group() and group(0) return the overall result of the match, group(1) returns the part matched by the first pair of parentheses, group(2) the part matched by the second pair, and group(3) the part matched by the third pair. If the match does not succeed, re.match() returns None, so there is no match object to call group() on.
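To make the group numbering concrete, here is a small sketch. The pattern and the sample address are chosen only for illustration.

```python
import re

# Three pairs of parentheses, so three numbered groups
m = re.match(r"(\w+)@(\w+)\.(\w+)", "test@163.com")

print(m.group())    # test@163.com  (same as m.group(0))
print(m.group(1))   # test
print(m.group(2))   # 163
print(m.group(3))   # com
print(m.groups())   # ('test', '163', 'com')

# When nothing matches, the result is None rather than a match object,
# so check it before calling group()
no_match = re.match(r"(\w+)@", "no at sign")
print(no_match)     # None
```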
Give an example:
>>> import re
>>> result = re.match("itcast","itcast.cn")
>>> result.group()
'itcast'
Matching starts at the beginning of the string and the whole pattern matches, so matching ends there. The trailing .cn is not matched, and a match object describing the successful match is returned.
Match a single character
character  function                                          position
.          Match any 1 character except \n
[ ]        Match the characters listed in [ ]
\d         Match digits, that is 0-9                         Can be written in character set [...]
\D         Match non-digits                                  Can be written in character set [...]
\s         Match whitespace, such as space and tab           Can be written in character set [...]
\S         Match non-whitespace characters                   Can be written in character set [...]
\w         Match word characters, that is a-z, A-Z, 0-9, _   Can be written in character set [...]
\W         Match non-word characters                         Can be written in character set [...]
[...] Character set: the corresponding position can be any character in the set. Characters in a set can be listed individually or given as a range, such as [abc] or [a-c]. If the first character is ^, the set is negated, so [^abc] means any character other than a, b, or c. All special characters lose their original special meaning inside a character set. To use ], - or ^ literally in a set, escape it with a backslash, or put ] or - as the first character and ^ as a non-first character.
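The character set rules above can be checked with a brief sketch. The sample strings and variable names are illustrative only.

```python
import re

# A list and a range describe the same set here
listed = re.match(r"[abc]", "b").group()
ranged = re.match(r"[a-c]", "b").group()

# A leading ^ negates the set
negated_hit = re.match(r"[^abc]", "d").group()
negated_miss = re.match(r"[^abc]", "a")

# ], - and ^ used literally: the first two are escaped, ^ is not first
literals = re.findall(r"[\]\-^]", "a]b-c^d")

# A metacharacter such as . is just a literal dot inside a set
dots = re.findall(r"[.]", "a.b")

print(listed, ranged)   # b b
print(negated_hit)      # d
print(negated_miss)     # None
print(literals)         # [']', '-', '^']
print(dots)             # ['.']
```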
Give an example:
import re

ret = re.match(".", "M")
print(ret.group())

ret = re.match("t.o", "too")
print(ret.group())

ret = re.match("t.o", "two")
print(ret.group())

# If the first letter of hello is lowercase, the regular expression needs a lowercase h
ret = re.match("h", "hello Python")
print(ret.group())

# If the first letter of Hello is uppercase, the regular expression needs an uppercase H
ret = re.match("H", "Hello Python")
print(ret.group())

# With a character set, either uppercase or lowercase h works
ret = re.match("[hH]", "hello Python")
print(ret.group())
ret = re.match("[hH]", "Hello Python")
print(ret.group())
ret = re.match("[hH]ello Python", "Hello Python")
print(ret.group())

# Several ways of writing a match for 0 to 9
ret = re.match("[0123456789]Hello Python", "7Hello Python")
print(ret.group())
ret = re.match("[0-9]Hello Python", "7Hello Python")
print(ret.group())

# Match 0 to 3 and 5 to 9
ret = re.match("[0-35-9]Hello Python", "7Hello Python")
print(ret.group())

# 4 is not in the set, so ret is None and group() cannot be called
ret = re.match("[0-35-9]Hello Python", "4Hello Python")
# print(ret.group())

# \d matches a single digit
ret = re.match(r"Chang'e \d", "Chang'e 1 successfully launched")
print(ret.group())
ret = re.match(r"Chang'e \d", "Chang'e 2 successfully launched")
print(ret.group())
Result:
M
too
two
h
H
h
H
Hello Python
7Hello Python
7Hello Python
7Hello Python
Chang'e 1
Chang'e 2
Match multiple characters
character  function                                                       position                         expression example  fully matched strings
*          Match the previous character 0 or more times (it may be absent)  Used after a character or (...)  abc*                abccc
+          Match the previous character 1 or more times (at least once)     Used after a character or (...)  abc+                abccc
?          Match the previous character 1 or 0 times (it is optional)       Used after a character or (...)  abc?                ab, abc
{m}        Match the previous character exactly m times                     Used after a character or (...)  ab{2}c              abbc
{m,n}      Match the previous character m to n times                        Used after a character or (...)  ab{1,2}c            abc, abbc
For {m,n}: if m is omitted, the previous character is matched 0 to n times; if n is omitted, it is matched m to unlimited times.
Give an example:
import re

# Match a string whose first character is an uppercase letter,
# followed by lowercase letters, which are optional
ret = re.match("[A-Z][a-z]*", "M")
print(ret.group())
ret = re.match("[A-Z][a-z]*", "MnnM")
print(ret.group())
ret = re.match("[A-Z][a-z]*", "Aabcdef")
print(ret.group())

# Check whether variable names are valid
names = ["name1", "_name", "2_name", "__name__"]
for name in names:
    ret = re.match(r"[a-zA-Z_]+[\w]*", name)
    if ret:
        print("Variable name %s meets requirements" % ret.group())
    else:
        print("Variable name %s is invalid" % name)

# Match numbers between 0 and 99
ret = re.match("[1-9]?[0-9]", "7")
print(ret.group())
ret = re.match(r"[1-9]?\d", "33")
print(ret.group())
# This result is not what we want; it can only be fixed by using $
ret = re.match(r"[1-9]?\d", "09")
print(ret.group())

# Match exactly 6 letters, digits, or underscores
ret = re.match("[a-zA-Z0-9_]{6}", "12a3g45678")
print(ret.group())

# Match a password of 8 to 20 characters made of uppercase or lowercase letters, digits, and underscores
ret = re.match("[a-zA-Z0-9_]{8,20}", "1ad12f23s34455ff66")
print(ret.group())
Result:
M
Mnn
Aabcdef
Variable name name1 meets requirements
Variable name _name meets requirements
Variable name 2_name is invalid
Variable name __name__ meets requirements
7
33
0
12a3g4
1ad12f23s34455ff66
Match start and end
character  function
^          Match the start of the string
$          Match the end of the string
Example: match 163.com email addresses
import re

email_list = ["xiaoWang@163.com", "xiaoWang@163.comheihei", ".com.xiaowang@qq.com"]
for email in email_list:
    # \. matches a literal dot, and $ requires the address to end right after "com"
    ret = re.match(r"[\w]{4,20}@163\.com$", email)
    if ret:
        print("%s is the required email address, the result of the match is: %s" % (email, ret.group()))
    else:
        print("%s does not meet requirements" % email)
Result:
xiaoWang@163.com is the required email address, the result of the match is: xiaoWang@163.com
xiaoWang@163.comheihei does not meet requirements
.com.xiaowang@qq.com does not meet requirements
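One detail worth checking in patterns like this is the dot before com: an unescaped . matches any character, while \. matches only a literal dot. The sketch below illustrates the difference; the names loose and strict and the sample address are made up for the comparison.

```python
import re

loose = r"\w{4,20}@163.com$"    # "." matches any single character
strict = r"\w{4,20}@163\.com$"  # "\." matches only a literal dot

address = "xiaoWang@163xcom"    # not a real 163.com address

loose_result = re.match(loose, address)
strict_result = re.match(strict, address)

print(loose_result is not None)  # True: the unescaped dot accepted "x"
print(strict_result)             # None
```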
Match Grouping
character    function
|            Match either the left or the right expression
(ab)         Treat the characters in parentheses as a group
\num         Refer to the string matched by group number num
(?P<name>)   Give the group an alias, so the matched substring can be obtained by that name
(?P=name)    Refer to the string matched by the group with the alias name
Example: |
# Match numbers between 0 and 100
import re

ret = re.match(r"[1-9]?\d$|100", "8")
print(ret.group())  # 8

ret = re.match(r"[1-9]?\d$|100", "78")
print(ret.group())  # 78

ret = re.match(r"[1-9]?\d$|100", "08")
# print(ret.group())  # ret is None: 08 is not accepted as a number between 0 and 100

ret = re.match(r"[1-9]?\d$|100", "100")
print(ret.group())  # 100
Example: ()
import re

# Requirement: match 163, 126, and qq email addresses
ret = re.match(r"\w{4,20}@163\.com", "test@163.com")
print(ret.group())  # test@163.com

ret = re.match(r"\w{4,20}@(163|126|qq)\.com", "test@126.com")
print(ret.group())  # test@126.com

ret = re.match(r"\w{4,20}@(163|126|qq)\.com", "test@qq.com")
print(ret.group())  # test@qq.com

ret = re.match(r"\w{4,20}@(163|126|qq)\.com", "test@gmail.com")
if ret:
    print(ret.group())
else:
    print("Not a 163, 126, or qq email address")  # Not a 163, 126, or qq email address

# Mobile numbers (11 digits) that do not end in 4 or 7
tels = ["13100001234", "18912344321", "10086", "18800007777"]
for tel in tels:
    ret = re.match(r"1\d{9}[0-35-68-9]", tel)
    if ret:
        print(ret.group())
    else:
        print("%s is not a wanted mobile number" % tel)

# Extract the area code and the phone number
ret = re.match(r"([^-]*)-(\d+)", "010-12345678")
print(ret.group())
print(ret.group(1))
print(ret.group(2))
Example: \number
Matches the contents of the group with the same number. Groups are numbered starting from 1. For example, (.+) \1 matches 'the the' or '55 55', but does not match 'thethe' (note the space after the group). This special sequence can only be used to match one of the first 99 groups. If the first digit of number is 0, or if number is three octal digits long, it is not interpreted as a group reference but as the character with that octal value. Inside the '[' and ']' of a character set, all numeric escapes are treated as characters.
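The (.+) \1 case described above can be reproduced directly. This is a minimal sketch; the variable names are illustrative.

```python
import re

# \1 must repeat exactly what group 1 matched, after a single space
pattern = r"(.+) \1"

words = re.match(pattern, "the the")
digits = re.match(pattern, "55 55")
joined = re.match(pattern, "thethe")

print(words.group(), "|", words.group(1))    # the the | the
print(digits.group(), "|", digits.group(1))  # 55 55 | 55
print(joined)                                # None: there is no space to match
```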
Example 1: match <html>hh</html>
\1, ..., \9 match the content of the nth group. For example, \1 refers to the content matched by the first group.
import re

# The right way to think about it: whatever appears in the first <> should
# also appear in the closing </>. Referring back to the data matched by the
# group achieves this, but note that the pattern must be a raw string, in the form r''.
ret = re.match(r"<([a-zA-Z]*)>\w*</\1>", "<html>hh</html>")

# When the data in the two pairs of <> is not the same, there is no match
test_label = ["<html>hh</html>", "<html>hh</htmlbalabala>"]
for label in test_label:
    ret = re.match(r"<([a-zA-Z]*)>\w*</\1>", label)
    if ret:
        print("%s This is the right label pair" % ret.group())
    else:
        print("%s This is the wrong label pair" % label)
Result:
<html>hh</html> This is the right label pair
<html>hh</htmlbalabala> This is the wrong label pair
Example 2: match <html><h1>www.itcast.cn</h1></html>
import re

labels = ["<html><h1>www.itcast.cn</h1></html>", "<html><h1>www.itcast.cn</h2></html>"]
for label in labels:
    # \2 refers to the inner tag name and \1 to the outer tag name
    ret = re.match(r"<(\w*)><(\w*)>.*</\2></\1>", label)
    if ret:
        print("%s Is a label that meets the requirements" % ret.group())
    else:
        print("%s Not meeting requirements" % label)
Result:
<html><h1>www.itcast.cn</h1></html> Is a label that meets the requirements
<html><h1>www.itcast.cn</h2></html> Not meeting requirements
Example: (?P<name>) (?P=name)
(?P<name>) gives a group a name, and (?P=name) refers back to that named group within the same regular expression.
import re

ret = re.match(r"<(?P<name1>\w*)><(?P<name2>\w*)>.*</(?P=name2)></(?P=name1)>", "<html><h1>www.itcast.cn</h1></html>")
ret.group()

# The closing tag h2 differs from the opening tag h1, so ret is None here
ret = re.match(r"<(?P<name1>\w*)><(?P<name2>\w*)>.*</(?P=name2)></(?P=name1)>", "<html><h1>www.itcast.cn</h2></html>")
# ret.group()
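Named groups can also be read back by name from the match object. The sketch below reuses the same pattern; the variable names ok and bad are illustrative, and group(name) and groupdict() are standard match object methods.

```python
import re

pattern = r"<(?P<name1>\w*)><(?P<name2>\w*)>.*</(?P=name2)></(?P=name1)>"

ok = re.match(pattern, "<html><h1>www.itcast.cn</h1></html>")
bad = re.match(pattern, "<html><h1>www.itcast.cn</h2></html>")

print(ok.group("name1"))  # html
print(ok.group("name2"))  # h1
print(ok.groupdict())     # {'name1': 'html', 'name2': 'h1'}
print(bad)                # None: </h2> does not repeat the name h1
```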
re.compile function
The compile function compiles a regular expression and generates a Pattern object for use by the match() and search() functions.
prog = re.compile(pattern)
result = prog.match(string)
Equivalent to
result = re.match(pattern, string)
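A small sketch of this equivalence, and of reusing one compiled object for several strings, might look like the following. The pattern and sample strings are assumptions made for the example.

```python
import re

digits = re.compile(r"\d+")

# The compiled object's match() and the module-level re.match() agree
via_object = digits.match("123abc")
via_module = re.match(r"\d+", "123abc")
print(via_object.group(), via_module.group())  # 123 123

# A compiled pattern is convenient when the same expression is applied repeatedly
starts_with_digit = [bool(digits.match(s)) for s in ["1a", "b2", "33"]]
print(starts_with_digit)  # [True, False, True]
```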
Give an example:
>>> import re
>>> pattern = re.compile(r'\d+')
>>> m = pattern.match('one12twothree34four', 3, 10)  # Start matching at the position of '1'; it matches
>>> print(m)  # Returns a Match object
<re.Match object; span=(3, 5), match='12'>
>>> m.group(0)  # The 0 can be omitted
'12'
>>> m.start(0)  # The 0 can be omitted
3
>>> m.end(0)  # The 0 can be omitted
5
>>> m.span(0)  # The 0 can be omitted
(3, 5)
Above, when the match succeeds, a Match object is returned, where:
- The group([group1,...]) method is used to obtain the strings matched by one or more groups; to obtain the entire matched substring, use group() or group(0) directly;
- The start([group]) method is used to get the start position of the group's matched substring within the whole string (the index of the first character of the substring); the parameter defaults to 0;
- The end([group]) method is used to get the end position of the group's matched substring within the whole string (the index of the last character of the substring + 1); the parameter defaults to 0;
- The span([group]) method returns (start(group), end(group)).
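The methods listed above also accept a group number. Here is a minimal sketch with two groups; the sample string is made up for illustration.

```python
import re

pattern = re.compile(r"(\d+)-(\d+)")
m = pattern.search("tel: 010-12345678")

print(m.group())      # 010-12345678
print(m.group(1, 2))  # ('010', '12345678')
print(m.span())       # (5, 17)  whole match
print(m.span(1))      # (5, 8)   first group
print(m.start(2))     # 9        second group starts after the hyphen
print(m.end(2))       # 17
```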
re.search function
re.search scans the entire string and returns the first successful match, or None if no match exists.
The difference between re.match and re.search: re.match only matches at the beginning of the string. If the beginning of the string does not match the regular expression, the match fails and the function returns None. re.search, by contrast, scans through the entire string until a match is found.
Give an example:
import re

ret = re.search(r"\d+", "9999 reads")
print(ret.group())
Result:
9999
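The difference between re.match and re.search shows up as soon as the digits are not at the start of the string. A minimal sketch, with a sample string chosen for illustration:

```python
import re

text = "reads: 9999"

at_start = re.match(r"\d+", text)     # only tries position 0
anywhere = re.search(r"\d+", text)    # scans forward until something matches
anchored = re.search(r"^\d+", text)   # ^ pins search to the start as well

print(at_start)          # None
print(anywhere.group())  # 9999
print(anchored)          # None
```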
re.findall function
Finds all the substrings matched by the regular expression in the string and returns them as a list, or an empty list if no match is found. Note: match and search match once, while findall finds all matches.
Give an example:
import re

ret = re.findall(r"\d+", "python = 9999, c = 7890, c++ = 12345")
print(ret)
Result:
['9999', '7890', '12345']
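One behavior of findall that is easy to overlook: when the pattern contains groups, the list holds the group contents instead of the whole matches. The sketch below shows this on the same sample text; the variable names are illustrative.

```python
import re

text = "python = 9999, c = 7890, c++ = 12345"

no_group = re.findall(r"\d+", text)             # whole matches
one_group = re.findall(r"= (\d+)", text)        # contents of the single group
two_groups = re.findall(r"(\S+) = (\d+)", text) # one tuple per match
nothing = re.findall(r"\d+", "no digits here")  # empty list when nothing matches

print(no_group)    # ['9999', '7890', '12345']
print(one_group)   # ['9999', '7890', '12345']
print(two_groups)  # [('python', '9999'), ('c', '7890'), ('c++', '12345')]
print(nothing)     # []
```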
re.finditer function
Similar to findall, all substrings matching the regular expression are found in the string and returned as an iterator.
import re

it = re.finditer(r"\d+", "12a32bc43jf3")
for match in it:
    print(match.group())
Result:
12
32
43
3
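Because finditer yields match objects rather than strings, the position of each match is available too. A short sketch on the same input, with an illustrative variable name:

```python
import re

# Collect each matched substring together with its (start, end) span
found = [(m.group(), m.span()) for m in re.finditer(r"\d+", "12a32bc43jf3")]

print(found)
# [('12', (0, 2)), ('32', (3, 5)), ('43', (7, 9)), ('3', (11, 12))]
```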
re.sub function
sub is short for substitute; it replaces the matched data.
Syntax: re.sub(pattern, repl, string, count=0, flags=0)
parameter  description
pattern    Required. The pattern string of the regular expression
repl       Required. The replacement: either a replacement string or a function
string     Required. The string in which replacements are made
count      Optional. The maximum number of replacements; must be a non-negative integer. If this parameter is omitted or set to 0, all matches are replaced
flags      Optional. Flag bits that control how the regular expression is matched, such as case-insensitive matching, multi-line matching, and so on
Example: add 1 to the matched read count
Method 1:
import re

ret = re.sub(r"\d+", '998', "python = 997")
print(ret)
Result: python = 998
Method 2:
import re

def add(temp):
    # int() argument must be a string, a bytes-like object or a number, not 're.Match',
    # so take the matched text with group() first
    strNum = temp.group()
    num = int(strNum) + 1
    return str(num)

ret = re.sub(r"\d+", add, "python = 997")
print(ret)

ret = re.sub(r"\d+", add, "python = 99")
print(ret)
Result:
python = 998
python = 100
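The count parameter and group references inside a replacement string can be sketched as follows. The sample text is an assumption made for the example.

```python
import re

text = "python = 997, c = 100"

replace_all = re.sub(r"\d+", "0", text)             # count omitted: replace every match
replace_one = re.sub(r"\d+", "0", text, count=1)    # replace only the first match
swapped = re.sub(r"(\w+) = (\d+)", r"\2 = \1", text)  # \1 and \2 refer to the groups

print(replace_all)  # python = 0, c = 0
print(replace_one)  # python = 0, c = 100
print(swapped)      # 997 = python, 100 = c
```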
re.subn function
The behavior is the same as sub(), but returns a tuple (string, number of substitutions).
Syntax: re.subn(pattern, repl, string, count=0, flags=0)
Returns: (the string that sub() would return, number of substitutions)
import re

pattern = re.compile(r'(\w+) (\w+)')
s = 'i say, hello world!'

# Swap the two words of each matched pair
print(re.subn(pattern, r'\2 \1', s))

def func(m):
    # Capitalize both words of the matched pair
    return m.group(1).title() + ' ' + m.group(2).title()

print(re.subn(pattern, func, s))

### output ###
# ('say i, world hello!', 2)
# ('I Say, Hello World!', 2)
re.split function
Splits the string wherever the pattern matches and returns a list.
Syntax: re.split(pattern, string, maxsplit=0, flags=0)
parameter  description
pattern    The regular expression to match
string     The string to split
maxsplit   The maximum number of splits. maxsplit=1 splits once; the default is 0, which means no limit
Give an example:
import re

ret = re.split(r":| ", "info:xiaoZhang 33 shandong")
print(ret)
Result: ['info', 'xiaoZhang', '33', 'shandong']
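The maxsplit parameter, and the effect of putting the separator in a group, can be seen in this short sketch on the same input. The variable names are illustrative.

```python
import re

text = "info:xiaoZhang 33 shandong"

unlimited = re.split(r":| ", text)           # split at every separator
once = re.split(r":| ", text, maxsplit=1)    # split only at the first separator
keep_seps = re.split(r"(:| )", text)         # a group keeps the separators in the list

print(unlimited)  # ['info', 'xiaoZhang', '33', 'shandong']
print(once)       # ['info', 'xiaoZhang 33 shandong']
print(keep_seps)  # ['info', ':', 'xiaoZhang', ' ', '33', ' ', 'shandong']
```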
Python greedy and non-greedy matching
Quantifiers in Python are greedy by default (in a few languages they may be non-greedy by default): they always try to match as many characters as possible. Non-greedy quantifiers, on the other hand, always try to match as few characters as possible.
For example, the regular expression "ab*" will find "abbb" if it is used on "abbbc". If you use the non-greedy quantifier "ab*?", it will find "a".
Note: we usually use non-greedy mode when extracting data.
Adding ? after *, +, ? or {m,n} turns a greedy quantifier into a non-greedy one.
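The "ab*" example above, together with a few other quantifiers, can be checked with a sketch like this. The variable names are illustrative.

```python
import re

text = "abbbc"

greedy_star = re.match(r"ab*", text).group()      # as many b as possible
lazy_star = re.match(r"ab*?", text).group()       # as few b as possible (zero)
lazy_plus = re.match(r"ab+?", text).group()       # at least one b is still required
lazy_range = re.match(r"ab{1,3}?", text).group()  # the lower bound of {1,3} wins
lazy_forced = re.match(r"ab*?c", text).group()    # must still reach the c

print(greedy_star)  # abbb
print(lazy_star)    # a
print(lazy_plus)    # ab
print(lazy_range)   # ab
print(lazy_forced)  # abbbc
```

The last line shows that a non-greedy quantifier still expands as far as needed for the rest of the pattern to match.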
Example 1:
import re

s = "This is a number 234-235-22-423"

# When a pattern uses wildcards, evaluation from left to right tries to
# "grab" the longest string that matches. In our example, ".+" grabs as many
# characters as it can from the start of the string, including most of the
# integer field we want. "\d+" needs only one digit to match, so it matches
# the digit "4", while ".+" matches everything from the start of the string
# up to that 4.
r = re.match(r".+(\d+-\d+-\d+-\d+)", s)
print(r.group(1))

# The fix is the non-greedy operator "?". It can be used after "*", "+" and
# "?", and asks the regular expression to match as few characters as possible.
r = re.match(r".+?(\d+-\d+-\d+-\d+)", s)
print(r.group(1))
Result:
4-235-22-423
234-235-22-423
Example 2:
>>> re.match(r"aa(\d+)","aa2343ddd").group(1)
'2343'
>>> re.match(r"aa(\d+?)","aa2343ddd").group(1)
'2'
>>> re.match(r"aa(\d+)ddd","aa2343ddd").group(1)
'2343'
>>> re.match(r"aa(\d+?)ddd","aa2343ddd").group(1)
'2343'
Example 3: Extracting picture addresses
import re

test_str = "<img data-original=https://rpic.douyucdn.cn/appCovers/2016/11/13/1213973.jpg>"
# .*? stops at the first ".jpg"; \. matches a literal dot
ret = re.search(r"https://.*?\.jpg", test_str)
print(ret.group())
Result: https://rpic.douyucdn.cn/appCovers/2016/11/13/1213973.jpg
What the r prefix does
Like most programming languages, regular expressions use "\" as the escape character, which can cause a pile-up of backslashes. If you need to match the character "\" in the text, you will need four backslashes "\\\\" in a regular expression written as an ordinary string: the first two and the last two are each turned into one backslash by the programming language, giving two backslashes, which the regular expression then turns into one literal backslash. Raw strings in Python solve this problem well: a string literal preceded by r is a raw string.
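import re

mm = "c:\\a\\b\\c"
print(mm)  # c:\a\b\c

ret = re.match("c:\\\\", mm).group()
print(ret)  # c:\

ret = re.match("c:\\\\a", mm).group()
print(ret)  # c:\a

ret = re.match(r"c:\\a", mm).group()
print(ret)  # c:\a

# In a regular expression \a is an escape sequence, not a backslash followed by a,
# so nothing matches and calling group() on None raises an error
ret = re.match(r"c:\a", mm).group()
print(ret)  # AttributeError: 'NoneType' object has no attribute 'group'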
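To confirm that the ordinary and raw spellings describe the same regular expression, the two pattern strings can be compared directly. This is a minimal sketch; the variable names are illustrative.

```python
import re

path = "c:\\a\\b\\c"          # the text c:\a\b\c

plain_pattern = "c:\\\\a"     # ordinary string: four backslashes in source
raw_pattern = r"c:\\a"        # raw string: two backslashes in source

print(plain_pattern == raw_pattern)  # True: both hold the 5 characters c:\\a
print(len(raw_pattern))              # 5

matched = re.match(raw_pattern, path).group()
print(matched == "c:\\a")            # True: the matched text is c:\a
```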