1. Regular expressions: a 30-minute introductory tutorial
Definition: a regular expression is a pattern that describes the strings satisfying a set of rules. It can be used to find, validate, or filter strings that follow complex rules.
For example, when users set a password, a regular expression can enforce a rule such as "must contain both letters and digits", so that passwords which do not comply are rejected.
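The password rule above can be sketched with two lookaheads, one requiring a letter and one requiring a digit. This is only an illustration: the minimum length of 8 and the names PASSWORD_RULE and is_valid_password are assumptions, not part of the original rule.

```python
import re

# Assumed rule for illustration: at least 8 characters,
# containing at least one letter and at least one digit.
PASSWORD_RULE = re.compile(r'(?=.*[A-Za-z])(?=.*\d).{8,}')


def is_valid_password(password: str) -> bool:
    """Return True if the whole password satisfies PASSWORD_RULE."""
    return PASSWORD_RULE.fullmatch(password) is not None


print(is_valid_password('abc12345'))  # True
print(is_valid_password('abcdefgh'))  # False: no digit
print(is_valid_password('12345678'))  # False: no letter
```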
Online regular expression testing tool:
* Common metacharacters
* Common quantifiers
* Character escape
e.g. \\ matches a literal \
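A small sketch of escaping in Python, assuming raw strings are used for patterns. The sample path and the '1+1=2' string are made up for illustration.

```python
import re

# In a raw string, r'\\' is the regex for one literal backslash.
path = r'C:\temp\notes.txt'
backslashes = re.findall(r'\\', path)
print(len(backslashes))  # 2

# r'\.' matches a literal dot, while an unescaped '.' matches any character.
print(re.findall(r'\.', 'a.b-c'))  # ['.']

# re.escape builds a pattern that matches its argument literally.
literal = re.escape('1+1=2')
print(re.fullmatch(literal, '1+1=2') is not None)  # True
```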
* Character class
[aeiou] matches any one of the characters a, e, i, o, u
[a-z0-9] matches any one lowercase letter a-z or digit 0-9
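The two character classes above can be tried out with re.findall. The sample text is an arbitrary string chosen for this sketch.

```python
import re

text = 'Regex 101: quick intro'

# [aeiou] matches a single lowercase vowel at a time.
vowels = re.findall(r'[aeiou]', text)
print(vowels)  # ['e', 'e', 'u', 'i', 'i', 'o']

# [a-z0-9]+ matches runs of lowercase letters or digits;
# the uppercase 'R', the spaces, and the colon are not in the class.
runs = re.findall(r'[a-z0-9]+', text)
print(runs)  # ['egex', '101', 'quick', 'intro']
```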
* Branching condition
Syntax: first alternative | second alternative
Evaluation: alternatives are tried from left to right, and matching stops at the first alternative that succeeds.
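Because alternatives are tried from left to right, their order can change the result. The following sketch uses an assumed ZIP-code-style input to show the effect.

```python
import re

code = '12345-6789'

# The short alternative comes first and succeeds, so the longer one is never tried.
short_first = re.match(r'\d{5}|\d{5}-\d{4}', code).group()
print(short_first)  # 12345

# Putting the longer alternative first lets it match the full code.
long_first = re.match(r'\d{5}-\d{4}|\d{5}', code).group()
print(long_first)  # 12345-6789
```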
* Grouping
Syntax: (regular expression){times}, where the parentheses group a subexpression so that the repetition count applies to the whole group.
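As a sketch of grouping with a repetition count, the pattern below repeats the group "1 to 3 digits followed by a dot" three times. The name ip_like is hypothetical, and the pattern only checks the shape of an address, not that each number is in the range 0-255.

```python
import re

# (\d{1,3}\.){3} repeats the whole group three times, then one more number follows.
ip_like = re.compile(r'(\d{1,3}\.){3}\d{1,3}')

print(ip_like.fullmatch('192.168.0.1') is not None)      # True
print(ip_like.fullmatch('192.168.0') is not None)        # False: only two groups
print(ip_like.fullmatch('999.999.999.999') is not None)  # True: no range check
```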
* Common negated character classes
* Backreferences and zero-width assertions
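A brief sketch of both ideas in Python, using made-up sample sentences: \1 refers back to the text captured by group 1, and (?=...) is a zero-width lookahead that checks what follows without consuming it.

```python
import re

# Backreference: \1 must repeat exactly what group 1 captured,
# so this finds words that appear twice in a row.
repeated = re.findall(r'\b(\w+)\s+\1\b', 'go go to the the store')
print(repeated)  # ['go', 'the']

# Zero-width lookahead: match the part of a word that is followed by 'ing',
# without including 'ing' in the result.
stems = re.findall(r'\b\w+(?=ing\b)', "I'm singing while you're dancing")
print(stems)  # ['sing', 'danc']
```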
* Greedy and lazy matching
# By default, a regular expression matches as many characters as possible while still satisfying the pattern (greedy matching).
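The difference between greedy and lazy matching can be seen on a short sample string. Adding ? after a quantifier makes it lazy, so it matches as few characters as possible.

```python
import re

text = 'aabab'

# Greedy: .* takes as much as possible while still ending in 'b'.
greedy = re.search(r'a.*b', text).group()
print(greedy)  # aabab

# Lazy: .*? takes as little as possible before the first 'b' that works.
lazy = re.search(r'a.*?b', text).group()
print(lazy)  # aab
```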
* Processing options
(.NET environment)
* Balanced groups / recursive matching
(handles cases where the numbers of opening and closing brackets differ)
This uses the idea of a stack.
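Python's built-in re module does not provide balancing groups, so the stack idea is sketched here in plain Python instead of as a regular expression. The function name is_balanced and the choice of < and > as the bracket pair are assumptions for illustration.

```python
def is_balanced(text: str, open_ch: str = '<', close_ch: str = '>') -> bool:
    """Check bracket balance with a stack: push on open, pop on close."""
    stack = []
    for ch in text:
        if ch == open_ch:
            stack.append(ch)
        elif ch == close_ch:
            if not stack:
                # A closing bracket with nothing left to close.
                return False
            stack.pop()
    # Balanced only if every opening bracket was closed.
    return not stack


print(is_balanced('xx <aa <bbb> <bbb> aa> yy'))  # True
print(is_balanced('<a <b>'))                     # False
```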
* Other syntax
2. The re module for regular expression processing in Python
In the functions of the re module, the flags parameter specifies the processing options (for example, re.IGNORECASE).
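A short sketch of how the flags argument changes matching, using two common options from the re module. The sample strings are made up, and several flags can be combined with the | operator.

```python
import re

# re.IGNORECASE makes letters match regardless of case.
any_case = re.findall(r'python', 'Python python PYTHON', flags=re.IGNORECASE)
print(any_case)  # ['Python', 'python', 'PYTHON']

# re.MULTILINE makes ^ match at the start of every line, not only the string.
first_words = re.findall(r'^\w+', 'first line\nsecond line', flags=re.MULTILINE)
print(first_words)  # ['first', 'second']

# Flags are combined with the bitwise OR operator.
combined = re.findall(r'^python', 'Python\npython', flags=re.IGNORECASE | re.MULTILINE)
print(combined)  # ['Python', 'python']
```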
Example 1: verify that the entered user name and QQ number are valid and print a corresponding message.

"""
Verify whether the entered user name and QQ number are valid
and print a corresponding message.

Requirements: the user name must consist of letters, digits, or underscores
and be 6 to 20 characters long; the QQ number must be 5 to 12 digits long
and its first digit cannot be 0.
"""
import re


def main():
    username = input('Enter user name: ')
    qq = input('Enter QQ number: ')
    # The first argument of match is a regular expression string or a compiled pattern object
    # The second argument is the string to match against the regular expression
    m1 = re.match(r'^[0-9a-zA-Z_]{6,20}$', username)
    if not m1:
        print('Please enter a valid user name.')
    # First digit 1-9, followed by 4 to 11 more digits (5 to 12 digits in total)
    m2 = re.match(r'^[1-9]\d{4,11}$', qq)
    if not m2:
        print('Please enter a valid QQ number.')
    if m1 and m2:
        print('The information you entered is valid!')


if __name__ == '__main__':
    main()
Example 2: extract domestic (Chinese) mobile phone numbers from a piece of text.

import re


def main():
    # Create a regular expression object; lookbehind and lookahead ensure that
    # no digit appears immediately before or after the mobile phone number
    pattern = re.compile(r'(?<=\D)1[34578]\d{9}(?=\D)')
    sentence = '''
    Say important things 8130123456789 times. My mobile phone number is 13512346789,
    not 15600998765, but also 110 or 119. Wang Dachui's mobile phone number is 15600998765.
    '''
    # Find all matches and save them to a list
    mylist = re.findall(pattern, sentence)
    print(mylist)
    print('--------Gorgeous dividing line--------')
    # Take out the match objects through an iterator and get the matched content
    for temp in pattern.finditer(sentence):
        print(temp.group())
    print('--------Gorgeous dividing line--------')
    # Use search with a start position to find all matches one by one
    m = pattern.search(sentence)
    while m:
        print(m.group())
        m = pattern.search(sentence, m.end())


if __name__ == '__main__':
    main()
Note: the group() method of a match object returns the string that was matched (or, given a group number, the string captured by that group).
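A minimal sketch of group() with capturing groups. The date string is a made-up example.

```python
import re

m = re.search(r'(\d{4})-(\d{2})-(\d{2})', 'Released on 2021-03-15.')

whole = m.group()    # the entire match; same as m.group(0)
year = m.group(1)    # text captured by the first pair of parentheses
parts = m.groups()   # all captured groups as a tuple

print(whole)  # 2021-03-15
print(year)   # 2021
print(parts)  # ('2021', '03', '15')
```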
Example 3: replace inappropriate content in a string.

import re


def main():
    sentence = 'Are you a silly fork? Fuck you. Fuck you.'
    # Replace every listed word or phrase with '*', ignoring case
    purified = re.sub(r'fuck|shit|silly\s+fork', '*', sentence, flags=re.IGNORECASE)
    print(purified)  # Are you a *? * you. * you.


if __name__ == '__main__':
    main()
Example 4: split a long string.

import re


def main():
    poem = 'The bright moon in front of the window is suspected to be frost on the ground. Raising my head, I see the moon so bright; withdrawing my eyes, my nostalgia comes around.'
    # Split on commas, periods, and semicolons, dropping any whitespace that follows
    sentence_list = re.split(r'[,.;]\s*', poem)
    # Remove the empty strings left by trailing punctuation
    while '' in sentence_list:
        sentence_list.remove('')
    print(sentence_list)
    # ['The bright moon in front of the window is suspected to be frost on the ground', 'Raising my head', 'I see the moon so bright', 'withdrawing my eyes', 'my nostalgia comes around']


if __name__ == '__main__':
    main()
Note: when writing crawlers, libraries such as Beautiful Soup or lxml are also very convenient for locating and extracting content.