A quick start to regular expressions and Python's re module

Posted by Unholy Prayer on Mon, 13 Dec 2021 10:17:41 +0100

1. Regular expressions: a 30-minute introductory tutorial

Definition: a regular expression is a pattern that describes which strings satisfy a set of rules. It can be used to find, check, or extract strings that follow complex rules.

For example, when users set a password, a regular expression can enforce a rule such as "must contain both letters and digits" and reject passwords that do not meet it.
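A minimal sketch of such a password check in Python, assuming a hypothetical rule (8 to 16 characters, letters and digits only, at least one of each). The two lookaheads `(?=...)` each check one requirement from the start of the string without consuming characters; `is_valid_password` is an illustrative helper name, not part of any library.

```python
import re

# Hypothetical rule: 8-16 letters/digits, at least one letter and one digit
PASSWORD_RE = re.compile(r'(?=.*[A-Za-z])(?=.*\d)[A-Za-z\d]{8,16}')

def is_valid_password(password: str) -> bool:
    # fullmatch() requires the whole string to satisfy the pattern
    return PASSWORD_RE.fullmatch(password) is not None

print(is_valid_password('abc12345'))  # True
print(is_valid_password('abcdefgh'))  # False: no digit
print(is_valid_password('12345678'))  # False: no letter
```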

Link: Regular expression 30-minute introductory tutorial. In about 30 minutes it explains what regular expressions are and gives you enough basic understanding to use them in your own programs or web pages: https://deerchao.cn/tutorials/regex/regex.htm

Online regular expression testing tool:

Regular expression online test | Runoob tools: https://c.runoob.com/front-end/854/

The page also lists common expressions for checking numbers, for example:

Digits only: ^[0-9]*$
Exactly n digits: ^\d{n}$
At least n digits: ^\d{n,}$
m to n digits: ^\d{m,n}$
Zero, or a number without a leading zero: ^(0|[1-9][0-9]*)$
Number without a leading zero, with up to two decimal places: ^([1-9][0-9]*)+(\.[0-9]{1,2})?$
Positive or negative number with 1-2 decimal places: ^(\-)?\d+(\.\d{1,2})$
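Several of the digit patterns above can be tried directly in Python. This sketch fixes n = 3 and m = 2 purely for illustration; `PATTERNS` and `check` are hypothetical names.

```python
import re

# Digit patterns from the testing page, with n = 3 and m = 2 (illustrative values)
PATTERNS = {
    'digits only': r'^[0-9]*$',
    'exactly 3 digits': r'^\d{3}$',
    'at least 3 digits': r'^\d{3,}$',
    '2 to 4 digits': r'^\d{2,4}$',
    'zero or no leading zero': r'^(0|[1-9][0-9]*)$',
}

def check(name, text):
    """Return True if text matches the named pattern."""
    return re.match(PATTERNS[name], text) is not None

for name in PATTERNS:
    results = {t: check(name, t) for t in ['0', '007', '123', '12345']}
    print(name, results)
```

Note that ^[0-9]*$ also accepts the empty string, because * allows zero repetitions; use + instead if at least one digit is required.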

* Common metacharacters
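The most common metacharacters are `.`, `\w`, `\s`, `\d`, `\b`, `^`, and `$`. This sketch shows each on a made-up sample string using Python's `re.findall`:

```python
import re

text = 'hi, his name is Hi-5'

# \b is a word boundary: "hi" only as a whole word (case-insensitive here)
print(re.findall(r'\bhi\b', text, flags=re.IGNORECASE))  # ['hi', 'Hi']
# \d matches one digit
print(re.findall(r'\d', text))        # ['5']
# \s matches whitespace; count the spaces
print(len(re.findall(r'\s', text)))   # 4
# . matches any character except a newline
print(re.findall(r'h.s', text))       # ['his']
# ^ and $ anchor the start and end of the string; \w is a word character
print(re.findall(r'^\w+', text))      # ['hi']
print(re.findall(r'\w+$', text))      # ['5']
```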

* Common quantifiers
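The quantifiers `*` (0 or more), `+` (1 or more), `?` (0 or 1), `{n}`, `{n,}`, and `{n,m}` apply to the item just before them. A short sketch on sample strings:

```python
import re

print(re.findall(r'colou?r', 'color colour colouur'))  # ['color', 'colour']
print(re.findall(r'ab*', 'a ab abbb'))                 # ['a', 'ab', 'abbb']
print(re.findall(r'ab+', 'a ab abbb'))                 # ['ab', 'abbb']
print(re.findall(r'\d{3}', '12 123 12345'))            # ['123', '123']
print(re.findall(r'\d{2,}', '1 12 123'))               # ['12', '123']
print(re.findall(r'\d{2,3}', '1 12 12345'))            # ['12', '123', '45']
```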

* Character escaping

e.g. \\ matches a single literal \ (likewise \. matches a literal dot and \* a literal asterisk)
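In Python, escaping happens at two levels: the string literal and the regex. Raw strings (`r'...'`) avoid doubling every backslash. This sketch also uses `re.escape`, which escapes the special characters of a literal string:

```python
import re

# A literal dot must be escaped; an unescaped . matches any character
print(re.findall(r'deerchao\.cn', 'deerchao.cn deerchaoXcn'))  # ['deerchao.cn']
print(re.findall(r'deerchao.cn', 'deerchao.cn deerchaoXcn'))   # ['deerchao.cn', 'deerchaoXcn']

# The string below contains one backslash character
path = 'C:\\Windows'
# The regex \\ matches one literal backslash; with a raw string it is written r'\\'
print(re.findall(r'\\', path))    # ['\\']
# Without a raw string the same regex needs four backslashes
print(re.findall('\\\\', path))   # ['\\']

# re.escape() escapes regex special characters in a literal string
print(re.escape('1+1=2?'))        # 1\+1=2\?
```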

* Character classes

[aeiou] matches any one of the vowels a, e, i, o, u

[a-z0-9] matches any lowercase letter from a to z or any digit from 0 to 9
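A character class matches exactly one character from the set. The sketch below shows that ranges are case-sensitive and that most metacharacters, such as `.` and `?`, are literal inside brackets:

```python
import re

print(re.findall(r'[aeiou]', 'regular'))               # ['e', 'u', 'a']
# Uppercase R and B are not in [a-z0-9]
print(re.findall(r'[a-z0-9]+', 'Room 101, floor B2'))  # ['oom', '101', 'floor', '2']
# Inside [...], . and ? are ordinary characters
print(re.findall(r'[.?!]', 'Hi! Why? OK.'))            # ['!', '?', '.']
```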

* Branch conditions (alternation)

Syntax: alternative 1 | alternative 2

How it is evaluated: the alternatives are tried from left to right, and the first one that succeeds is used; the remaining alternatives are not tried.
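Because matching stops at the first alternative that succeeds, the order of alternatives matters. A sketch using a US-style ZIP code (5 digits, optionally followed by a hyphen and 4 more digits) as the sample:

```python
import re

zip_code = '12345-6789'
# Longer alternative first: the full code is matched
print(re.match(r'\d{5}-\d{4}|\d{5}', zip_code).group())  # 12345-6789
# Shorter alternative first: it succeeds, so the longer one is never tried
print(re.match(r'\d{5}|\d{5}-\d{4}', zip_code).group())  # 12345
```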

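* Grouping

(subexpression){n}: parentheses turn a subexpression into a group, so a quantifier such as {n} repeats the whole group rather than a single character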
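A classic use of a repeated group is an IPv4 address. The simple pattern only checks the shape; the stricter one limits each part to 0-255 by putting alternatives inside a group. Note that a repeated capture group remembers only its last repetition.

```python
import re

# (\d{1,3}\.){3} repeats "1 to 3 digits followed by a dot" three times
simple_ip = re.compile(r'(\d{1,3}\.){3}\d{1,3}')
print(simple_ip.fullmatch('192.168.0.1') is not None)      # True
print(simple_ip.fullmatch('999.999.999.999') is not None)  # True, though not a real address
# Group 1 holds only the text of its last repetition
print(simple_ip.fullmatch('192.168.0.1').group(1))         # 0.

# Stricter: each part is 200-249, 250-255, or 0-199
strict_ip = re.compile(r'((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)')
print(strict_ip.fullmatch('192.168.0.1') is not None)      # True
print(strict_ip.fullmatch('999.999.999.999') is not None)  # False
```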

* Common negated classes (\W, \S, \D, \B, [^x], [^aeiou])
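Each uppercase shorthand matches exactly what its lowercase form does not, and `[^...]` negates a character class. A sketch on sample strings:

```python
import re

text = 'abc 123_x!'
print(re.findall(r'\D+', text))   # ['abc ', '_x!']   (non-digits)
print(re.findall(r'\S+', text))   # ['abc', '123_x!'] (non-whitespace)
print(re.findall(r'\W', text))    # [' ', '!']        (underscore counts as \w)
# [^...] matches any character not listed
print(re.findall(r'[^aeiou\s]', 'ai bc'))  # ['b', 'c']
# \B is a non-boundary: an "x" that does not start a word
print(re.findall(r'\Bx', 'xx ox x'))       # ['x', 'x']
```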

* Backreferences and zero-width assertions
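A backreference such as `\1` matches the same text that group 1 captured. Lookahead `(?=...)` and lookbehind `(?<=...)` check what follows or precedes a position without consuming it. A sketch on made-up sentences:

```python
import re

# \1 repeats whatever group 1 matched: finds doubled words
print(re.findall(r'\b(\w+)\s+\1\b', 'go go now, hello hello world'))  # ['go', 'hello']

# Lookahead: the part of a word that comes before "ing"
print(re.findall(r'\b\w+(?=ing\b)', "I'm singing while you're dancing."))  # ['sing', 'danc']

# Lookbehind: digits preceded by a dollar sign (the $ itself is not returned)
print(re.findall(r'(?<=\$)\d+', 'cost: $30, tax $5'))  # ['30', '5']
```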

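* Greedy and lazy matching

By default a quantifier is greedy: it matches as many characters as possible while the overall match can still succeed. Appending ? to a quantifier (*?, +?, ??, {n,m}?) makes it lazy, so it matches as few characters as possible.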
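The difference shows up clearly with `.*` versus `.*?`. The `'aabab'` sample follows the linked tutorial's example; the HTML-tag sample is illustrative.

```python
import re

text = 'aabab'
print(re.search(r'a.*b', text).group())  # aabab         (greedy: longest match)
print(re.findall(r'a.*?b', text))        # ['aab', 'ab'] (lazy: shortest matches)

html = '<b>bold</b>'
print(re.findall(r'<.*>', html))         # ['<b>bold</b>']
print(re.findall(r'<.*?>', html))        # ['<b>', '</b>']
```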

* Processing options

(These options are the ones defined by the .NET regular expression engine.)
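Python's re module offers comparable options as flags. The names differ from .NET, and the two engines do not map one to one, so treat this as a rough correspondence. A sketch of the most common ones:

```python
import re

text = 'First line\nsecond LINE'

# re.IGNORECASE (re.I): case-insensitive matching
print(re.findall(r'line', text, flags=re.IGNORECASE))  # ['line', 'LINE']
# re.MULTILINE (re.M): ^ and $ also match at the start/end of each line
print(re.findall(r'^\w+', text, flags=re.MULTILINE))   # ['First', 'second']
print(re.findall(r'^\w+', text))                       # ['First']
# re.DOTALL (re.S): . also matches a newline
print(re.search(r'line.second', text))                 # None
print(repr(re.search(r'line.second', text, flags=re.DOTALL).group()))  # 'line\nsecond'
# re.VERBOSE (re.X): whitespace and # comments inside the pattern are ignored
phone = re.compile(r'''
    1[34578]   # prefix digits
    \d{9}      # remaining nine digits
''', re.VERBOSE)
print(phone.fullmatch('13512346789') is not None)      # True
# Flags are combined with |
print(re.findall(r'^s\w+', text, flags=re.I | re.M))   # ['second']
```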

* Balancing groups / recursive matching

(Used for nested brackets, where the number of opening brackets must equal the number of closing brackets.)

The underlying idea is a stack.
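Python's built-in re module does not support .NET balancing groups (the third-party `regex` package offers recursive patterns, but that is outside the standard library). The stack idea itself can be shown in plain Python without a regex: push each opening bracket, and pop when a matching closing bracket arrives. `is_balanced` is a hypothetical helper for illustration.

```python
def is_balanced(text: str) -> bool:
    """Check that (), [] and {} are properly nested, using a stack."""
    pairs = {')': '(', ']': '[', '}': '{'}
    stack = []
    for ch in text:
        if ch in '([{':
            stack.append(ch)          # remember every opening bracket
        elif ch in pairs:
            # a closing bracket must match the most recent opening one
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack                  # leftover openings mean imbalance

print(is_balanced('f(a[b]{c})'))  # True
print(is_balanced('(a]'))         # False
print(is_balanced('(('))          # False
```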

* Other syntax

Reference documents

Regular Expression Language - Quick Reference (.NET documentation): https://docs.microsoft.com/zh-cn/dotnet/standard/base-types/regular-expression-language-quick-reference?redirectedfrom=MSDN

2. The re module for regular expression processing in Python

 

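In the re functions, the optional flags argument selects processing options, for example re.IGNORECASE (ignore case), re.MULTILINE (^ and $ also match at the start and end of each line), and re.DOTALL (. also matches a newline). Several flags can be combined with |.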
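The core functions of the re module are `compile`, `match`, `search`, `fullmatch`, `findall`, `finditer`, `split`, and `sub`. This sketch runs each once on a made-up string; the phone-number pattern is the same one used in Example 2.

```python
import re

text = 'Tom: 13512346789, Jerry: 15600998765'
pattern = re.compile(r'1[34578]\d{9}')         # compile once, reuse many times

print(pattern.match(text))                     # None: match() only tries position 0
print(pattern.search(text).group())            # 13512346789: first match anywhere
print(pattern.findall(text))                   # ['13512346789', '15600998765']
print([m.span() for m in pattern.finditer(text)])  # [(5, 16), (25, 36)]
print(re.fullmatch(r'\d+', '2021') is not None)    # True: whole string must match
print(re.split(r',\s*', text))                 # ['Tom: 13512346789', 'Jerry: 15600998765']
print(pattern.sub('***', text))                # Tom: ***, Jerry: ***
print(re.sub(r'tom', 'Bob', text, flags=re.IGNORECASE))  # Bob: 13512346789, Jerry: 15600998765
```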

 

Example 1: check whether the entered user name and QQ number are valid and print a corresponding message.

'''
Check whether the entered user name and QQ number are valid and print the corresponding message

Requirements: the user name must consist of letters, digits, or underscores and be 6 to 20 characters long; the QQ number must have 5 to 12 digits, and its first digit cannot be 0
'''
import re

def main():
    username = input('Enter user name: ')
    qq = input('Enter QQ number: ')
    # The first argument to match is a regular expression string or a compiled pattern object
    # The second argument is the string to match against
    m1 = re.match(r'^[0-9a-zA-Z_]{6,20}$', username)
    if not m1:
        print('Please enter a valid user name')
    # [1-9] followed by 4 to 11 more digits gives 5 to 12 digits in total
    # ({4,11} uses a comma; {4-11} would be treated as literal text)
    m2 = re.match(r'^[1-9]\d{4,11}$', qq)
    if not m2:
        print('Please enter a valid QQ number')
    if m1 and m2:
        print('The information you entered is valid!')

if __name__ == '__main__':
    main()
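One subtlety of `^...$` with `re.match`: in Python, `$` also matches just before a newline at the very end of the string. `input()` already strips the newline, but for text read from files or sockets `re.fullmatch` (or ending the pattern with `\Z`) is stricter. A sketch:

```python
import re

# $ can match before a trailing newline, so this accepts '12345\n'
print(re.match(r'^[1-9]\d{4,11}$', '12345\n') is not None)    # True
# fullmatch must consume the entire string, newline included
print(re.fullmatch(r'[1-9]\d{4,11}', '12345\n') is not None)  # False
print(re.fullmatch(r'[1-9]\d{4,11}', '12345') is not None)    # True
```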

Example 2: extract mainland China mobile phone numbers from a piece of text.

 

import re


def main():
    # Negative lookbehind and lookahead ensure that no digit appears immediately before or after the phone number
    # (unlike (?<=\D) and (?=\D), they also work at the very start or end of the string)
    pattern = re.compile(r'(?<!\d)1[34578]\d{9}(?!\d)')
    sentence = '''
    Say important things 8130123456789 times. My mobile phone number is 13512346789,
    Not 15600998765, but also 110 or 119. Wang dachui's mobile phone number is 15600998765.
    '''
    # Find all matches and save to a list
    mylist = re.findall(pattern, sentence)
    print(mylist)
    print('--------Gorgeous dividing line--------')
    # Iterate over the match objects and get the matched text of each one
    for temp in pattern.finditer(sentence):
        print(temp.group())
    print('--------Gorgeous dividing line--------')
    # Call search repeatedly, starting each search where the previous match ended, to find all matches
    m = pattern.search(sentence)
    while m:
        print(m.group())
        m = pattern.search(sentence, m.end())


if __name__ == '__main__':
    main()

Note: the group() method returns the text matched by the whole pattern; group(n) returns the text captured by the n-th group.
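A sketch of `group()` with capture groups, using a made-up hyphenated phone number. Named groups `(?P<name>...)` make the result easier to read:

```python
import re

m = re.search(r'(\d{3})-(\d{4})-(\d{4})', 'Call 135-1234-6789 today')
print(m.group())      # 135-1234-6789 (same as group(0): the whole match)
print(m.group(1))     # 135
print(m.group(2, 3))  # ('1234', '6789')
print(m.groups())     # ('135', '1234', '6789')

# Named groups can be accessed by name or collected into a dict
named = re.search(r'(?P<area>\d{3})-(?P<rest>\d{4}-\d{4})', 'Call 135-1234-6789 today')
print(named.group('area'))  # 135
print(named.groupdict())    # {'area': '135', 'rest': '1234-6789'}
```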

Example 3: replace offensive words in a string

import re


def main():
    sentence = 'Are you a silly fork? Fuck you. F*ck you.'
    # [u*] is a character class, so f[u*]ck also catches the spelling "f*ck"
    purified = re.sub(r'f[u*]ck|sh[i1]t|silly\s+fork',
                      '*', sentence, flags=re.IGNORECASE)
    print(purified)  # Are you a *? * you. * you.


if __name__ == '__main__':
    main()
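The replacement argument of `re.sub` can also be a function that receives each match object and returns the replacement text. This sketch (with a hypothetical `mask` helper) hides each offensive word behind as many asterisks as it has characters:

```python
import re

def mask(match):
    # Replace the matched text with the same number of * characters
    return '*' * len(match.group())

sentence = 'Are you a silly fork? Fuck you.'
print(re.sub(r'fuck|silly\s+fork', mask, sentence, flags=re.IGNORECASE))
# Are you a **********? **** you.
```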

Example 4: split a long string

import re


def main():
    poem = 'Moonlight in front of the window, suspected frost on the ground. Raise your head to look at the moon, bow your head and think of your hometown.'
    # Split on commas and periods together with any whitespace that follows them
    sentence_list = re.split(r'[,.]\s*', poem)
    # The final period leaves an empty string at the end; drop empty items
    sentence_list = [s for s in sentence_list if s]
    print(sentence_list)  # ['Moonlight in front of the window', 'suspected frost on the ground', 'Raise your head to look at the moon', 'bow your head and think of your hometown']


if __name__ == '__main__':
    main()
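Two further `re.split` behaviors are worth knowing: a capturing group in the pattern keeps the separators in the result, and `maxsplit` limits the number of splits. A sketch on a sample line:

```python
import re

line = 'one, two;three'
print(re.split(r'[,;]\s*', line))              # ['one', 'two', 'three']
# A capturing group keeps the separators in the output list
print(re.split(r'([,;])\s*', line))            # ['one', ',', 'two', ';', 'three']
# maxsplit=1 splits only at the first separator
print(re.split(r'[,;]\s*', line, maxsplit=1))  # ['one', 'two;three']
```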

Note: for web crawlers, libraries such as Beautiful Soup and lxml are also very convenient for extracting content from pages.

Topics: Python regex