MySQL uses regular expressions

Posted by ihsreal on Wed, 29 Dec 2021 10:22:42 +0100

Regular expressions are mainly used to query and replace text content that conforms to a certain pattern (rule). They are powerful and flexible. They are often used in very complex queries. Compared with LIKE, the matching rules are very comprehensive.

In MySQL, use the REGEXP keyword to specify the character matching pattern of regular expressions. The basic syntax format is as follows:

Attribute name REGEXP 'Matching mode'

Where, "attribute name" indicates the name of the field to be queried; Match method indicates how to match the query. There are many pattern matching characters in "matching method", which represent different meanings. The following table lists the matching methods commonly used in REGEXP operators.

optionexplainexampleMatching value example
^Matches the start character of the text'^ b' matches a string starting with the letter bbook,big,banana,bike
$Matches the end character of the text'st $'matches a string ending in Sttest,resist,persist
.Match any single character'b.t' matches any b and t with a character between thembit,bat,but,bite
*Matches zero or more characters before it'f*n' matches any character f before character nfn,fan,faan,abcn
+Matches the preceding character 1 or more times'ba +' matches start with b, followed by at least one aba,bay,bare,battle
< string >Matches text containing the specified character'fa' matches text containing 'fa'fan,afa,faad
[character set]Matches any character in the character set'[xz]' matches x or zdizzy,zebra,x-ray,extra
[^]Matches any character that is not in parentheses'[^ abc]' matches any string that does not contain a, b, or cdesk,fox,f8ke
String {n,}Match the previous string at least n times'b{2}' matches 2 or more b'sbbb,bbbb,bbbbbbb
String {n,m}Match the previous string at least n times and at most m times'b{2,4}' matches at least 2 and at most 4 b'sbbb,bbbb

Regular expressions in MySQL are basically consistent with those in programming languages such as Java language and PHP language.

Queries records that begin with a specific character or string

The character ^ is used to match records that begin with a specific character or string.

In TB_ students_ In the info table, query the records whose name field starts with "J". The SQL statement and execution process are as follows.

mysql> SELECT * FROM tb_students_info 
    -> WHERE name REGEXP '^J';
+----+------+------+------+--------+-----------+
| id | name | age  | sex  | height | course_id |
+----+------+------+------+--------+-----------+
|  4 | Jane |   22 | male   |    162 |         3 |
|  5 | Jim  |   24 | female   |    175 |         2 |
|  6 | John |   21 | female   |    172 |         4 |
+----+------+------+------+--------+-----------+
3 rows in set (0.01 sec)

In TB_ students_ In the info table, query the records whose name field starts with "Ji". The SQL statement and execution process are as follows.

mysql> SELECT * FROM tb_students_info
    -> WHERE name REGEXP '^Ji';
+----+------+------+------+--------+-----------+
| id | name | age  | sex  | height | course_id |
+----+------+------+------+--------+-----------+
|  5 | Jim  |   24 | female   |    175 |         2 |
+----+------+------+------+--------+-----------+
1 row in set (0.00 sec)

Queries records that end with a specific character or string

The character $is used to match records that end with a specific character or string.

In TB_ students_ In the info table, query the records whose name field ends with "y". The SQL statement and execution process are as follows.

mysql> SELECT * FROM tb_students_info
    -> WHERE name REGEXP 'y$';
+----+-------+------+------+--------+-----------+
| id | name  | age  | sex  | height | course_id |
+----+-------+------+------+--------+-----------+
|  1 | Dany  |   25 | male   |    160 |         1 |
|  3 | Henry |   23 | female   |    185 |         1 |
|  7 | Lily  |   22 | male   |    165 |         4 |
+----+-------+------+------+--------+-----------+
3 rows in set (0.00 sec)

In TB_ students_ In the info table, query the records whose name field ends with "ry". The SQL statement and execution process are as follows.

mysql> SELECT * FROM tb_students_info
    -> WHERE name REGEXP 'ry$';
+----+-------+------+------+--------+-----------+
| id | name  | age  | sex  | height | course_id |
+----+-------+------+------+--------+-----------+
|  3 | Henry |   23 | female   |    185 |         1 |
+----+-------+------+------+--------+-----------+
1 row in set (0.00 sec)

Replace any character in the string

Character Used to replace any character in the string.

In TB_ students_ In the info table, the query name field contains "a" and "y", and there is only one letter between the two letters. The SQL statement and execution process are as follows.

mysql> SELECT * FROM tb_students_info
    -> WHERE name REGEXP 'a.y';
+----+------+------+------+--------+-----------+
| id | name | age  | sex  | height | course_id |
+----+------+------+------+--------+-----------+
|  1 | Dany |   25 | male   |    160 |         1 |
+----+------+------+------+--------+-----------+
1 row in set (0.00 sec)

Match multiple characters

The characters * and + can match multiple characters before the symbol. The difference is that + represents at least one character, while * can represent 0 characters.

In TB_ students_ In the info table, the query name field value contains the letter "T", and the letter "h" appears after "T". The SQL statement and execution process are as follows.

mysql> SELECT * FROM tb_students_info
    -> WHERE name REGEXP '^Th*';
+----+--------+------+------+--------+-----------+
| id | name   | age  | sex  | height | course_id |
+----+--------+------+------+--------+-----------+
|  9 | Thomas |   22 | female   |    178 |         5 |
| 10 | Tom    |   23 | female   |    165 |         5 |
+----+--------+------+------+--------+-----------+
2 rows in set (0.00 sec)

In TB_ students_ In the info table, the value of the query name field contains the letter "T", and "h" appears at least once after "T". The SQL statement and execution process are as follows.

mysql> SELECT * FROM tb_students_info
    -> WHERE name REGEXP '^Th+';
+----+--------+------+------+--------+-----------+
| id | name   | age  | sex  | height | course_id |
+----+--------+------+------+--------+-----------+
|  9 | Thomas |   22 | female   |    178 |         5 |
+----+--------+------+------+--------+-----------+
1 row in set (0.00 sec)

Matches the specified string

Regular expressions can match strings. When the record in the table contains this string, you can query the record. When specifying multiple strings, you need to separate them with |. Just match any of these strings.

In TB_ students_ In the info table, query the record whose name field value contains the string "an". The SQL statement and execution process are as follows.

mysql> SELECT * FROM tb_students_info
    -> WHERE name REGEXP 'an';
+----+-------+------+------+--------+-----------+
| id | name  | age  | sex  | height | course_id |
+----+-------+------+------+--------+-----------+
|  1 | Dany  |   25 | male   |    160 |         1 |
|  4 | Jane  |   22 | male   |    162 |         3 |
|  8 | Susan |   23 | male   |    170 |         5 |
+----+-------+------+------+--------+-----------+
3 rows in set (0.00 sec)

In TB_ students_ In the info table, query the record whose name field value contains the string "an" or "en". The SQL statement and execution process are as follows.

mysql> SELECT * FROM tb_students_info
    -> WHERE name REGEXP 'an|en';
+----+-------+------+------+--------+-----------+
| id | name  | age  | sex  | height | course_id |
+----+-------+------+------+--------+-----------+
|  1 | Dany  |   25 | male   |    160 |         1 |
|  2 | Green |   23 | male   |    158 |         2 |
|  3 | Henry |   23 | female   |    185 |         1 |
|  4 | Jane  |   22 | male   |    162 |         3 |
|  8 | Susan |   23 | male   |    170 |         5 |
+----+-------+------+------+--------+-----------+
5 rows in set (0.00 sec)

Note: there must be no spaces between the string and |. Because, in the query process, the database system will also treat the space as a character, so it can't find the desired result.

Matches any one of the specified strings

Use square brackets [] to form the characters to be queried into a character set. As long as the record contains any character in square brackets, the record will be queried. For example, you can query records containing any one of the three letters a, b, and c through "[abc]".

In TB_ students_ In the info table, query the records whose name field value contains the letter "i" or "o". The SQL statement and execution process are as follows.

mysql> SELECT * FROM tb_students_info
    -> WHERE name REGEXP '[io]';
+----+--------+------+------+--------+-----------+
| id | name   | age  | sex  | height | course_id |
+----+--------+------+------+--------+-----------+
|  5 | Jim    |   24 | female   |    175 |         2 |
|  6 | John   |   21 | female   |    172 |         4 |
|  7 | Lily   |   22 | male   |    165 |         4 |
|  9 | Thomas |   22 | female   |    178 |         5 |
| 10 | Tom    |   23 | female   |    165 |         5 |
| 11 | LiMing |   22 | male   |    180 |         7 |
+----+--------+------+------+--------+-----------+
6 rows in set (0.00 sec)

You can see from the query results that the name field values of all returned records contain the letters i or o, or both.

Square brackets [] can also specify the interval of the set. For example, "[A-Z]" represents all letters from a to Z; "[0-9]" indicates all numbers from 0 to 9; "[a-z0-9]" means including all lowercase letters and numbers; "[a-zA-Z]" means that all characters are matched.

In TB_ students_ In the info table, the query name field contains records of 1, 2 or 3. The SQL statement and execution process are as follows.

mysql> SELECT * FROM tb_students_info
    -> WHERE name REGEXP '[123]';
Empty set (0.00 sec)

The matching set "[123]" can also be written as "[1-3]", that is, specify the set interval.

Matches characters other than those specified

[^ character set] is used to match any character that is not in the specified set.

In TB_ students_ In the info table, query the records whose name field value contains characters other than letters a~t. the SQL statement and execution process are as follows.

mysql> SELECT * FROM tb_students_info
    -> WHERE name REGEXP '[^a-t]' ;
+----+-------+------+------+--------+-----------+
| id | name  | age  | sex  | height | course_id |
+----+-------+------+------+--------+-----------+
|  1 | Dany  |   25 | male   |    160 |         1 |
|  3 | Henry |   23 | female   |    185 |         1 |
|  7 | Lily  |   22 | male   |    165 |         4 |
|  8 | Susan |   23 | male   |    170 |         5 |
+----+-------+------+------+--------+-----------+
4 rows in set (0.00 sec)

Use {n,} or {n,m} to specify the number of consecutive occurrences of a string

String {n,} indicates that the string appears n times continuously; The string {n,m} indicates that the string appears continuously at least N times and at most m times.

For example, a{2,} indicates that the letter a appears continuously at least twice, or more than twice; a{2,4} indicates that the letter a appears continuously at least twice and no more than four times.

In TB_ students_ In the info table, query the record where the letter 'e' appears at least twice in the name field value. The SQL statement is as follows:

mysql> SELECT * FROM tb_students_info WHERE name REGEXP 'e{2,}';
+----+-------+------+------+--------+-----------+
| id | name  | age  | sex  | height | course_id |
+----+-------+------+------+--------+-----------+
|  2 | Green |   23 | male   |    158 |         2 |
+----+-------+------+------+--------+-----------+
1 row in set (0.00 sec)

In TB_ students_ In the info table, query the records where the string "i" appears at least once and at most three times in the name field value. The SQL statement is as follows:

mysql> SELECT * FROM tb_students_info WHERE name REGEXP 'i{1,3}';
+----+--------+------+------+--------+-----------+
| id | name   | age  | sex  | height | course_id |
+----+--------+------+------+--------+-----------+
|  5 | Jim    |   24 | female   |    175 |         2 |
|  7 | Lily   |   22 | male   |    165 |         4 |
| 11 | LiMing |   22 | male   |    180 |         7 |
+----+--------+------+------+--------+-----------+
3 rows in set (0.00 sec)

summary

Pay special attention to the command of parentheses (). If () is not used, all of them refer to a single character for matching. If multiple characters need to be used as a whole for matching, multiple characters need to be enclosed with ()

1. Use REGEXP and NOT REGEXP operators (similar to LIKE and NOT LIKE);

2.REGEXP is also case insensitive by default. You can use binary keyword to force case sensitivity; WHERE NAME REGEXP BINARY ‘[A-Z]’;

3.REGEXP defaults to the partial matching principle, that is, if there is a match, it returns true, if there is a match, it returns "1" and if there is no match, it returns "0". For example: SELECT 'Abc' REGEXP BINARY '[A-Z]';

Topics: Front-end MySQL regex