Regular expressions: this one article is enough!! (complete examples included, worth bookmarking)

Posted by fleymingmasc on Thu, 10 Feb 2022 23:03:18 +0100

Master these regular expressions and you will write a lot less code. Binghe strongly recommends that you bookmark them!!

Hello, I'm Binghe~~

It is by mastering these regular expressions that Binghe writes about 200 fewer lines of code than others every day, which greatly improves R&D efficiency.

Mastering regular expressions helps programmers write elegant code quickly. Binghe has sorted out and summarized the regular expressions used over many years of programming. They can save you a lot of coding time, and a simple regular expression can often replace a long chain of if...else code. This time, Binghe is sharing the regular expressions used most often, hoping they bring real help to everyone.

Binghe's commonly used regular expressions

Integer or decimal

^[0-9]+\.{0,1}[0-9]{0,2}$ 

Only numbers can be entered

^[0-9]*$

Only n digits can be entered

^\d{n}$

At least n digits must be entered

^\d{n,}$

Only m~n digits can be entered

^\d{m,n}$ 
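The n and m in the patterns above are placeholders, not regex syntax, so they have to be replaced with real numbers before use. A minimal Java sketch of filling them in; the class and method names are made up for illustration and are not from the original article:

```java
import java.util.regex.Pattern;

public class DigitCount {
    // Builds ^\d{min,max}$ for concrete bounds (hypothetical helper).
    static Pattern digitsBetween(int min, int max) {
        return Pattern.compile("^\\d{" + min + "," + max + "}$");
    }

    static boolean isValid(String input, int min, int max) {
        return digitsBetween(min, max).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("12345", 4, 6)); // true
        System.out.println(isValid("123", 4, 6));   // false: too short
        System.out.println(isValid("12a45", 4, 6)); // false: contains a letter
    }
}
```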

You can only enter zero, or a number that starts with a non-zero digit

^(0|[1-9][0-9]*)$

You can only enter positive real numbers with two decimal places

^[0-9]+(\.[0-9]{2})?$

You can only enter positive real numbers with 1 ~ 3 decimal places

^[0-9]+(\.[0-9]{1,3})?$
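In these decimal patterns the dot has to be escaped, because an unescaped dot matches any character. A small Java sketch comparing the two forms; the class name is illustrative only:

```java
import java.util.regex.Pattern;

public class DecimalPlaces {
    // Escaped dot: only a literal '.' may separate integer and fraction parts.
    static final Pattern TWO_PLACES = Pattern.compile("^[0-9]+(\\.[0-9]{2})?$");
    // Unescaped dot: '.' matches any character, so "12x34" slips through.
    static final Pattern UNESCAPED = Pattern.compile("^[0-9]+(.[0-9]{2})?$");

    public static void main(String[] args) {
        System.out.println(TWO_PLACES.matcher("12.34").matches()); // true
        System.out.println(TWO_PLACES.matcher("12x34").matches()); // false
        System.out.println(UNESCAPED.matcher("12x34").matches());  // true
    }
}
```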

Only non-zero positive integers can be entered

^\+?[1-9][0-9]*$

Only non-zero negative integers can be entered

^-[1-9][0-9]*$

Only characters of length 3 can be entered

^.{3}$

You can only enter a string consisting of 26 English letters

^[A-Za-z]+$

You can only enter a string consisting of 26 uppercase English letters

^[A-Z]+$

You can only enter a string consisting of 26 lowercase English letters

^[a-z]+$

You can only enter a string consisting of numbers and 26 English letters

^[A-Za-z0-9]+$

You can only enter a string consisting of numbers, 26 English letters or underscores

^\w+$

Verify user password:

^[a-zA-Z]\w{5,17}$ 

Note: the correct format starts with a letter, is 6 to 18 characters long, and contains only letters, digits and underscores.

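Verify whether the input contains characters such as % & ' , ; = ? $ or the double quote (\x22). The character class is negated, so the pattern matches a run of characters that are none of these.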

[^%&',;=?$\x22]+ 

Only Chinese characters can be entered

^[\u4e00-\u9fa5]{0,}$ 

Verify Email address

^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$

Verify Internet URL

^https?://([\w-]+\.)+[\w-]+(/[\w./?%&=-]*)?$

Verify phone number

^(\(\d{3,4}\)|\d{3,4}-)?\d{7,8}$

The correct formats are: XXX-XXXXXXX, XXX-XXXXXXXX, XXXX-XXXXXXX, XXXX-XXXXXXXX, XXXXXXX and XXXXXXXX

Verify ID number (15 digit or 18 digit).

^(\d{15}|\d{18})$
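Alternation binds more loosely than the anchors, so if the group is left out, `^` applies only to the first branch and `$` only to the second. A minimal Java sketch of the difference when searching with `find()`; the class name is illustrative only:

```java
import java.util.regex.Pattern;

public class IdLength {
    // Ungrouped: reads as (^\d{15}) OR (\d{18}$).
    static final Pattern UNGROUPED = Pattern.compile("^\\d{15}|\\d{18}$");
    // Grouped: both anchors apply to both alternatives.
    static final Pattern GROUPED = Pattern.compile("^(\\d{15}|\\d{18})$");

    public static void main(String[] args) {
        String sixteenDigits = "1234567890123456";
        // find() succeeds because the first 15 digits satisfy ^\d{15}.
        System.out.println(UNGROUPED.matcher(sixteenDigits).find()); // true
        System.out.println(GROUPED.matcher(sixteenDigits).find());   // false
    }
}
```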

Validate the 12 months of a year

^(0?[1-9]|1[0-2])$

The correct format is: 01 ~ 09 and 1 ~ 12

Validate the 31 days of a month

^((0?[1-9])|((1|2)[0-9])|30|31)$

The correct format is: 01 ~ 09 and 1 ~ 31

Regular expressions that match Chinese characters

[\u4e00-\u9fa5]

Match double byte characters (including Chinese characters)

[^\x00-\xff] 

Regular expressions that match empty lines

\n[\s| ]*\r

Regular expressions that match html tags

<(.*)>(.*)<\/(.*)>|<(.*)\/>

Regular expressions that match leading and trailing spaces

(^\s*)|(\s*$)

Regular expressions that match HTML tags

<(\S*?)[^>]*>.*?|<.*? />

Comment: the versions circulating on the Internet are poor. The one above only matches some tags and still cannot handle complex nested tags.

Regular expressions that match leading and trailing white space characters

^\s*|\s*$

Comment: it can be used to delete whitespace characters at the beginning and end of a line (including spaces, tabs, form feeds, etc.), which makes it a very useful expression.
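As a sketch of how this pattern can be used in Java, the following removes leading and trailing whitespace with `String.replaceAll`. The class and method names are illustrative; for plain trimming, `String.trim()` or `String.strip()` would normally be simpler.

```java
public class TrimWhitespace {
    // Removes leading and trailing whitespace with the pattern ^\s*|\s*$.
    static String trim(String s) {
        return s.replaceAll("^\\s*|\\s*$", "");
    }

    public static void main(String[] args) {
        System.out.println("[" + trim("  hello world \t") + "]"); // [hello world]
    }
}
```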

Regular expression matching Email address

\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*

Comment: very practical for form validation

Regular expression matching a web URL

[a-zA-Z]+://[^\s]*

Comment: the versions circulating on the Internet have very limited functionality; the one above basically meets the need

Check whether an account name is valid (starts with a letter, 5-16 characters long, letters, digits and underscores allowed)

^[a-zA-Z][a-zA-Z0-9_]{4,15}$

Comment: very practical for form validation

Match domestic phone number

\d{3}-\d{8}|\d{4}-\d{7}

Comment: the matching form is 0511-4405222 or 021-87888822

Match Tencent QQ number

[1-9][0-9]{4,}

Comment: Tencent QQ starts from 10000

Match China postal code

[1-9]\d{5}(?!\d)

Commentary: the postal code of China is 6 digits

Matching ID card

\d{15}|\d{18}

Comment: Chinese ID card numbers have 15 or 18 digits

Match ip address

\d+\.\d+\.\d+\.\d+

Comment: useful when extracting ip addresses
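A minimal Java sketch of extracting every IP-like substring from a piece of text with `Matcher.find()`. The class name and the sample text are made up for illustration. Note that this loose pattern does not check that each part is at most 255.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IpExtractor {
    static final Pattern IP_LIKE = Pattern.compile("\\d+\\.\\d+\\.\\d+\\.\\d+");

    // Collects every substring that looks like four dot-separated numbers.
    static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = IP_LIKE.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Prints [10.0.0.1, 192.168.1.20, 999.1.2.3]; the last one is not a valid address.
        System.out.println(extract("from 10.0.0.1 to 192.168.1.20, version 999.1.2.3"));
    }
}
```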

Match specific numbers

^[1-9]\d*$ // Match positive integer
^-[1-9]\d*$ // Match negative integer
^(-?[1-9]\d*|0)$ // Match integer
^([1-9]\d*|0)$ // Match non-negative integer (positive integer + 0)
^(-[1-9]\d*|0)$ // Match non-positive integer (negative integer + 0)
^([1-9]\d*\.\d*|0\.\d*[1-9]\d*)$ // Match positive floating point number
^-([1-9]\d*\.\d*|0\.\d*[1-9]\d*)$ // Match negative floating point number
^-?([1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0)$ // Match floating point number
^([1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0)$ // Match non-negative floating point number (positive floating point number + 0)
^(-([1-9]\d*\.\d*|0\.\d*[1-9]\d*)|0?\.0+|0)$ // Match non-positive floating point number (negative floating point number + 0)

Comment: useful when processing large amounts of data. Adjust the patterns to your specific application.

Match a specific string

^[A-Za-z]+$ // Matches a string of 26 English letters
^[A-Z]+$ // Matches a string of 26 uppercase letters
^[a-z]+$ // Matches a string of 26 lowercase letters
^[A-Za-z0-9]+$ // Matches a string of numbers and 26 English letters
^\w+$ // Matches a string consisting of numbers, 26 English letters, or underscores

Commentary: some of the most basic and commonly used expressions

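Verify password strength
For example, the password must contain a combination of upper case letters, lower case letters and digits, must not contain special characters, and must be 8 to 10 characters long. (The patterns in this part are written with doubled backslashes, the form they take inside a Java string literal.)

^(?=.*\\d)(?=.*[a-z])(?=.*[A-Z])[a-zA-Z0-9]{8,10}$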
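A minimal Java sketch of this password rule. Each lookahead checks one requirement without consuming input. As an assumption of this sketch, the final part is restricted to `[a-zA-Z0-9]` so that special characters are rejected, as the description requires. The class and method names are illustrative.

```java
import java.util.regex.Pattern;

public class PasswordStrength {
    // Lookaheads: at least one digit, one lower case and one upper case letter.
    // Assumption: only letters and digits are allowed, 8 to 10 characters in total.
    static final Pattern STRONG = Pattern.compile(
        "^(?=.*\\d)(?=.*[a-z])(?=.*[A-Z])[a-zA-Z0-9]{8,10}$");

    static boolean isStrong(String password) {
        return STRONG.matcher(password).matches();
    }

    public static void main(String[] args) {
        System.out.println(isStrong("Abcdef12")); // true
        System.out.println(isStrong("abcdef12")); // false: no upper case letter
        System.out.println(isStrong("Abcdef1!")); // false: special character
    }
}
```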

Validate strings

Chinese characters only:

^[\\u4e00-\\u9fa5]{0,}$

A string consisting of numbers, 26 English letters, or underscores

^\\w+$

Verify E-Mail address

[\\w!#$%&'*+/=?^_`{|}~-]+(?:\\.[\\w!#$%&'*+/=?^_`{|}~-]+)*@(?:[\\w](?:[\\w-]*[\\w])?\\.)+[\\w](?:[\\w-]*[\\w])?

Verify ID number
15 digits:

^[1-9]\\d{7}((0\\d)|(1[0-2]))(([0-2]\\d)|3[0-1])\\d{3}$

18 digits:

^[1-9]\\d{5}[1-9]\\d{3}((0\\d)|(1[0-2]))(([0-2]\\d)|3[0-1])\\d{3}([0-9]|X)$

Validate dates
Date validation in "yyyy-MM-dd" format, taking common years and leap years into account.

^(?:(?!0000)[0-9]{4}-(?:(?:0[1-9]|1[0-2])-(?:0[1-9]|1[0-9]|2[0-8])|(?:0[13-9]|1[0-2])-(?:29|30)|(?:0[13578]|1[02])-31)|(?:[0-9]{2}(?:0[48]|[2468][048]|[13579][26])|(?:0[48]|[2468][048]|[13579][26])00)-02-29)$
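A short Java sketch that applies the date pattern above to a few sample strings. The pattern contains no backslashes, so it can be pasted into a Java string unchanged. The class name and the samples are illustrative.

```java
import java.util.regex.Pattern;

public class DateFormatCheck {
    static final Pattern DATE = Pattern.compile(
        "^(?:(?!0000)[0-9]{4}-(?:(?:0[1-9]|1[0-2])-(?:0[1-9]|1[0-9]|2[0-8])"
        + "|(?:0[13-9]|1[0-2])-(?:29|30)|(?:0[13578]|1[02])-31)"
        + "|(?:[0-9]{2}(?:0[48]|[2468][048]|[13579][26])"
        + "|(?:0[48]|[2468][048]|[13579][26])00)-02-29)$");

    static boolean isValid(String date) {
        return DATE.matcher(date).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("2024-02-29")); // true: leap year
        System.out.println(isValid("2023-02-29")); // false: common year
        System.out.println(isValid("1900-02-29")); // false: century not divisible by 400
        System.out.println(isValid("2023-04-31")); // false: April has 30 days
    }
}
```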

Validate an amount
Accurate to 2 decimal places.

^[0-9]+(\\.[0-9]{2})?$

Verify mobile phone number
The following is the regular expression for domestic mobile phone numbers beginning with 13, 14, 15 and 18. (The leading digits can be extended according to the number ranges currently allocated in China.)

^(13[0-9]|14[57]|15[0-35-9]|18[0-35-9])\\d{8}$

Detect the IE version

^.*MSIE [5-8](?:\\.[0-9]+)?(?!.*Trident\\/[5-9]\\.0).*$

Verify IPv4 address

\\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\b
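A minimal Java sketch using this stricter IPv4 pattern for whole-string validation with `matches()`. Unlike the loose `\d+\.\d+\.\d+\.\d+` form, it rejects parts above 255. The class name is illustrative.

```java
import java.util.regex.Pattern;

public class Ipv4Check {
    // Each part is 0-255: 250-255, 200-249, or up to three digits starting with 0 or 1.
    static final String PART = "(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)";
    static final Pattern IPV4 = Pattern.compile(
        "\\b(?:" + PART + "\\.){3}" + PART + "\\b");

    static boolean isValid(String s) {
        return IPV4.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("192.168.1.1")); // true
        System.out.println(isValid("256.1.1.1"));   // false
        System.out.println(isValid("1.2.3"));       // false
    }
}
```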

Verify IPv6 address

(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))

Check the prefix of the URL

In application development you often need to tell whether a request uses HTTPS or HTTP. The expression below checks whether a URL already has a scheme prefix, so that the prefix can be added or examined afterwards.

// JavaScript: prepend a default scheme when the string does not start with one
if (!s.match(/^[a-zA-Z]+:\/\//))
{
    s = 'http://' + s;
}
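The same check can be sketched in Java. The class and method names are hypothetical, and defaulting to `http://` simply mirrors the snippet above.

```java
import java.util.regex.Pattern;

public class UrlPrefix {
    // One or more letters followed by "://" at the start of the string.
    static final Pattern SCHEME = Pattern.compile("^[a-zA-Z]+://");

    // Returns the URL unchanged if it has a scheme, otherwise prepends http://.
    static String withDefaultScheme(String s) {
        return SCHEME.matcher(s).find() ? s : "http://" + s;
    }

    public static void main(String[] args) {
        System.out.println(withDefaultScheme("example.com"));         // http://example.com
        System.out.println(withDefaultScheme("https://example.com")); // https://example.com
    }
}
```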

Extract URL link

The following expression matches a URL (http, https, ftp or ftps) at the start of a piece of text.

^(f|ht){1}(tp|tps):\\/\\/([\\w-]+\\.)+[\\w-]+(\\/[\\w- ./?%&=]*)?

File path and extension verification
Verify a file path and extension under Windows (.txt files in the following example)

^([a-zA-Z]\\:|\\\\)\\\\([^\\\\]+\\\\)*[^\\/:*?"<>|]+\\.txt(l)?$

Extract web page color code
Sometimes you need to extract the color code from the web page. You can use the following expression.

^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$
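The anchored pattern above validates a single color code. To pull color codes out of a longer text, the anchors have to go. The Java sketch below is an assumption of how that could look: it uses an unanchored variant with a trailing `\b`, and the class name and sample text are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ColorExtractor {
    // Unanchored variant (assumption): '#' plus 6 or 3 hex digits, ending at a word boundary.
    static final Pattern COLOR = Pattern.compile("#(?:[A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})\\b");

    static List<String> extract(String css) {
        List<String> colors = new ArrayList<>();
        Matcher m = COLOR.matcher(css);
        while (m.find()) {
            colors.add(m.group());
        }
        return colors;
    }

    public static void main(String[] args) {
        // Prints [#fff, #1a2B3c]; "#12" is too short to match.
        System.out.println(extract("color: #fff; background: #1a2B3c; border: #12"));
    }
}
```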

Extract web page pictures

\\< *img[^\\>]*src *= *[\\"\\']{0,1}([^\\"\\'\\ >]*)

Extract page hyperlinks

(<a\\s*(?!.*\\brel=)[^>]*)(href="https?:\\/\\/)((?!(?:(?:www\\.)?'.implode('|(?:www\\.)?', $follow_list).'))[^"]+)"((?!.*\\brel=)[^>]*)(?:[^>]*)>

Note: the '.implode('|(?:www\\.)?', $follow_list).' part is PHP string concatenation that splices a list of domains into the pattern. In another language, replace it with your own alternation of domains.

Find CSS properties

^\\s*[a-zA-Z\\-]+\\s*[:]{1}\\s[a-zA-Z0-9\\s.#]+[;]{1}

Extract comments

<!--(.*?)-->
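A minimal Java sketch of extracting HTML comments with this pattern. By default `.` does not match line breaks in Java, so the sketch adds `Pattern.DOTALL` to catch multi-line comments. That flag, the class name and the sample input are assumptions of the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentExtractor {
    // The lazy .*? stops at the first "-->"; DOTALL lets '.' match newlines as well.
    static final Pattern COMMENT = Pattern.compile("<!--(.*?)-->", Pattern.DOTALL);

    static List<String> extract(String html) {
        List<String> comments = new ArrayList<>();
        Matcher m = COMMENT.matcher(html);
        while (m.find()) {
            comments.add(m.group(1)); // text between the comment delimiters
        }
        return comments;
    }

    public static void main(String[] args) {
        System.out.println(extract("<p>a</p><!-- one --><div><!--two\nlines--></div>"));
    }
}
```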

Match HTML tags

<\\/?\\w+((\\s+\\w+(\\s*=\\s*(?:".*?"|'.*?'|[^'">\\s]+))?)+\\s*|\\s*)\\/?>

Date and time regex examples

Simple date check (YYYY/MM/DD)

^\d{4}(\-|\/|\.)\d{1,2}\1\d{1,2}$ 
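The `\1` in this pattern is a backreference: the second separator must be the same character that the first group captured. A short Java sketch, with an illustrative class name, showing that behaviour and the fact that the pattern does not check value ranges:

```java
import java.util.regex.Pattern;

public class SimpleDate {
    // Group 1 captures the separator; \1 requires the same separator again.
    static final Pattern SIMPLE = Pattern.compile("^\\d{4}(-|/|\\.)\\d{1,2}\\1\\d{1,2}$");

    static boolean looksLikeDate(String s) {
        return SIMPLE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeDate("2022-02-10")); // true
        System.out.println(looksLikeDate("2022/2/10"));  // true
        System.out.println(looksLikeDate("2022-02/10")); // false: mixed separators
        System.out.println(looksLikeDate("2022.13.45")); // true: ranges are not checked
    }
}
```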

Improved date check (YYYY/MM/DD | YY/MM/DD)

^(^(\d{4}|\d{2})(\-|\/|\.)\d{1,2}\3\d{1,2}$)|(^\d{4}年\d{1,2}月\d{1,2}日$)$

Date check with leap-year validation

Example:

^((((1[6-9]|[2-9]\d)\d{2})-(0?[13578]|1[02])-(0?[1-9]|[12]\d|3[01]))|(((1[6-9]|[2-9]\d)\d{2})-(0?[13456789]|1[012])-(0?[1-9]|[12]\d|30))|(((1[6-9]|[2-9]\d)\d{2})-0?2-(0?[1-9]|1\d|2[0-8]))|(((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00))-0?2-29))$

Analysis:

What is the legal date range? Different application scenarios give different answers. The convention from MSDN is adopted here:

The DateTime value type represents dates and times with values ranging from 12:00:00 midnight, January 1, 0001 A.D. (C.E.) to 11:59:59 P.M., December 31, 9999 A.D. (C.E.).

Explanation of leap years.

Regarding leap years in the Gregorian calendar: one revolution of the earth around the sun is called a tropical year, which lasts 365 days, 5 hours, 48 minutes and 46 seconds. The Gregorian calendar therefore distinguishes common years and leap years. A common year has 365 days, which is 0.2422 days shorter than the tropical year, or 0.9688 days shorter over four years. So one day is added every four years, giving that year 366 days; this is a leap year. However, adding one day every four years is 0.0312 days more than four tropical years, which adds up to 3.12 days over 400 years. Therefore three leap years are dropped every 400 years, leaving only 97 leap years in 400 years, so that the average length of the Gregorian year stays close to the tropical year. To achieve this, a century year is a leap year only if it is a multiple of 400. For example, 1900 and 2100 are not leap years.
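The rule described above can be written as plain arithmetic. The following Java sketch, with illustrative names, also counts the leap years in one 400-year cycle to confirm the figure of 97:

```java
public class LeapYear {
    // Leap year: divisible by 4 but not by 100, or divisible by 400.
    static boolean isLeap(int year) {
        return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    }

    // Counts leap years in the inclusive range [from, to].
    static int countLeapYears(int from, int to) {
        int count = 0;
        for (int y = from; y <= to; y++) {
            if (isLeap(y)) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(isLeap(1900));               // false
        System.out.println(isLeap(2000));               // true
        System.out.println(countLeapYears(2001, 2400)); // 97
    }
}
```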

First, you need to verify the year. Obviously, the year range is 0001 - 9999, and the regular expression matching YYYY is:

[0-9]{3}[1-9]|[0-9]{2}[1-9][0-9]{1}|[0-9]{1}[1-9][0-9]{2}|[1-9][0-9]{3}

Here [0-9] can also be written as \d, but \d is not as intuitive as [0-9], so [0-9] is used throughout below.

There are two difficulties in validating dates with regular expressions: one is the different number of days in long and short months, and the other is the handling of leap years.

For the first difficulty, let's not consider leap years. Let's assume that February is 28 days. In this way, the month and date can be divided into three cases:

(1) The month is 1, 3, 5, 7, 8, 10, 12, and the number of days ranges from 01 to 31. The regular expression matching MM-DD is:

(0[13578]|1[02])-(0[1-9]|[12][0-9]|3[01])

(2) The month is 4, 6, 9 and 11, and the range of days is 01-30. The regular expression matching MM-DD is:

(0[469]|11)-(0[1-9]|[12][0-9]|30)

(3) The month is 2. Considering the situation of ordinary years, the regular expression matching MM-DD is:

02-(0[1-9]|[1][0-9]|2[0-8])

From the results above, we get the regular expression that matches common-year (non-leap-year) dates in YYYY-MM-DD format:

([0-9]{3}[1-9]|[0-9]{2}[1-9][0-9]{1}|[0-9]{1}[1-9][0-9]{2}|[1-9][0-9]{3})-(((0[13578]|1[02])-(0[1-9]|[12][0-9]|3[01]))|((0[469]|11)-(0[1-9]|[12][0-9]|30))|(02-(0[1-9]|[1][0-9]|2[0-8])))

Then let's solve the second difficulty: the consideration of leap years. According to the definition of leap years, we can divide leap years into two categories:

(1) A year divisible by 4 but not by 100. Looking at the pattern in the last two digits, we quickly get the following expression:

([0-9]{2})(0[48]|[2468][048]|[13579][26])

(2) A year divisible by 400. A number divisible by 400 is also divisible by 100, so the last two digits must be 00. We only need to ensure that the first two digits are divisible by 4. The corresponding regular expression is:

(0[48]|[2468][048]|[13579][26])00
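One way to gain confidence in the two leap-year expressions is to compare them with the calendar rule for every year from 0001 to 9999. The Java sketch below does this against `java.time.Year.isLeap`. As an assumption of the sketch, the century branch uses `[13579][26]`, so that 1200 and 1600 are treated as leap years. The class and method names are illustrative.

```java
import java.time.Year;
import java.util.Locale;
import java.util.regex.Pattern;

public class LeapYearRegex {
    // Branch 1: last two digits are a non-zero multiple of 4.
    // Branch 2: century years whose first two digits are a non-zero multiple of 4.
    static final Pattern LEAP = Pattern.compile(
        "[0-9]{2}(0[48]|[2468][048]|[13579][26])|(0[48]|[2468][048]|[13579][26])00");

    static boolean isLeapByRegex(int year) {
        return LEAP.matcher(String.format(Locale.ROOT, "%04d", year)).matches();
    }

    // Number of years in 1..9999 where the regex and java.time disagree.
    static int countDisagreements() {
        int disagreements = 0;
        for (int y = 1; y <= 9999; y++) {
            if (isLeapByRegex(y) != Year.isLeap(y)) disagreements++;
        }
        return disagreements;
    }

    public static void main(String[] args) {
        System.out.println(isLeapByRegex(1600));  // true
        System.out.println(isLeapByRegex(1900));  // false
        System.out.println(countDisagreements()); // 0
    }
}
```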

The strongest date-validation regular expression, with leap-year validation added

This date regular expression supports the following date formats.

YYYY-MM-DD 
YYYY/MM/DD 
YYYY_MM_DD 
YYYY.MM.DD

The complete regular expression is as follows

((^((1[8-9]\d{2})|([2-9]\d{3}))([-\/\._])(10|12|0?[13578])([-\/\._])(3[01]|[12][0-9]|0?[1-9])$)|(^((1[8-9]\d{2})|([2-9]\d{3}))([-\/\._])(11|0?[469])([-\/\._])(30|[12][0-9]|0?[1-9])$)|(^((1[8-9]\d{2})|([2-9]\d{3}))([-\/\._])(0?2)([-\/\._])(2[0-8]|1[0-9]|0?[1-9])$)|(^([2468][048]00)([-\/\._])(0?2)([-\/\._])(29)$)|(^([3579][26]00)([-\/\._])(0?2)([-\/\._])(29)$)|(^([1][89][0][48])([-\/\._])(0?2)([-\/\._])(29)$)|(^([2-9][0-9][0][48])([-\/\._])(0?2)([-\/\._])(29)$)|(^([1][89][2468][048])([-\/\._])(0?2)([-\/\._])(29)$)|(^([2-9][0-9][2468][048])([-\/\._])(0?2)([-\/\._])(29)$)|(^([1][89][13579][26])([-\/\._])(0?2)([-\/\._])(29)$)|(^([2-9][0-9][13579][26])([-\/\._])(0?2)([-\/\._])(29)$))

February of a leap year has 29 days, so the regular expression matching the leap-year date February 29 in YYYY-MM-DD format is:

(([0-9]{2})(0[48]|[2468][048]|[13579][26])|((0[48]|[2468][048]|[13579][26])00))-02-29

Finally, combining the date expressions for common years and leap years gives the final regular expression for validating dates in YYYY-MM-DD format:

(([0-9]{3}[1-9]|[0-9]{2}[1-9][0-9]{1}|[0-9]{1}[1-9][0-9]{2}|[1-9][0-9]{3})-(((0[13578]|1[02])-(0[1-9]|[12][0-9]|3[01]))|((0[469]|11)-(0[1-9]|[12][0-9]|30))|(02-(0[1-9]|[1][0-9]|2[0-8]))))|((([0-9]{2})(0[48]|[2468][048]|[13579][26])|((0[48]|[2468][048]|[13579][26])00))-02-29)
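The derivation above can be sketched in Java by assembling the final expression from its named parts and cross-checking it against `java.time.YearMonth` over a range of years. The class, constant and method names are illustrative. As an assumption of the sketch, the century branch uses `[13579][26]`, so that 1200 and 1600 are treated as leap years.

```java
import java.time.YearMonth;
import java.util.Locale;
import java.util.regex.Pattern;

public class FullDateCheck {
    // Year 0001-9999, as in the derivation.
    static final String YEAR =
        "([0-9]{3}[1-9]|[0-9]{2}[1-9][0-9]|[0-9][1-9][0-9]{2}|[1-9][0-9]{3})";
    static final String LONG_MONTHS = "(0[13578]|1[02])-(0[1-9]|[12][0-9]|3[01])";
    static final String SHORT_MONTHS = "(0[469]|11)-(0[1-9]|[12][0-9]|30)";
    static final String FEBRUARY = "02-(0[1-9]|1[0-9]|2[0-8])";
    // Assumption: [13579][26] in the century branch covers 1200 and 1600.
    static final String LEAP_YEAR =
        "([0-9]{2}(0[48]|[2468][048]|[13579][26])|(0[48]|[2468][048]|[13579][26])00)";

    static final Pattern DATE = Pattern.compile(
        "(" + YEAR + "-(" + LONG_MONTHS + "|" + SHORT_MONTHS + "|" + FEBRUARY + "))"
        + "|(" + LEAP_YEAR + "-02-29)");

    static boolean isValid(String date) {
        return DATE.matcher(date).matches();
    }

    // Counts day/month/year combinations where the regex and java.time disagree.
    static int countMismatches(int fromYear, int toYear) {
        int mismatches = 0;
        for (int y = fromYear; y <= toYear; y++) {
            for (int m = 1; m <= 12; m++) {
                for (int d = 1; d <= 31; d++) {
                    String text = String.format(Locale.ROOT, "%04d-%02d-%02d", y, m, d);
                    boolean expected = YearMonth.of(y, m).isValidDay(d);
                    if (isValid(text) != expected) mismatches++;
                }
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println(isValid("2000-02-29"));       // true
        System.out.println(isValid("1900-02-29"));       // false
        System.out.println(countMismatches(1500, 2500)); // 0
    }
}
```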

The validation expression for the DD/MM/YYYY format is:

(((0[1-9]|[12][0-9]|3[01])/((0[13578]|1[02]))|((0[1-9]|[12][0-9]|30)/(0[469]|11))|(0[1-9]|[1][0-9]|2[0-8])/(02))/([0-9]{3}[1-9]|[0-9]{2}[1-9][0-9]{1}|[0-9]{1}[1-9][0-9]{2}|[1-9][0-9]{3}))|(29/02/(([0-9]{2})(0[48]|[2468][048]|[13579][26])|((0[48]|[2468][048]|[13579][26])00)))

You can bookmark these common regular expressions first and look them up when you need them.

Well, that's all for today. I'm Binghe. If you have any questions, you can leave a comment at the end of the article or send me a private message on CSDN, and I'll reply when I see it. Finally, please like, bookmark, comment and share. Binghe is asking for the "triple" (like, bookmark, share)~~

Topics: Java regex