What is a regular expression?
A regular expression is a search pattern formed from a sequence of characters. When you search text, the pattern describes what you want to find.
Creation of regular expressions
Literal
// Syntax: var reg = /pattern/modifiers;
var reg = /hello/g;
Constructor
// Syntax: var reg = new RegExp("pattern", "modifiers");
var reg = new RegExp("hello", "g");
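As a minimal sketch of how the two forms relate, the example below builds the same pattern both ways. It assumes the \d character class (covered later in this article), the standard `flags` property and `String.prototype.match`; `keyword` is a hypothetical variable used only for illustration.

```javascript
// Both forms build the same pattern: one or more digits, matched globally.
var fromLiteral = /\d+/g;
var fromConstructor = new RegExp("\\d+", "g"); // the backslash must be doubled inside a string

console.log(fromLiteral.source === fromConstructor.source); // true
console.log(fromLiteral.flags === fromConstructor.flags);   // true

// The constructor is useful when the pattern is only known at run time.
// `keyword` is a hypothetical variable, not part of the original examples.
var keyword = "hello";
var dynamic = new RegExp(keyword, "g");
console.log("hello world, hello js".match(dynamic).length); // 2
```

The literal form is usually easier to read; the constructor is the one to reach for when the pattern comes from a variable.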
Character classification
Ordinary characters
Letters, numbers, underscores, Chinese characters, and symbols without special meaning (, ; ! @ and so on).
In short, any character that is not a special character is an ordinary character.
Special characters
\: escapes a special character so that it is treated as an ordinary character
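To illustrate escaping, here is a small sketch that compares an unescaped dot with an escaped one. The sample string and variable names are assumptions made for the example, and it uses `String.prototype.match` to list the matches.

```javascript
var text = "a.b axb";

// An unescaped dot is a special character: it matches any character except line terminators.
var anyChar = /a.b/g;
// An escaped dot is an ordinary character: it matches only a literal ".".
var literalDot = /a\.b/g;

console.log(text.match(anyChar));    // [ 'a.b', 'axb' ]
console.log(text.match(literalDot)); // [ 'a.b' ]
```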
Pattern modifiers
i: ignoreCase, ignores case when matching
m: multiline, matches across multiple lines
g: global, finds all matches instead of stopping at the first
When a regular expression is created as a literal, the pattern modifiers are written after the closing forward slash
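The effect of each modifier can be seen on a two-line string. This is only a sketch: the sample text is an assumption, and it relies on the ^ anchor (described later in this article) and `String.prototype.match`.

```javascript
var text = "Hello\nhello";

// No modifier: case-sensitive, and ^ only anchors to the start of the input.
console.log(text.match(/^hello/));  // null ("Hello" starts with an uppercase H)

// i: ignore case.
console.log(/^hello/i.test(text));  // true

// m: ^ also matches right after a line break.
console.log(/^hello/m.test(text));  // true (matches the second line)

// g: find every match rather than stopping at the first.
console.log(text.match(/hello/gi)); // [ 'Hello', 'hello' ]
```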
Regular expression instance methods
exec
Searches a string for text that matches the regular expression
The return value is an array:
[matched text, index: start position of the match in str, input: the string that was searched, groups: undefined]
If there is no match, it returns null
var str = 'hello world, hello js';
var reg1 = /hello/;
var reg2 = /hello/g;
var reg3 = /ellc/;
console.log(reg1.exec(str)); // [ 'hello', index: 0, input: 'hello world, hello js', groups: undefined ]
console.log(reg2.exec(str)); // [ 'hello', index: 0, input: 'hello world, hello js', groups: undefined ]
console.log(reg3.exec(str)); // null
Note:
1) If the regular expression has the modifier "g", the instance maintains a lastIndex property that records where the next search starts. The second call to exec begins searching at lastIndex.
2) If the regular expression does not have the modifier "g", lastIndex is not maintained and every search starts from the beginning of the string.
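Because a global regular expression remembers lastIndex, calling exec in a loop walks through every match. The sketch below shows one way to do that; `findAll` is a hypothetical helper name, not a built-in method.

```javascript
// Collect every match together with its starting index.
// `findAll` is a hypothetical helper, written only for illustration.
function findAll(reg, str) {
  if (!reg.global) {
    // Without "g", lastIndex never advances and the loop below would not terminate.
    throw new Error("findAll expects a regular expression with the g modifier");
  }
  var results = [];
  var match;
  reg.lastIndex = 0; // start from the beginning regardless of earlier use
  while ((match = reg.exec(str)) !== null) {
    results.push({ text: match[0], index: match.index });
    // Guard against zero-length matches, which would otherwise loop forever.
    if (match[0] === "") reg.lastIndex++;
  }
  return results;
}

console.log(findAll(/hello/g, "hello world, hello js"));
// [ { text: 'hello', index: 0 }, { text: 'hello', index: 13 } ]
```

The loop ends when exec returns null, at which point lastIndex has already been reset to 0.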
test
Tests whether the string contains text that matches the regular expression. It returns true if so and false otherwise
var str = 'hello world, hello js';
var reg1 = /hello/;
var reg2 = /helle/;
console.log(reg1.test(str)); // true
console.log(reg2.test(str)); // false
Note:
1) If the regular expression has the modifier "g", the instance maintains a lastIndex property that records where the next search starts. The second call to test begins searching at lastIndex.
2) If the regular expression does not have the modifier "g", lastIndex is not maintained and every search starts from the beginning of the string.
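This behaviour leads to a common surprise: repeated test calls on the same global instance can alternate between true and false. A minimal sketch, with variable names chosen only for the example:

```javascript
var reg = /hello/g;
var str = "hello";

console.log(reg.test(str)); // true  (lastIndex is now 5)
console.log(reg.test(str)); // false (the search started at index 5; lastIndex is reset to 0)
console.log(reg.test(str)); // true  (the search started from 0 again)

// One way to avoid the surprise: leave out "g" when only a yes/no answer is needed.
var plain = /hello/;
console.log(plain.test(str)); // true
console.log(plain.test(str)); // true
```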
toString/toLocaleString
Converts the regular expression to its literal string form. toLocaleString is meant to return a locale-specific string, but for regular expressions in JS it gives the same result as toString
var reg1 = /hello/;
console.log(reg1.toString()); // the string '/hello/'
console.log(reg1.toLocaleString()); // the string '/hello/'
valueOf
Returns the regular expression itself
var reg1 = /hello/;
console.log(reg1.valueOf()); // Returns the regular expression itself
Regular expression instance properties
lastIndex
When global matching is not set, exec and test never update lastIndex, so it stays at 0
When global matching is set, each call to exec or test moves lastIndex to the position just after the matched text. When nothing more can be matched after that position, the next call makes exec return null and test return false, and lastIndex returns to 0 so that matching starts again from the beginning of the string
In other words, with global matching every search starts at lastIndex
var str = 'hello hello hello';
var reg1 = /hello/;
var reg2 = /hello/g;
console.log(reg1.lastIndex); // 0
console.log(reg1.exec(str)); // Returns the first hello
console.log(reg1.lastIndex); // 0
console.log(reg2.lastIndex); // 0
console.log(reg2.exec(str)); // Returns the first hello
console.log(reg2.lastIndex); // 5
console.log(reg2.exec(str)); // Returns the second hello
console.log(reg2.lastIndex); // 11
console.log(reg2.exec(str)); // Returns the third hello
console.log(reg2.lastIndex); // 17
console.log(reg2.exec(str)); // Returns null
console.log(reg2.lastIndex); // 0
console.log(reg2.exec(str)); // Returns the first hello
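lastIndex is also writable, so the starting point of a global search can be set by hand. The following sketch assumes that direct assignment is acceptable in your code; the strings are made up for the example.

```javascript
var str = "hello hello hello";
var reg = /hello/g;

// Skip the first occurrence by moving the starting point past its first character.
reg.lastIndex = 1;
var match = reg.exec(str);
console.log(match.index);   // 6
console.log(reg.lastIndex); // 11

// Reset before reusing the same instance on another string.
reg.lastIndex = 0;
console.log(reg.exec("say hello").index); // 4
```

Resetting lastIndex to 0 before reusing an instance is a simple way to avoid carrying state from one string to the next.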
ignoreCase, global, multiline
Report whether the regular expression has each of the three pattern modifiers: ignore case, global matching and multiline matching
var reg1 = /hello/igm;
console.log(reg1.ignoreCase); // true
console.log(reg1.global); // true
console.log(reg1.multiline); // true
source
Returns the pattern text of the regular expression as a string, without the surrounding slashes or modifiers (unlike toString, which includes them)
var reg1 = /hello/igm;
console.log(reg1.source); // hello
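To make the difference between source and toString concrete, here is a short sketch. It also uses the `flags` property, which is standard JavaScript but is not covered in the text above, so treat that part as an addition for illustration.

```javascript
var reg = /hello/gi;

console.log(reg.source);     // hello
console.log(reg.flags);      // gi
console.log(reg.toString()); // /hello/gi

// source and flags are enough to build an independent copy
// that has its own lastIndex.
var copy = new RegExp(reg.source, reg.flags);
reg.exec("hello hello");
console.log(reg.lastIndex);  // 5
console.log(copy.lastIndex); // 0
```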
Regular expression syntax - metacharacters
Literal characters
Character | Matches |
---|---|
Alphanumeric characters | Itself |
\0 | NUL character |
\t | Tab |
\n | Newline character |
\v | Vertical tab |
\f | Form feed |
\r | Carriage return |
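The escapes in the table can be used like any other character in a pattern. As a rough sketch, the example below splits a small tab-separated text; the sample data is invented, and it uses `String.prototype.split`, which accepts a regular expression as the separator.

```javascript
var text = "name\tage\nTom\t20";

// Split into lines on the newline character.
var lines = text.split(/\n/);
console.log(lines.length);         // 2

// Split the second line into fields on the tab character.
console.log(lines[1].split(/\t/)); // [ 'Tom', '20' ]

// The text contains no carriage return.
console.log(/\r/.test(text));      // false
```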
Character set
Matches any one character in the set. You can use the hyphen '-' to specify a range
[abc] matches any one of the characters between the square brackets
var str = 'abc qwe abd';
var reg1 = /[abc]/; // Returns true as long as the string contains a, b or c
console.log(reg1.test(str)); // true
[0-9] matches any digit from 0 to 9
var str = 'abc qwe abd1';
var reg1 = /[0-9]/igm;
console.log(reg1.test(str)); // true
[^xyz] is a negated (complemented) character set. It matches any character that is not listed between the square brackets. You can also specify a range of characters with the hyphen '-'.
Note: a ^ written as the first character inside [] negates the character set
var str = 'abc qwe abd1,2';
console.log(str);
var reg1 = /[^abc ]/igm;
console.log(reg1.exec(str)); // [ 'q', index: 4, input: 'abc qwe abd1,2', groups: undefined ]
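The three kinds of character set can be compared on the same string. This sketch uses `String.prototype.match` with the g modifier to list every matching character; the last pattern is an extra variation added for illustration.

```javascript
var str = "abc qwe abd1,2";

// Every character that is a, b or c.
console.log(str.match(/[abc]/g));   // [ 'a', 'b', 'c', 'a', 'b' ]
// Every digit.
console.log(str.match(/[0-9]/g));   // [ '1', '2' ]
// Every character that is neither a lowercase letter nor a space.
console.log(str.match(/[^a-z ]/g)); // [ '1', ',', '2' ]
```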
Boundary character
^ matches the start of the input, so the text must begin with what follows. If the multiline flag is set, it also matches the position right after a line break.
$ matches the end of the input, so the text must end with what precedes it. If the multiline flag is set, it also matches the position right before a line break.
If ^ and $ are used together, the whole string must match the pattern exactly.
var rg = /abc/; // Returns true as long as the string contains abc
console.log(rg.test('abc')); // true
console.log(rg.test('abcd')); // true
console.log(rg.test('aabcd')); // true
console.log('---------------------------');
// The string must begin with abc
var reg = /^abc/;
console.log(reg.test('abc')); // true
console.log(reg.test('abcd')); // true
console.log(reg.test('aabcd')); // false
console.log('---------------------------');
// The string must end with abc
var reg = /abc$/;
console.log(reg.test('abc')); // true
console.log(reg.test('qweabc')); // true
console.log(reg.test('aabcd')); // false
console.log('---------------------------');
var reg1 = /^abc$/; // Exact match: only the string abc satisfies it
console.log(reg1.test('abc')); // true
console.log(reg1.test('abcd')); // false
console.log(reg1.test('aabcd')); // false
console.log(reg1.test('abcabc')); // false
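The multiline behaviour of ^ and $ described above can be checked with a short sketch. The three-line sample text is an assumption made for the example.

```javascript
var text = "first line\nabc\nlast line";

// Without m, ^ and $ only match at the very start and end of the input.
console.log(/^abc$/.test(text));  // false
// With m, they also match right after and right before a line break.
console.log(/^abc$/m.test(text)); // true
```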
Character sets combined with '^' and '$'
// One of three: only the single letter a, b or c returns true
var rg1 = /^[abc]$/;
console.log(rg1.test('aa')); // false
console.log(rg1.test('a')); // true
console.log(rg1.test('b')); // true
console.log(rg1.test('c')); // true
console.log(rg1.test('abc')); // false
// The 26 lowercase English letters: any single one returns true. The hyphen indicates the range from a to z
var reg = /^[a-z]$/;
console.log(reg.test('a')); // true
console.log(reg.test('z')); // true
console.log(reg.test('A')); // false
// Combined ranges
// Any single English letter (uppercase or lowercase) or digit returns true
var reg1 = /^[a-zA-Z0-9]$/;
// A ^ inside the square brackets negates the set: any character listed in the brackets returns false
var reg2 = /^[^a-zA-Z0-9]$/;
console.log(reg2.test('a')); // false
console.log(reg2.test('B')); // false
console.log(reg2.test(8)); // false (the number is converted to the string '8')
console.log(reg2.test('!')); // true
\b matches a zero-width word boundary, that is, the edge of a word (not a character): the position between a word character (\w) and a space or other non-word character, or between a word character and the beginning or end of the string.
\B matches a zero-width non-word boundary, the opposite of \b.
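A small sketch may help to show where \b and \B match. The sample sentence is invented for the example, and the matches are listed with `String.prototype.match`.

```javascript
var text = "cat concat category";

// \bcat\b: "cat" as a whole word.
console.log(text.match(/\bcat\b/g));      // [ 'cat' ]
// \Bcat: "cat" that does not start a word (the one inside "concat").
console.log(/\Bcat/.exec(text).index);    // 7
// \bcat: "cat" at the start of a word ("cat" and "category").
console.log(text.match(/\bcat/g).length); // 2
```

Neither \b nor \B consumes a character; they only assert something about the position between two characters.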
Character class
A character class is formed by placing literal characters in square brackets. A character class matches any one of the characters it contains. For example, /[abc]/ matches any of the letters "a", "b" and "c". The symbol "^" defines a negated character class. For example, /[^abc]/ matches every character except "a", "b" and "c". Character classes can use hyphens to express a character range, for example /[a-z]/. To match any letter of the Latin alphabet or any digit, use [a-zA-Z0-9]
Character class | Meaning |
---|---|
. | Matches any single character except line terminators such as line feed and carriage return, roughly equivalent to [^\n\r] |
\d | Matches a digit, equivalent to [0-9] |
\D | Matches a non-digit, equivalent to [^0-9] |
\w | Matches any single word character: a-z, A-Z, 0-9 and the underscore "_", equivalent to [a-zA-Z0-9_] |
\W | Matches any non-word character, equivalent to [^a-zA-Z0-9_] |
\s | Matches any Unicode whitespace character, including spaces, tabs, form feeds and line breaks, roughly equivalent to [ \f\t\n\r\v] |
\S | Matches any non-whitespace character, roughly equivalent to [^ \f\t\n\r\v] |
. matches any single character other than line feed \n and carriage return \r
var str = '\nHello World Hello\r JavaScript';
console.log(str);
var reg1 = /./g;
console.log(reg1.exec(str)); // [ 'H', index: 1, ... ] because the leading \n is skipped
\d matches a digit, equivalent to [0-9]
// Starts with a digit
var str = '123Hello World Hello 123JavaScript';
console.log(str);
var reg1 = /^\d/g;
console.log(reg1.exec(str)); // [ '1', index: 0, ... ]
\D is equivalent to [^0-9]
// Does not start with a digit
var str = 'Hello World Hello 123JavaScript';
console.log(str);
var reg1 = /^\D/g;
console.log(reg1.exec(str)); // [ 'H', index: 0, ... ]
\w matches any single word character: a-z, A-Z, 0-9 and the underscore "_", equivalent to [a-zA-Z0-9_]
\W is equivalent to [^a-zA-Z0-9_]
var str = '!Hello World Hello JavaScript';
// \w -> [a-zA-Z0-9_]
var reg1 = /^\w/;
console.log(reg1.test(str)); // false, the string starts with '!'
// \W -> [^a-zA-Z0-9_]
var reg2 = /^\W/;
console.log(reg2.test(str)); // true
\s matches any Unicode whitespace character, including spaces, tabs, form feeds and line breaks, roughly equivalent to [ \f\t\n\r\v]
// Starts with a whitespace character
var str = '\nHello World Hello 123JavaScript';
console.log(str);
var reg1 = /^\s/g;
console.log(reg1.exec(str)); // [ '\n', index: 0, ... ]
\S is roughly equivalent to [^ \f\t\n\r\v]
// Does not start with a whitespace character
var str = 'Hello World Hello 123JavaScript';
console.log(str);
var reg1 = /^\S/g;
console.log(reg1.exec(str)); // [ 'H', index: 0, ... ]
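The examples above anchor each class to the start of the string. As a complementary sketch, the classes can also be compared side by side on one short string; the sample text is invented, and the matches are listed with `String.prototype.match`.

```javascript
var text = "id_1 = 42;";

console.log(text.match(/\d/g));          // [ '1', '4', '2' ]
console.log(text.match(/\w/g).join("")); // id_142
console.log(text.match(/\W/g));          // [ ' ', '=', ' ', ';' ]
console.log(text.match(/\s/g).length);   // 2
console.log(text.match(/\S/g).join("")); // id_1=42;

// The dot skips the line feed.
console.log("a\nb".match(/./g));         // [ 'a', 'b' ]
```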
...