JS regular expression

Posted by Riseykins on Wed, 24 Nov 2021 00:52:31 +0100

What is a regular expression?

A regular expression is a search pattern formed by a sequence of characters. When you search text for data, the pattern describes what you are looking for.

Creating regular expressions

Literal

var reg = /pattern/modifiers;
var reg = /hello/g;

Constructor

var reg = new RegExp("pattern", "modifiers");
var reg = new RegExp("hello","g");
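The sketch below is not part of the original post. It shows that the two forms should produce equivalent objects, and that the constructor is useful when the pattern is only known at run time. Inside a string, a backslash has to be written twice, so the literal /\d/ corresponds to the string "\\d" (the \d digit class is covered further down). The `source` and `flags` properties are standard RegExp properties, and the search word is only an example.

```javascript
// The same pattern written as a literal and with the RegExp constructor
const literal = /\d/g;
// Inside a string, "\\d" stands for the two characters \d
const fromString = new RegExp("\\d", "g");

console.log(literal.source === fromString.source); // true
console.log(literal.flags === fromString.flags);   // true

// A pattern built at run time from a variable (the word is just an example)
const word = "hello";
const dynamic = new RegExp(word, "i");
console.log(dynamic.test("Hello world")); // true
```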

Character types

Ordinary characters

Letters, digits, underscores, Chinese characters, and symbols with no special meaning (such as , ; ! @)

In short, every character that is not a special character is an ordinary character.

Special characters

\: escapes a special character so that it is matched as an ordinary character
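A small illustration of escaping, added as a sketch. It uses ".", which is a special character matching any single character (it is described later in this post). Prefixing the dot with a backslash makes it match only a literal dot.

```javascript
// "." is a special character: unescaped, it matches any single character
const unescaped = /a.b/;
// "\." is an escaped dot: it only matches a literal "."
const escaped = /a\.b/;

console.log(unescaped.test("axb")); // true
console.log(escaped.test("axb"));   // false
console.log(escaped.test("a.b"));   // true
```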

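Pattern modifiers (flags)

i: ignoreCase, case is ignored when matching

m: multiline, ^ and $ also match at the start and end of each line

g: global, all matches are found instead of stopping after the first one

When a regular expression is created with a literal, the modifiers are written after the closing slash (for example /hello/gi). With the constructor, they are passed as the second argument.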
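The sketch below shows the effect of each modifier. It is not from the original post. It uses the standard String method `match`, which is not covered here, to list every match when g is set. The sample strings are made up.

```javascript
// i: case-insensitive matching
console.log(/hello/.test("HELLO"));  // false
console.log(/hello/i.test("HELLO")); // true

// g: String.prototype.match returns every match instead of only the first
console.log("a1b2c3".match(/\d/g));  // [ '1', '2', '3' ]

// m: ^ matches at the start of every line, not only at the start of the string
const lines = "Hello\nhello";
console.log(lines.match(/^hello/g));  // null
console.log(lines.match(/^hello/gm)); // [ 'hello' ]

// The same modifiers passed to the constructor as its second argument
const viaConstructor = new RegExp("hello", "gim");
console.log(viaConstructor.flags); // gim
```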

Regular expression instance methods

exec

Searches a string for text that matches the regular expression.

If a match is found, the return value is an array:

[matched text, index: position in str where the match starts, input: the string that was searched, groups: named capture groups, or undefined if there are none]

If there is no match, it returns null.

var str = 'hello world, hello js';
var reg1 = /hello/;
var reg2 = /hello/g;
var reg3 = /ellc/;
console.log(reg1.exec(str));//[ 'hello',index: 0,input: 'hello world, hello js',groups: undefined ]
console.log(reg2.exec(str));//[ 'hello',index: 0,input: 'hello world, hello js',groups: undefined ]
console.log(reg3.exec(str));//null
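The next sketch reads the individual parts of the array that exec returns. It goes beyond the original text by using capture groups `(...)`, a named group `(?<name>...)` and the `{n}` quantifier. A named group is what makes `groups` something other than undefined. The date string is invented for illustration.

```javascript
// A named capture group fills in result.groups
const result = /(?<year>\d{4})-(?<month>\d{2})/.exec("Released 2021-11");

console.log(result[0]);           // 2021-11 (the whole match)
console.log(result[1]);           // 2021 (first capture group)
console.log(result.index);        // 9
console.log(result.input);        // Released 2021-11
console.log(result.groups.month); // 11

// No match: exec returns null, so check before using the result
const none = /\d/.exec("no digits here");
console.log(none === null); // true
```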

Note:

1) If the regular expression has the "g" modifier, the instance reg maintains a lastIndex property that records where the next search starts. The next call to exec resumes searching from lastIndex.

2) Without the "g" modifier, lastIndex is not updated, so every search starts from the beginning of the string.
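Because a "g" regular expression remembers its position, one common pattern is to call exec in a loop until it returns null. This collects every match. The sketch below assumes the same sample string as above. Do not run this loop with a regex that lacks "g": exec would keep returning the first match and the loop would never end.

```javascript
const text = "hello world, hello js";
const reg = /hello/g;
const positions = [];
let match;
// Each exec call starts searching at reg.lastIndex and updates it after a match.
// When no match is left, exec returns null and resets lastIndex to 0, ending the loop.
while ((match = reg.exec(text)) !== null) {
  positions.push(match.index);
  console.log(match[0], "at", match.index, "next search from", reg.lastIndex);
}
console.log(positions);     // [ 0, 13 ]
console.log(reg.lastIndex); // 0
```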

test

Tests whether the string contains any text that matches the regular expression. It returns true if it does, and false otherwise.

var str = 'hello world, hello js';
var reg1 = /hello/;
var reg2 = /helle/;
console.log(reg1.test(str));//true
console.log(reg2.test(str));//false

Note:

1) If the regular expression has the "g" modifier, reg maintains a lastIndex property that records where the next search starts, and the next call to test resumes from lastIndex.

2) Without the "g" modifier, lastIndex is not updated, so every search starts from the beginning of the string.
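One practical consequence, shown here as a sketch: calling test repeatedly on the same string with one "g" regex can alternate between true and false, because each call continues from the previous lastIndex. Two simple remedies are to reset lastIndex or to omit "g" when you only need a yes/no answer.

```javascript
const reg = /hello/g;
const str = "hello";
console.log(reg.test(str)); // true  (lastIndex becomes 5)
console.log(reg.test(str)); // false (search starts at 5; lastIndex is reset to 0)
console.log(reg.test(str)); // true  (starts from 0 again)

// Remedy 1: reset lastIndex before reusing the regex
reg.lastIndex = 0;
console.log(reg.test(str)); // true

// Remedy 2: without "g", every call starts from the beginning
const noG = /hello/;
console.log(noG.test(str) && noG.test(str)); // true
```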

toString/toLocaleString

Converts the regular expression into its literal string form. For regular expressions, toLocaleString has no locale-specific behavior in JS, so it returns the same string as toString.

var reg1 = /hello/;
console.log(reg1.toString()); // /hello/
console.log(reg1.toLocaleString()); // /hello/

valueOf

Returns the regular expression itself

var reg1 = /hello/;
console.log(reg1.valueOf());  // Returns the regular expression itself
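A quick check, given as a sketch: valueOf returns the very same object, not a copy. Converting the regex to a string still goes through toString.

```javascript
const reg1 = /hello/;
console.log(reg1.valueOf() === reg1); // true: the very same object
console.log(String(reg1));            // /hello/
console.log(reg1 + "");               // /hello/
```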

Regular expression instance properties

lastIndex

When the "g" modifier is not set, exec and test neither use nor update lastIndex, so it stays at its initial value of 0.

When the "g" modifier is set, each successful exec or test moves lastIndex to the position just after the matched text. When no further match can be found from that position, exec returns null (test returns false) and lastIndex is reset to 0. The next search then starts again from the beginning of the string.

In other words, with "g" every search starts at lastIndex.

var str = 'hello hello hello';
var reg1 = /hello/;
var reg2 = /hello/g;
console.log(reg1.lastIndex);  // 0
console.log(reg1.exec(str));  // Returns the first hello
console.log(reg1.lastIndex);  // 0 (unchanged without g)

console.log(reg2.lastIndex);  // 0
console.log(reg2.exec(str));  // Returns the first hello
console.log(reg2.lastIndex);  // 5

console.log(reg2.lastIndex);  // 5
console.log(reg2.exec(str));  // Returns the second hello
console.log(reg2.lastIndex);  // 11

console.log(reg2.lastIndex);  // 11
console.log(reg2.exec(str));  // Returns the third hello
console.log(reg2.lastIndex);  // 17

console.log(reg2.exec(str));  // null (no more matches, lastIndex is reset)

console.log(reg2.lastIndex);  // 0
console.log(reg2.exec(str));  // Returns the first hello again
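lastIndex is also writable. The sketch below assumes the same string as above and sets lastIndex by hand to start a "g" search partway through the string. A regex without "g" ignores the value.

```javascript
const reg = /hello/g;
const str = "hello hello hello";
reg.lastIndex = 7;          // start the next search at index 7
const m = reg.exec(str);
console.log(m.index);       // 12 (the third hello)
console.log(reg.lastIndex); // 17

const plain = /hello/;      // without g, lastIndex is ignored
plain.lastIndex = 7;
console.log(plain.exec(str).index); // 0
```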

ignoreCase, global, multiline

These boolean properties tell whether the regular expression has each of the three pattern modifiers: ignore case (i), global matching (g) and multiline matching (m).

var reg1 = /hello/igm;
console.log(reg1.ignoreCase); //true
console.log(reg1.global); //true
console.log(reg1.multiline);  //true

source

Returns the text of the pattern itself, without the enclosing slashes and modifiers (unlike toString, which includes both).

var reg1 = /hello/igm;
console.log(reg1.source); //hello
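The following sketch compares source with toString. It also shows the standard `flags` property, which this post does not mention. Note that toString and flags list the modifiers in a fixed order (g, i, m), not in the order they were written.

```javascript
const reg1 = /hello/igm;
console.log(reg1.toString()); // /hello/gim
console.log(reg1.source);     // hello
console.log(reg1.flags);      // gim
```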

Regular expression syntax - metacharacters

Literal characters

Character	Matches
Alphanumeric characters	Themselves
\0	Null character (U+0000)
\t	Tab
\n	Line feed (newline)
\v	Vertical tab
\f	Form feed
\r	Carriage return
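A few checks of the escapes in the table, as a sketch with made-up strings. \0 should not be followed by another digit, because that form is interpreted differently.

```javascript
console.log(/a/.test("cat"));            // true: an ordinary letter matches itself
console.log(/\t/.test("name\tvalue"));   // true
console.log(/\n/.test("line1\nline2"));  // true
console.log(/\r\n/.test("windows\r\n")); // true
console.log(/\0/.test("a\u0000b"));      // true
console.log(/\v/.test("x\vy"));          // true
```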

Character set

Matches any one character in the set. A hyphen '-' can be used to specify a range.

[abc] matches any one of the characters inside the square brackets

var str = 'abc qwe abd'
var reg1 = /[abc]/;// It returns true as long as it contains a, b or c
console.log(reg1.test(str)); //true

[0-9] matches any digit from 0 to 9

var str = 'abc qwe abd1'
var reg1 = /[0-9]/igm;
console.log(reg1.test(str)); //true

[^xyz] is a negated (complemented) character set: it matches any character that is not inside the square brackets. A range can also be specified with a hyphen '-'.

Note: ^ means negation only when it is the first character inside [].

var str = 'abc qwe abd1,2'
console.log(str);
var reg1 = /[^abc ]/igm;
console.log(reg1.exec(str)); // [ 'q', index: 4, input: 'abc qwe abd1,2', groups: undefined ]
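To see every character a set matches rather than only the first, you could use the standard String method `match` with the "g" modifier. The sketch below uses the same sample string as above and contrasts a negated set with a range.

```javascript
const str2 = 'abc qwe abd1,2';
// Every character that is not a, b, c or a space
console.log(str2.match(/[^abc ]/g)); // [ 'q', 'w', 'e', 'd', '1', ',', '2' ]
// Every character in the range a to c
console.log(str2.match(/[a-c]/g));   // [ 'a', 'b', 'c', 'a', 'b' ]
```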

Boundary characters

^ matches the start of the input, that is, text at the beginning of the line (what the string starts with). If the multiline flag is set, it also matches right after a line break.

$ matches the end of the input, that is, text at the end of the line (what the string ends with). If the multiline flag is set, it also matches right before a line break.

When ^ and $ are used together, the whole string must match the pattern exactly.

var rg = /abc/; 
// /abc/ returns true as long as the string contains abc
console.log(rg.test('abc'));  //true
console.log(rg.test('abcd')); //true
console.log(rg.test('aabcd'));//true
console.log('---------------------------');
// Only a string that begins with abc satisfies it
var reg = /^abc/;
console.log(reg.test('abc')); // true
console.log(reg.test('abcd')); // true
console.log(reg.test('aabcd')); // false
console.log('---------------------------');
// Only a string that ends with abc satisfies it
var reg = /abc$/;
console.log(reg.test('abc')); // true
console.log(reg.test('qweabc')); // true
console.log(reg.test('aabcd')); // false
console.log('---------------------------');
var reg1 = /^abc$/; // Exact match: only the string abc satisfies it
console.log(reg1.test('abc')); // true
console.log(reg1.test('abcd')); // false
console.log(reg1.test('aabcd')); // false
console.log(reg1.test('abcabc')); // false
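The examples above all use single-line strings. The sketch below uses a made-up two-line string to show how the multiline modifier "m" changes ^ and $.

```javascript
const text = "first line\nsecond line";
// Without m, ^ and $ only match at the start and end of the whole string
console.log(/^second/.test(text));     // false
console.log(/first line$/.test(text)); // false
// With m, they also match right after or right before a line break
console.log(/^second/m.test(text));     // true
console.log(/first line$/m.test(text)); // true
```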

Using character sets with '^' and '$'

// Choose one of three: only a single a, b or c returns true
var rg1 = /^[abc]$/; 
console.log(rg1.test('aa'));//false
console.log(rg1.test('a'));//true
console.log(rg1.test('b'));//true
console.log(rg1.test('c'));//true
console.log(rg1.test('abc'));//false
// Any single lowercase letter from a to z returns true
var reg = /^[a-z]$/ 
console.log(reg.test('a'));//true
console.log(reg.test('z'));//true
console.log(reg.test('A'));//false
//Character combination
// Any single letter (uppercase or lowercase) or digit returns true
var reg1 = /^[a-zA-Z0-9]$/; 
// A ^ at the start of the square brackets negates the set: any character listed inside returns false
var reg2 = /^[^a-zA-Z0-9]$/;
console.log(reg2.test('a'));//false
console.log(reg2.test('B'));//false
console.log(reg2.test(8));//false
console.log(reg2.test('!'));//true

\b matches a zero-width word boundary. This is a position (not a character) between a word character (\w) and a non-word character such as a space, or between a word character and the start or end of the string.

\B matches a zero-width non-word boundary, the opposite of \b.
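A short sketch of \b and \B with invented strings. The last line also uses the standard String method `replace`, which is not covered in this post, to replace whole words only.

```javascript
console.log(/\bcat\b/.test("the cat sat")); // true
console.log(/\bcat\b/.test("concatenate")); // false: "cat" is surrounded by word characters
console.log(/\Bcat\B/.test("concatenate")); // true
// Replace the word "cat" but leave "catalog" alone
console.log("cat catalog".replace(/\bcat\b/g, "dog")); // dog catalog
```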

Character class

A character class is formed by placing literal characters inside square brackets, and it matches any one of the characters it contains. For example, /[abc]/ matches any of the letters "a", "b" and "c". A "^" at the start defines a negated character class: /[^abc]/ matches every character except "a", "b" and "c". Hyphens denote a range: /[a-z]/ matches any lowercase Latin letter, and /[a-zA-Z0-9]/ matches any Latin letter or digit.

Character class	Meaning
.	Matches any single character except line terminators (\n, \r, \u2028, \u2029), roughly [^\n\r]
\d	Matches a digit character, equivalent to [0-9]
\D	Matches a non-digit character, equivalent to [^0-9]
\w	Matches a word character (A to Z, a to z, 0 to 9 and the underscore "_"), equivalent to [a-zA-Z0-9_]
\W	Matches a non-word character, equivalent to [^a-zA-Z0-9_]
\s	Matches a whitespace character (space, tab, line feed, carriage return, vertical tab, form feed and other Unicode spaces); it covers [ \f\n\r\t\v] and more
\S	Matches a non-whitespace character, the complement of \s

. matches any single character other than a line terminator such as line feed (\n) or carriage return (\r)

var str = '\nHello World Hello\r JavaScript';
console.log(str);
var reg1 = /./g;
console.log(reg1.exec(str)); // matches 'H' at index 1: the leading \n is skipped
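Here is a sketch of which characters "." accepts, using made-up strings. It also shows that U+2028 (line separator) is treated as a line terminator, so "." does not match it.

```javascript
// "." matches ordinary characters, including tab
console.log(/./.test("\t"));     // true
// ...but not line terminators such as \n, \r or U+2028 (line separator)
console.log(/./.test("\n"));     // false
console.log(/./.test("\r"));     // false
console.log(/./.test("\u2028")); // false

// Collect every character that "." matches in a sample string
const sample = "a\nb\rc";
console.log(sample.match(/./g)); // [ 'a', 'b', 'c' ]
```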

\d matches a numeric character, equivalent to [0-9]

// Starts with a digit
var str = '123Hello World Hello 123JavaScript';
console.log(str);
var reg1 = /^\d/g;
console.log(reg1.exec(str)); // matches '1' at index 0

\D matches any non-digit character, equivalent to [^0-9]

// Does not start with a digit
var str = 'Hello World Hello 123JavaScript';
console.log(str);
var reg1 = /^\D/g;
console.log(reg1.exec(str)); // matches 'H' at index 0

\w matches any single word character: A to Z, a to z, 0 to 9 and the underscore "_", equivalent to [a-zA-Z0-9_]

\W matches any non-word character, equivalent to [^a-zA-Z0-9_]

var str = '!Hello World Hello JavaScript';
// \w -> [a-zA-Z0-9_]
var reg1 = /^\w/;
console.log(reg1.test(str)); // false: '!' is not a word character
// \W -> [^a-zA-Z0-9_]
var reg2 = /^\W/;
console.log(reg2.test(str)); // true
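A common use of these classes, sketched with an invented string: strip every non-word character with the standard String method `replace`, or count the word characters with `match`.

```javascript
const title = "Hello, World! 2021_v2";
// Remove every non-word character (comma, spaces, exclamation mark)
console.log(title.replace(/\W/g, ""));  // HelloWorld2021_v2
// Count the word characters
console.log(title.match(/\w/g).length); // 17
```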

\s matches any single whitespace character, including spaces, tabs, line breaks, form feeds and other Unicode whitespace; it covers [ \f\n\r\t\v] and more

// Starts with a whitespace character
var str = '\nHello World Hello 123JavaScript';
console.log(str);
var reg1 = /^\s/g;
console.log(reg1.exec(str)); // matches '\n' at index 0

\S matches any non-whitespace character (the complement of \s)

// Does not start with a whitespace character
var str = 'Hello World Hello 123JavaScript';
console.log(str);
var reg1 = /^\S/g;
console.log(reg1.exec(str)); // matches 'H' at index 0
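A sketch with a made-up string that removes all whitespace with \s and keeps only the non-whitespace characters with \S. It also shows that \s is broader than the ASCII set [ \f\n\r\t\v]: for example, the no-break space U+00A0 counts as whitespace.

```javascript
const messy = "  a \t b\n c  ";
// Remove every whitespace character
console.log(messy.replace(/\s/g, "")); // abc
// Keep only the non-whitespace characters
console.log(messy.match(/\S/g));       // [ 'a', 'b', 'c' ]

// \s also matches the no-break space U+00A0; the ASCII-only set does not
console.log(/\s/.test("\u00a0"));            // true
console.log(/[ \f\n\r\t\v]/.test("\u00a0")); // false
```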

...
