Application cases of regular expression lookaround

Posted by epukinsk on Sat, 22 Jan 2022 16:35:31 +0100

1, Thousands separator case (I)

Lookbehind and lookahead are applied together.

**Requirement:** format numbers in currency style, with groups of three digits separated by ",".

Regular expression: (?n)(?<=\d)(?<!\.\d*)(?=(\d{3})+(\.|$))

Test code:

double[] data = new double[] { 
0, 12, 123, 1234, 12345, 123456, 1234567, 123456789, 1234567890, 12.345, 
123.456, 1234.56, 12345.6789, 123456.789, 1234567.89, 12345678.9 
};

foreach (double d in data) {
	// Insert "," at every zero-width position matched by the lookaround expression
	richTextBox2.Text += "Source string: " + d.ToString().PadRight(15) + "Format: " 
	+ Regex.Replace(d.ToString(), @"(?n)(?<=\d)(?<!\.\d*)(?=(\d{3})+(\.|$))", ",") + "\n";
}

Output results:

Source string: 0              Format: 0
Source string: 12             Format: 12
Source string: 123            Format: 123
Source string: 1234           Format: 1,234
Source string: 12345          Format: 12,345
Source string: 123456         Format: 123,456
Source string: 1234567        Format: 1,234,567
Source string: 123456789      Format: 123,456,789
Source string: 1234567890     Format: 1,234,567,890
Source string: 12.345         Format: 12.345
Source string: 123.456        Format: 123.456
Source string: 1234.56        Format: 1,234.56
Source string: 12345.6789     Format: 12,345.6789
Source string: 123456.789     Format: 123,456.789
Source string: 1234567.89     Format: 1,234,567.89
Source string: 12345678.9     Format: 12,345,678.9

Implementation analysis:

First, the requirement tells us that certain positions are to be replaced with ",". We then look for the rules these positions follow and express those rules as a regular expression.

Immediately to the left of the position there must be a digit.
To the right of the position, up to a "." or the end of the string, there must be only digits, and the number of digits must be a multiple of 3.
A "." followed only by digits must not appear to the left of the position; in other words, the position must not lie in the fractional part.

These three conditions determine the positions completely. Once each condition is written as a sub-expression, they can be combined into one regular expression.
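The three conditions can also be checked by hand, without a regular expression. The following is a minimal JavaScript sketch, assuming the input is a plain decimal number string such as "1234567.89"; the helper names isSeparatorPosition and formatByRules are made up for illustration.

```javascript
// Hypothetical helper: does position i (the gap before s[i]) satisfy all three conditions?
function isSeparatorPosition(s, i) {
  const isDigit = (ch) => ch >= "0" && ch <= "9";

  // Condition 1: the character immediately to the left must be a digit.
  if (i === 0 || !isDigit(s[i - 1])) return false;

  // Condition 3: no "." may appear to the left (the position is not in the fractional part).
  if (s.lastIndexOf(".", i - 1) !== -1) return false;

  // Condition 2: count the digits from i up to the next "." or the end of the string;
  // the count must be a positive multiple of 3.
  let count = 0;
  while (i + count < s.length && isDigit(s[i + count])) count++;
  const stop = i + count;
  const endsProperly = stop === s.length || s[stop] === ".";
  return endsProperly && count > 0 && count % 3 === 0;
}

// Hypothetical helper: insert "," at every position that passes the check.
function formatByRules(s) {
  let out = "";
  for (let i = 0; i < s.length; i++) {
    if (isSeparatorPosition(s, i)) out += ",";
    out += s[i];
  }
  return out;
}

console.log(formatByRules("1234567.89")); // 1,234,567.89
console.log(formatByRules("12.345"));     // 12.345
```

Writing the checks out this way shows what each lookaround in the regular expression has to do: every condition only inspects the characters around the position and never consumes any of them.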

According to the analysis, the final match is a position, so every sub-expression must be zero-width.

The first condition constrains the left side of the current position, so it needs a lookbehind. Because a digit must appear, the lookbehind is positive: (?<=\d)
The second condition constrains the right side of the current position, so it needs a lookahead. Because the digits must appear, the lookahead is positive. The digits come in groups of three, that is (?=(\d{3})+), and they run up to a "." or the end of the string, which gives (?=(\d{3})+(\.|$))
The third condition constrains the left side of the current position, so it needs a lookbehind. Because the pattern must not appear, the lookbehind is negative: (?<!\.\d*)

Because zero-width sub-expressions do not exclude one another and all of them are tested at the same position, their order does not affect the result, and they can be combined in any order. It is only customary to write lookbehinds on the left and lookaheads on the right.
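The claim that the order is irrelevant can be checked with a short JavaScript sketch. It assumes an engine with ES2018 lookbehind support (for example a current Node.js). The .NET inline option (?n) (explicit capture) has no JavaScript equivalent, so the groups are written as non-capturing (?:...) instead; the permutations helper and the sample list are made up for illustration.

```javascript
// The three zero-width conditions from the analysis, as regex source strings.
const parts = [
  "(?<=\\d)",                 // a digit immediately to the left
  "(?<!\\.\\d*)",             // not inside the fractional part
  "(?=(?:\\d{3})+(?:\\.|$))", // groups of three digits up to "." or the end
];

// Hypothetical helper: all orderings of a small array.
function permutations(items) {
  if (items.length <= 1) return [items];
  return items.flatMap((item, i) =>
    permutations([...items.slice(0, i), ...items.slice(i + 1)]).map((rest) => [item, ...rest])
  );
}

const samples = ["0", "1234", "1234567890", "12.345", "12345.6789", "12345678.9"];

// Build one regex per ordering and format every sample with it.
const results = permutations(parts).map((order) => {
  const re = new RegExp(order.join(""), "g");
  return samples.map((s) => s.replace(re, ",")).join(" | ");
});

// Every ordering should produce the same formatted strings.
const allSame = results.every((r) => r === results[0]);
console.log(results[0]);
console.log("all orderings agree:", allSame);
```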

Note: This is only an example to illustrate the use of lookaround. In practice, string.Format can meet this requirement directly.

2, Thousands separator case (II)

A thousands separator, as the name suggests, is a comma inserted after every three digits of a number, counting from the right. This follows Western convention: the symbol makes a long number easier to read at a glance.

So how do we convert a string of digits into thousands-separated form?

var str = "1234567890.9876";
console.log((+str).toLocaleString()); // 1,234,567,890.988 (in an en-US locale; the fraction is rounded to 3 places by default)

As above, toLocaleString() returns the "localized" string form of the current object.

If the object is a Number, the result is the value as a string with its digits grouped by a locale-specific separator;
If the object is an Array, each item is first converted to a string with its own toLocaleString(), and the resulting strings are then joined with a locale-specific separator.

Let's try to do the same with lookaround:
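Because the result depends on the locale, it can help to pin the locale explicitly. The following sketch assumes a runtime with Intl support and the en-US locale data (the default in current Node.js builds and browsers); the variable names are only illustrative.

```javascript
const n = 1234567890.9876;

// Pinning the locale makes the output independent of the machine's settings.
const us = n.toLocaleString("en-US");

// Raising maximumFractionDigits keeps the fraction instead of rounding it to 3 places.
const usFull = n.toLocaleString("en-US", { maximumFractionDigits: 4 });

// Array: each element is formatted with its own toLocaleString, then the pieces are joined.
const list = [1234.5, 6789].toLocaleString("en-US");

console.log(us);     // 1,234,567,890.988
console.log(usFull); // 1,234,567,890.9876
console.log(list);
```

Note that in the Array case the separator between items may itself be a comma, so the joined string can be hard to tell apart from the digit grouping.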

var str = "1234567890"; // must be a string: numbers have no replace() method
function thousand(str){
 // Each match is a zero-width position; this input has 3 of them, and each is replaced with a comma
 return str.replace(/(?!^)(?=([0-9]{3})+$)/g,',');
}
console.log(thousand(str));//"1,234,567,890"
console.log(thousand("123456"));//"123,456"
console.log(thousand("1234567879876543210"));//"1,234,567,879,876,543,210"

The regular expression used above consists of two parts, (?!^) and (?=([0-9]{3})+$). Let's look at the second part first and analyze it step by step.

[0-9]{3} matches 3 consecutive digits;
([0-9]{3})+ matches one or more groups of 3 consecutive digits;
([0-9]{3})+$ requires those groups to run to the end of the string;
(?=([0-9]{3})+$) therefore matches a zero-width position such that, from this position to the end of the string, the digits form at least one complete group of three (that is, the number of remaining digits is a positive multiple of 3: 3, 6, 9, and so on);
The regular expression uses the global flag g, so after one position is matched the search continues until no further position matches;
Replacing each of these positions with a comma adds a comma every three digits, counting from the right;
For a string such as 123456, whose length is itself a multiple of 3, the start of the string also satisfies the lookahead, and a comma must not be added before the 1, so (?!^) is used to exclude the starting position.
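To make the steps above concrete, the following sketch lists the indexes of the zero-width matches, with and without the (?!^) guard. It assumes a runtime that provides String.prototype.matchAll; matchPositions is a helper name made up for illustration.

```javascript
// Hypothetical helper: collect the index of every zero-width match of a global regex.
function matchPositions(re, s) {
  return Array.from(s.matchAll(re), (m) => m.index);
}

const withGuard = /(?!^)(?=([0-9]{3})+$)/g;
const withoutGuard = /(?=([0-9]{3})+$)/g;

console.log(matchPositions(withGuard, "1234567890")); // [ 1, 4, 7 ]
console.log(matchPositions(withoutGuard, "123456"));  // [ 0, 3 ]
console.log("123456".replace(withoutGuard, ","));     // ,123,456
console.log("123456".replace(withGuard, ","));        // 123,456
```

Without the guard, index 0 is matched for "123456" because six digits remain from there to the end, which is exactly the case (?!^) rules out.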

3, Positive lookahead

Suppose JavaScript receives a piece of HTML through an Ajax request, as follows:

var responseText = "<div data='dev.xxx.txt'></div><img src='dev.xxx.png'/>";

Now we need to replace the dev string in the src attribute of the img tag with the test string.

Since the responseText string contains two occurrences of the substring dev, simply replacing dev with test is clearly not an option;
JavaScript engines that predate ES2018 do not support lookbehind, so there we cannot check for the prefix src=' in the regular expression and then replace dev;
We notice that the src attribute of the img tag ends with .png, so a positive lookahead can be used.

var reg = /dev(?=[^']*png)/; // [^'] keeps the lookahead inside the current attribute value, so the first dev (in data='dev.xxx.txt') is not matched
var str = responseText.replace(reg,"test");
console.log(str);//<div data='dev.xxx.txt'></div><img src='test.xxx.png'/>

Of course, lookahead is not the only solution here; capture groups work as well. So what is the advantage of lookaround? It pins down a position within a single match, which is often effective in complex text replacement scenarios, whereas capture groups require more work.
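For comparison, here is a sketch of the same replacement done three ways. The lookbehind variant assumes an engine with ES2018 support; the variable names and the expected string are only illustrative.

```javascript
const responseText = "<div data='dev.xxx.txt'></div><img src='dev.xxx.png'/>";
const expected = "<div data='dev.xxx.txt'></div><img src='test.xxx.png'/>";

// Lookahead: only "dev" is matched, so the replacement is just the new text.
const byLookahead = responseText.replace(/dev(?=[^']*\.png)/, "test");

// Capture group: the prefix is consumed by the match, so it has to be put back with $1.
const byGroup = responseText.replace(/(src=')dev/, "$1test");

// Lookbehind (ES2018 and later): checks the prefix without consuming it.
const byLookbehind = responseText.replace(/(?<=src=')dev/, "test");

console.log(byLookahead === expected, byGroup === expected, byLookbehind === expected);
// true true true
```

With the capture group, the replacement string has to rebuild everything the match consumed; with lookaround, the match covers only the text that actually changes.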
