Java Web Regular Expressions

Posted by mraza on Thu, 23 May 2019 20:21:11 +0200

For the web, string processing is particularly important, and regular expressions are a sharp tool for it. They show up everywhere in character filtering and input validation.

Today I needed to process a JSON string. While using String.replaceAll, I ran into the awkward situation of not being able to write the regular expression I needed, so I decided to brush up on regular expressions.

Let's start with the use of a regular expression.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        
        // This regular expression matches strings such as 1254-8888888 or 125-6966356
        String regex = "\\d{3,4}-\\d{7}";
        
        // Input string
        String str = "agdf/1254-8888888sssdf125-6966356";
        String aft = str.replaceAll(regex, "replace");
        System.out.println("after replaceAll===" + aft);
        
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(str);
        // m.find() returns true as long as another match is found
        while (m.find()) {
            // The regular expression contains no parentheses, so the number of capture groups is 0
            System.out.println("Number of capture groups, m.groupCount===" + m.groupCount());
            
            // m.group() is equivalent to m.group(0): the whole match, which is not counted in groupCount
            System.out.println("m.group===" + m.group(0));
            
        }
        
    }

}

Output

after replaceAll===agdf/replacesssdfreplace
Number of capture groups, m.groupCount===0
m.group===1254-8888888
Number of capture groups, m.groupCount===0
m.group===125-6966356

Now look at the regular expression itself: String regex = "\\d{3,4}-\\d{7}";

"\\d" is how it has to be written in a Java string literal, because the backslash itself must be escaped. The actual regular expression is \d, which matches a digit from 0 to 9. It should also be possible to write it as [0-9], although I have not tried that here.

\d{3,4} means that \d occurs 3 to 4 times, so it matches digit runs such as 123, 3212 or 000. The following \d{7} works the same way and matches exactly seven digits, such as 8888888.
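To check the claim that [0-9] can stand in for \d, here is a minimal sketch that runs both spellings against the input string from the demo above. The class name DigitClassDemo and the helper mask are made up for this illustration. It assumes Java's default regex mode, in which \d matches only the ASCII digits 0 to 9.

```java
public class DigitClassDemo {

    // Hypothetical helper: replaces every match of regex in input with the text "replace".
    static String mask(String regex, String input) {
        return input.replaceAll(regex, "replace");
    }

    public static void main(String[] args) {
        String str = "agdf/1254-8888888sssdf125-6966356";

        // In Java source, "\\d" is the regular expression \d.
        String withShorthand = mask("\\d{3,4}-\\d{7}", str);

        // [0-9] contains no backslash, so nothing needs to be escaped.
        String withRange = mask("[0-9]{3,4}-[0-9]{7}", str);

        System.out.println(withShorthand);                    // agdf/replacesssdfreplace
        System.out.println(withRange);                        // agdf/replacesssdfreplace
        System.out.println(withShorthand.equals(withRange));  // true
    }
}
```

Both patterns find the same two phone-number-like substrings, so the two results are identical.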

As for grouping, look at the demo below.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        
        // This regular expression matches one or more letters followed by exactly seven digits
        String regex = "([a-zA-Z]+)(\\d{7})";
        
        // Input string
        String str = "AGdf12548888888sssdf1256966356";
        
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(str);
        // m.find() returns true as long as another match is found
        while (m.find()) {
            // The regular expression contains two pairs of parentheses, so the number of capture groups is 2
            System.out.println("Number of capture groups, m.groupCount===" + m.groupCount());
            
            // m.group(0) is the whole match; it is not counted in groupCount
            System.out.println("m.group(0)===" + m.group(0));
            // Corresponds to the first pair of parentheses, ([a-zA-Z]+)
            System.out.println("m.group(1)===" + m.group(1));
            // Corresponds to the second pair of parentheses, (\\d{7})
            System.out.println("m.group(2)===" + m.group(2));
            
            System.out.println("=============I'm a separator line.============");
            
        }
        
    }

}

Output

Number of capture groups, m.groupCount===2
m.group(0)===AGdf1254888
m.group(1)===AGdf
m.group(2)===1254888
=============I'm a separator line.============
Number of capture groups, m.groupCount===2
m.group(0)===sssdf1256966
m.group(1)===sssdf
m.group(2)===1256966
=============I'm a separator line.============
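Capture groups are also useful together with String.replaceAll, where the replacement string can refer to them as $1, $2 and so on. The following is a small sketch using the same pattern and input as the grouping demo. The class name GroupReplaceDemo and the method bracket are hypothetical names chosen for this example.

```java
public class GroupReplaceDemo {

    // Hypothetical helper: rewrites each "letters + seven digits" match as [letters|digits].
    static String bracket(String input) {
        // $1 refers to ([a-zA-Z]+) and $2 refers to (\\d{7}).
        return input.replaceAll("([a-zA-Z]+)(\\d{7})", "[$1|$2]");
    }

    public static void main(String[] args) {
        String str = "AGdf12548888888sssdf1256966356";
        // Text that is not part of any match (8888 and 356) is copied through unchanged.
        System.out.println(bracket(str)); // [AGdf|1254888]8888[sssdf|1256966]356
    }
}
```

Only $ and \ are special inside the replacement string, so the brackets and the vertical bar are inserted literally.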

Finally, here is a summary of commonly used regular expression syntax.

Character ranges
1. [abc]: Matches a, b or c.
2. [^abc]: Matches any character except a, b and c.
3. [a-zA-Z]: Matches an English letter.
4. [0-9]: Matches a digit.
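The four character ranges above can be tried out with a short sketch like the one below. The class name CharClassDemo and the helper fullMatch are assumptions for illustration. Pattern.matches only returns true when the whole input matches, so each test string is a single character.

```java
import java.util.regex.Pattern;

public class CharClassDemo {

    // Hypothetical helper: true only if the entire input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("[abc]", "b"));     // true:  b is one of a, b, c
        System.out.println(fullMatch("[abc]", "d"));     // false: d is not in the set
        System.out.println(fullMatch("[^abc]", "d"));    // true:  d is anything but a, b, c
        System.out.println(fullMatch("[^abc]", "a"));    // false: a is excluded
        System.out.println(fullMatch("[a-zA-Z]", "Q"));  // true:  Q is an English letter
        System.out.println(fullMatch("[a-zA-Z]", "7"));  // false: 7 is not a letter
        System.out.println(fullMatch("[0-9]", "7"));     // true:  7 is a digit
    }
}
```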

Shorthand character classes
.: Matches any character
\d: Matches a digit
\D: Matches a non-digit
\s: Matches a whitespace character, [ \t\n\x0B\f\r]
\S: Matches a non-whitespace character, [^\s]
\w: Matches a letter, digit or underscore, [a-zA-Z0-9_]
\W: Matches any character that is not a letter, digit or underscore
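One way to see what each shorthand class covers is to replace every character it matches and look at what is left. This is only a sketch: the class name ShorthandDemo, the helper replace and the sample string "id_7 = 42" are invented for this example. Note that each backslash is doubled in the Java string literals.

```java
public class ShorthandDemo {

    // Hypothetical helper: replaces every match of regex in input with the given text.
    static String replace(String regex, String input, String with) {
        return input.replaceAll(regex, with);
    }

    public static void main(String[] args) {
        String s = "id_7 = 42";

        System.out.println(replace("\\d", s, "#")); // id_# = ##   (digits)
        System.out.println(replace("\\D", s, "#")); // ###7###42   (non-digits)
        System.out.println(replace("\\s", s, ""));  // id_7=42     (whitespace removed)
        System.out.println(replace("\\w", s, "x")); // xxxx = xx   (letters, digits, underscore)
        System.out.println(replace("\\W", s, "x")); // id_7xxx42   (everything else)
        System.out.println(replace(".", s, "x"));   // xxxxxxxxx   (any character)
    }
}
```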

Quantifiers
1. ?: 0 or 1 occurrence
2. +: 1 or more occurrences
3. *: 0 or more occurrences
4. {n}: Exactly n occurrences
5. {n,m}: Between n and m occurrences
6. {n,}: n or more occurrences
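The quantifiers above can be checked one by one with whole-string matches. The sketch below assumes nothing beyond java.util.regex. The class name QuantifierDemo, the helper fullMatch and the sample patterns are made up for illustration.

```java
import java.util.regex.Pattern;

public class QuantifierDemo {

    // Hypothetical helper: true only if the entire input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("colou?r", "color"));   // true:  u occurs 0 times
        System.out.println(fullMatch("colou?r", "colour"));  // true:  u occurs 1 time
        System.out.println(fullMatch("a+", ""));             // false: + needs at least one a
        System.out.println(fullMatch("a*", ""));             // true:  * allows zero occurrences
        System.out.println(fullMatch("\\d{3}", "1234"));     // false: exactly 3 digits required
        System.out.println(fullMatch("\\d{3,4}", "1234"));   // true:  3 to 4 digits allowed
        System.out.println(fullMatch("\\d{3,4}", "12345"));  // false: 5 digits is too many
        System.out.println(fullMatch("\\d{3,}", "12345"));   // true:  3 or more digits
    }
}
```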

Logical expressions
1. XY: X followed by Y, where X and Y are each part of the regular expression.
2. X|Y: X or Y. For example, "foo(d|f)" matches foo followed by d or f (food or foof), while "food|f" matches food or f.
3. (X): A subexpression that treats X as a single unit.
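The difference that parentheses make to | is easy to get wrong, so here is a minimal sketch comparing the two patterns from item 2. The class name AlternationDemo and the helper fullMatch are hypothetical.

```java
import java.util.regex.Pattern;

public class AlternationDemo {

    // Hypothetical helper: true only if the entire input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // With parentheses, | chooses only between d and f after "foo".
        System.out.println(fullMatch("foo(d|f)", "food"));  // true
        System.out.println(fullMatch("foo(d|f)", "foof"));  // true
        System.out.println(fullMatch("foo(d|f)", "f"));     // false

        // Without parentheses, | splits the whole pattern into "food" or "f".
        System.out.println(fullMatch("food|f", "food"));    // true
        System.out.println(fullMatch("food|f", "f"));       // true
        System.out.println(fullMatch("food|f", "foof"));    // false
    }
}
```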

Topics: Java JSON