String processing is especially important in web development, and regular expressions are a sharp tool for it: they show up everywhere in character filtering and input validation.
Today I had to process a JSON string, and while using String.replaceAll I ran into the awkward situation of not being able to write the regular expression I needed. So this is a good time to brush up on regular expressions.
Let's start with a basic example of using a regular expression.
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        // Matches strings such as 1254-8888888 and 125-6966356:
        // 3 or 4 digits, a hyphen, then exactly 7 digits
        String regex = "\\d{3,4}-\\d{7}";
        // Input string
        String str = "agdf/1254-8888888sssdf125-6966356";
        // Replace every match with the text "replace"
        String aft = str.replaceAll(regex, "replace");
        System.out.println("after replaceAll===" + aft);

        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(str);
        // find() returns true each time it finds the next match, so the loop visits every match
        while (m.find()) {
            // The regex has no parentheses, so there are no capturing groups and groupCount() is 0
            System.out.println("groupCount===" + m.groupCount());
            // m.group() is the same as m.group(0): the entire match, not a group
            System.out.println("m.group(0)===" + m.group(0));
        }
    }
}
```

Output:

```
after replaceAll===agdf/replacesssdfreplace
groupCount===0
m.group(0)===1254-8888888
groupCount===0
m.group(0)===125-6966356
```
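The Javadoc for String.replaceAll describes it as equivalent to Pattern.compile(regex).matcher(str).replaceAll(repl). The sketch below, using a hypothetical class ReplaceAllDemo with illustrative helper names, checks that both forms give the same result for the input above. Compiling the Pattern once and reusing it is the usual choice when the same regex is applied to many strings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceAllDemo {
    // Same pattern as the example above: 3 or 4 digits, a hyphen, then 7 digits
    static final String REGEX = "\\d{3,4}-\\d{7}";

    // Compiled once and reused for every call to viaMatcher
    static final Pattern PATTERN = Pattern.compile(REGEX);

    // Shortcut form: String.replaceAll compiles the regex on each call
    static String viaString(String input) {
        return input.replaceAll(REGEX, "replace");
    }

    // Explicit form: reuse the precompiled Pattern
    static String viaMatcher(String input) {
        Matcher m = PATTERN.matcher(input);
        return m.replaceAll("replace");
    }

    public static void main(String[] args) {
        String str = "agdf/1254-8888888sssdf125-6966356";
        System.out.println(viaString(str));  // agdf/replacesssdfreplace
        System.out.println(viaMatcher(str)); // agdf/replacesssdfreplace
    }
}
```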
Now look at the regular expression itself: String regex = "\\d{3,4}-\\d{7}";
In Java source code the backslash must itself be escaped, which is why the string literal is written "\\d". The regex engine actually receives \d, which matches a single digit from 0 to 9. It is equivalent to [0-9].
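A small check of the escaping and of the \d / [0-9] equivalence, as a minimal sketch with a hypothetical class DigitClassDemo. It uses the static Pattern.matches(regex, input), which succeeds only if the whole input matches.

```java
import java.util.regex.Pattern;

public class DigitClassDemo {
    // In Java source, "\\d" is a two-character string: a backslash followed by d.
    // The regex engine receives \d, which matches one digit.
    static final String ESCAPED = "\\d";

    // True if the entire input matches the regex
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(ESCAPED.length());          // 2
        System.out.println(fullMatch("\\d", "7"));     // true
        System.out.println(fullMatch("[0-9]", "7"));   // true
        System.out.println(fullMatch("\\d", "x"));     // false
        System.out.println(fullMatch("[0-9]", "x"));   // false
    }
}
```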
\d{3,4} means \d repeated 3 to 4 times, so it matches digit runs such as 123, 3212, or 000. The following \d{7} works the same way and matches exactly 7 digits, such as 8888888.
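The length limits of {3,4} and {7} can be seen by testing sample strings with Matcher.matches(), which requires the whole string to match. A minimal sketch; the class QuantifierRangeDemo and its method names are hypothetical.

```java
import java.util.regex.Pattern;

public class QuantifierRangeDemo {
    static final Pattern THREE_TO_FOUR = Pattern.compile("\\d{3,4}");
    static final Pattern SEVEN = Pattern.compile("\\d{7}");

    // matches() checks the whole string, so too few or too many digits fail
    static boolean isThreeToFour(String s) { return THREE_TO_FOUR.matcher(s).matches(); }
    static boolean isSeven(String s) { return SEVEN.matcher(s).matches(); }

    public static void main(String[] args) {
        for (String s : new String[] {"12", "123", "3212", "000", "12345"}) {
            System.out.println(s + " -> " + isThreeToFour(s));
        }
        System.out.println("8888888 -> " + isSeven("8888888"));
        System.out.println("888 -> " + isSeven("888"));
    }
}
```

Note that find() behaves differently: it only needs a matching substring, so find() on "12345" still succeeds by matching "1234".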
Grouping is shown in the following demo.
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        // Group 1: one or more ASCII letters; group 2: exactly 7 digits
        String regex = "([a-zA-Z]+)(\\d{7})";
        // Input string
        String str = "AGdf12548888888sssdf1256966356";
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(str);
        // find() returns true each time it finds the next match, so the loop visits every match
        while (m.find()) {
            // The regex has two pairs of parentheses, so groupCount() is 2
            System.out.println("groupCount===" + m.groupCount());
            // m.group(0) is the entire match
            System.out.println("m.group(0)===" + m.group(0));
            // Corresponds to the first parentheses ([a-zA-Z]+)
            System.out.println("m.group(1)===" + m.group(1));
            // Corresponds to the second parentheses (\\d{7})
            System.out.println("m.group(2)===" + m.group(2));
            System.out.println("=============separator=============");
        }
    }
}
```

Output:

```
groupCount===2
m.group(0)===AGdf1254888
m.group(1)===AGdf
m.group(2)===1254888
=============separator=============
groupCount===2
m.group(0)===sssdf1256966
m.group(1)===sssdf
m.group(2)===1256966
=============separator=============
```
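Captured groups can also be referenced in a replacement string: in String.replaceAll and Matcher.replaceAll, $1 and $2 stand for groups 1 and 2. A minimal sketch with the same regex as the grouping demo; the class GroupReplaceDemo and the "digits:letters" output format are illustrative assumptions.

```java
public class GroupReplaceDemo {
    // Same pattern as the grouping demo: group 1 = letters, group 2 = 7 digits
    static final String REGEX = "([a-zA-Z]+)(\\d{7})";

    // $2 and $1 in the replacement refer to the captured groups,
    // so each match is rewritten as "<digits>:<letters>"
    static String swap(String input) {
        return input.replaceAll(REGEX, "$2:$1");
    }

    public static void main(String[] args) {
        // Text outside the matches ("8888" and "356") is copied unchanged
        System.out.println(swap("AGdf12548888888sssdf1256966356"));
        // 1254888:AGdf88881256966:sssdf356
    }
}
```

Because $ and \ are special in a replacement string, a replacement that must contain them literally (which can happen with JSON text) can be escaped with Matcher.quoteReplacement.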
Finally, here is a reference list of commonly used regular expression constructs.
Character classes
1. [abc]: a, b, or c
2. [^abc]: any character except a, b, or c
3. [a-zA-Z]: any ASCII letter, upper or lower case
4. [0-9]: any digit
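The character classes above can be checked with a few full-string matches. A minimal sketch with a hypothetical class CharClassDemo; it also shows that a space written inside the brackets becomes part of the class, so [a b c] differs from [abc].

```java
import java.util.regex.Pattern;

public class CharClassDemo {
    // True if the entire input matches the regex
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(full("[abc]", "b"));         // true
        System.out.println(full("[abc]", "d"));         // false
        System.out.println(full("[^abc]", "d"));        // true
        System.out.println(full("[^abc]", "a"));        // false
        System.out.println(full("[a-zA-Z]+", "Hello")); // true
        System.out.println(full("[0-9]+", "2024"));     // true
        System.out.println(full("[a b c]", " "));       // true: the space is in the class
        System.out.println(full("[abc]", " "));         // false
    }
}
```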
Predefined character classes
.: any character (by default, not line terminators)
\d: a digit, [0-9]
\D: a non-digit, [^0-9]
\s: a whitespace character, [ \t\n\x0B\f\r]
\S: a non-whitespace character, [^\s]
\w: a word character (letter, digit, or underscore), [a-zA-Z0-9_]
\W: a non-word character, [^\w]
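A minimal sketch exercising the predefined classes; the class PredefinedClassDemo is hypothetical. It also shows that . skips a newline unless the Pattern.DOTALL flag is set.

```java
import java.util.regex.Pattern;

public class PredefinedClassDemo {
    // True if the entire input matches the regex
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Same check, but compiled with DOTALL so . also matches line terminators
    static boolean fullDotAll(String regex, String input) {
        return Pattern.compile(regex, Pattern.DOTALL).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(full(".", "x"));            // true
        System.out.println(full(".", "\n"));           // false
        System.out.println(fullDotAll(".", "\n"));     // true
        System.out.println(full("\\s+", " \t\n"));     // true
        System.out.println(full("\\S+", "abc"));       // true
        System.out.println(full("\\w+", "user_01"));   // true
        System.out.println(full("\\W", "-"));          // true
        System.out.println(full("\\D", "a"));          // true
    }
}
```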
Quantifiers
1. ?: 0 or 1 occurrence
2. +: 1 or more occurrences
3. *: 0 or more occurrences
4. {n}: exactly n occurrences
5. {n,m}: at least n and at most m occurrences
6. {n,}: n or more occurrences
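Each quantifier in the list can be tried against a short string. A minimal sketch; the class QuantifierDemo and the sample strings are illustrative.

```java
import java.util.regex.Pattern;

public class QuantifierDemo {
    // True if the entire input matches the regex
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(full("colou?r", "color"));  // true: u occurs 0 times
        System.out.println(full("colou?r", "colour")); // true: u occurs once
        System.out.println(full("a+", ""));            // false: + needs at least one
        System.out.println(full("a*", ""));            // true: * allows zero
        System.out.println(full("a{3}", "aaa"));       // true
        System.out.println(full("a{2,3}", "aaaa"));    // false: more than 3
        System.out.println(full("a{2,}", "aaaaa"));    // true
    }
}
```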
Logical operators
1. XY: X followed by Y, where X and Y are each parts of a regular expression.
2. X|Y: X or Y. For example, "foo(d|f)" matches food or foof, while "food|f" matches food or f, because | has the lowest precedence.
3. (X): a subexpression (capturing group) that treats X as a single unit
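The effect of parentheses on | and on quantifiers can be checked directly. A minimal sketch with a hypothetical class AlternationDemo, using full-string matches.

```java
import java.util.regex.Pattern;

public class AlternationDemo {
    // True if the entire input matches the regex
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Parentheses limit | to d or f: "foo" followed by d or f
        System.out.println(full("foo(d|f)", "food")); // true
        System.out.println(full("foo(d|f)", "foof")); // true
        System.out.println(full("foo(d|f)", "f"));    // false
        // Without parentheses, | splits the whole pattern: "food" or "f"
        System.out.println(full("food|f", "food"));   // true
        System.out.println(full("food|f", "f"));      // true
        System.out.println(full("food|f", "foof"));   // false
        // (X) also lets a quantifier apply to a whole subexpression
        System.out.println(full("(ab)+", "ababab"));  // true
        System.out.println(full("ab+", "ababab"));    // false: + applies only to b
    }
}
```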