Overview of regular expressions
Concept: a regular expression is a single string that describes, or matches, a set of strings that follow a certain syntactic rule. Using one usually involves three steps:
1. Study many sample strings to find the rule they share, and write that rule down as a pattern.
2. Use the pattern to match new strings.
3. If a match succeeds, perform the corresponding operation.
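The three steps can be sketched with Java's java.util.regex package. The product-code rule, the class name RegexStepsDemo and the check method below are hypothetical, chosen only to illustrate the workflow: the rule is compiled once into a Pattern, each new string is matched against it, and the program acts only when the match succeeds.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexStepsDemo {
    // Step 1 (hypothetical rule): a product code is two uppercase letters,
    // a dash and four digits, e.g. "AB-1234".
    private static final Pattern PRODUCT_CODE = Pattern.compile("[A-Z]{2}-\\d{4}");

    // Steps 2 and 3: match a new string, and act only if the match succeeds.
    static String check(String input) {
        Matcher m = PRODUCT_CODE.matcher(input);
        if (m.matches()) {
            return "accepted: " + input; // the "corresponding operation"
        }
        return "rejected: " + input;
    }

    public static void main(String[] args) {
        System.out.println(check("AB-1234")); // accepted: AB-1234
        System.out.println(check("ab-12"));   // rejected: ab-12
    }
}
```

Compiling the Pattern once and reusing it avoids recompiling the rule for every input string.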
Basic syntax of regular expressions
1. Literal characters
An ordinary character in a pattern matches that same character in the text.
public class RegularDemo2 {
    public static void main(String[] args) {
        String str = "ab123342asdasqwe&;123.";
        //The String class has a replaceAll method that replaces every substring matching a rule:
        //public String replaceAll(String regex, String replacement)
        //Replaces each substring of this string that matches the given regular expression with the given replacement.
        //"." is a metacharacter, so it is escaped as \\. to match a literal dot
        String regex = "\\.";
        System.out.println(str.replaceAll(regex, "_"));//ab123342asdasqwe&;123_
        //The literal character b matches itself
        regex = "b";
        System.out.println(str.replaceAll(regex, "_"));//a_123342asdasqwe&;123.
    }
}
2. Metacharacters
Character | Description |
---|---|
\ | Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, 'n' matches the character "n", while '\n' matches a newline. The sequence '\\' matches "\" and '\(' matches "(". |
^ | Matches the start of the input string. If multiline mode is on (the Multiline property of a RegExp object in JScript, the Pattern.MULTILINE flag in Java), ^ also matches the position right after '\n' or '\r'. |
$ | Matches the end of the input string. If multiline mode is on, $ also matches the position right before '\n' or '\r'. |
* | Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}. |
+ | Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}. |
? | Matches the preceding subexpression zero or one time. For example, 'do(es)?' matches "do" or "does". ? is equivalent to {0,1}. |
{n} | n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food". |
{n,} | n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+', and 'o{0,}' is equivalent to 'o*'. |
{n,m} | m and n are non-negative integers with n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. There must be no spaces around the comma. |
? | When this character immediately follows another quantifier (*, +, ?, {n}, {n,}, {n,m}), the match becomes non-greedy. A non-greedy pattern matches as little of the string as possible, while the default greedy pattern matches as much as possible. For example, in the string "oooo", 'o+?' matches a single 'o', while 'o+' matches all the 'o's. |
. | Matches any single character except line terminators (\n, \r). To match any character including '\n', use a pattern such as '(.\|\n)' or, in Java, the Pattern.DOTALL flag. |
(pattern) | Matches pattern and captures the match. Captured matches can be read from the resulting Matches collection: VBScript uses the SubMatches collection and JScript uses the $0...$9 properties (Java uses Matcher.group(n), or $n in a replacement string). To match a literal parenthesis, use '\(' or '\)'. |
(?:pattern) | Matches pattern but does not capture it: this is a non-capturing group, so the match is not stored for later use. It is useful when combining parts of a pattern with the alternation character (\|). For example, 'industr(?:y\|ies)' is more economical than 'industry\|industries'. |
(?=pattern) | Positive lookahead: matches the search string at any position where a string matching pattern begins. It is non-capturing, so its match is not stored for later use. For example, 'Windows(?=95\|98\|NT\|2000)' matches the "Windows" in "Windows2000" but not the "Windows" in "Windows3.1". A lookahead does not consume characters: after a match, the search for the next match begins immediately after the last match, not after the characters examined by the lookahead. |
(?!pattern) | Negative lookahead: matches the search string at any position where a string matching pattern does not begin. It is non-capturing. For example, 'Windows(?!95\|98\|NT\|2000)' matches the "Windows" in "Windows3.1" but not the "Windows" in "Windows2000". Like a positive lookahead, it does not consume characters. |
(?<=pattern) | Positive lookbehind: like a positive lookahead, but it checks the text before the current position. For example, '(?<=95\|98\|NT\|2000)Windows' matches the "Windows" in "2000Windows" but not the "Windows" in "3.1Windows". |
(?<!pattern) | Negative lookbehind: like a negative lookahead, but it checks the text before the current position. For example, '(?<!95\|98\|NT\|2000)Windows' matches the "Windows" in "3.1Windows" but not the "Windows" in "2000Windows". |
x\|y | Matches either x or y. For example, 'z\|food' matches "z" or "food", while '(z\|f)ood' matches "zood" or "food". |
[xyz] | Character set. Matches any one of the enclosed characters. For example, '[abc]' matches the 'a' in "plain". |
[^xyz] | Negated character set. Matches any character that is not enclosed. For example, '[^abc]' matches the 'p', 'l', 'i' and 'n' in "plain". |
[a-z] | Character range. Matches any character in the specified range. For example, '[a-z]' matches any lowercase letter from 'a' to 'z'. |
[^a-z] | Negated character range. Matches any character outside the specified range. For example, '[^a-z]' matches any character that is not a lowercase letter from 'a' to 'z'. |
\b | Matches a word boundary, that is, the position between a word character and a non-word character such as a space. For example, 'er\b' matches the 'er' in "never" but not the 'er' in "verb". |
\B | Matches a non-word boundary. 'er\B' matches the 'er' in "verb" but not the 'er' in "never". |
\cx | Matches the control character indicated by x. For example, '\cM' matches Control-M, i.e. a carriage return. x must be a letter in A-Z or a-z; otherwise, c is treated as a literal 'c' character. |
\d | Matches a digit character. Equivalent to [0-9]. |
\D | Matches a non-digit character. Equivalent to [^0-9]. |
\f | Matches a form feed. Equivalent to \x0c and \cL. |
\n | Matches a newline. Equivalent to \x0a and \cJ. |
\r | Matches a carriage return. Equivalent to \x0d and \cM. |
\s | Matches any whitespace character, including space, tab, form feed, and so on. Equivalent to [ \f\n\r\t\v] (in Java, [ \t\n\x0B\f\r]). |
\S | Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v]. |
\t | Matches a tab. Equivalent to \x09 and \cI. |
\v | Matches a vertical tab. Equivalent to \x0b and \cK. (In Java 8 and later, \v instead matches any vertical whitespace character.) |
\w | Matches a letter, digit or underscore. Equivalent to '[A-Za-z0-9_]'. |
\W | Matches any character that is not a letter, digit or underscore. Equivalent to '[^A-Za-z0-9_]'. |
\xn | Matches n, where n is a hexadecimal escape value. A hexadecimal escape value must be exactly two digits long. For example, '\x41' matches "A", and '\x041' is equivalent to '\x04' followed by "1". This allows ASCII codes to be used in regular expressions. |
\num | Matches num, where num is a positive integer: a backreference to a captured match. For example, '(.)\1' matches two consecutive identical characters. |
\n | Identifies either an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape value. (Java always treats \1 to \9 as backreferences; octal escapes there are written \0n, \0nn or \0mnn.) |
\nm | Identifies either an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captures, n is a backreference followed by the literal character m. If neither condition holds and n and m are octal digits (0-7), \nm matches the octal escape value nm. |
\nml | If n is an octal digit (0-3) and m and l are octal digits (0-7), matches the octal escape value nml. |
\un | Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, '\u00A9' matches the copyright symbol (©). |
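Several rows of the table can be checked directly in Java. This sketch runs the table's own examples for non-greedy quantifiers, lookahead, lookbehind and backreferences through Matcher.find(); the class MetacharacterDemo and its firstMatch helper are illustrative names, not part of any library, and firstMatch returns null when nothing matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetacharacterDemo {
    // Returns the first substring of input matched by regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Greedy vs. non-greedy (lazy) quantifier on "oooo"
        System.out.println(firstMatch("o+", "oooo"));  // oooo
        System.out.println(firstMatch("o+?", "oooo")); // o
        // Lookahead: "Windows" only when followed by 95, 98, NT or 2000;
        // the lookahead itself is not part of the matched text
        System.out.println(firstMatch("Windows(?=95|98|NT|2000)", "Windows2000")); // Windows
        System.out.println(firstMatch("Windows(?=95|98|NT|2000)", "Windows3.1"));  // null
        // Lookbehind: "Windows" only when preceded by 95, 98, NT or 2000
        System.out.println(firstMatch("(?<=95|98|NT|2000)Windows", "2000Windows")); // Windows
        System.out.println(firstMatch("(?<=95|98|NT|2000)Windows", "3.1Windows"));  // null
        // Backreference: \1 repeats whatever group 1 captured
        System.out.println(firstMatch("(.)\\1", "abccd")); // cc
    }
}
```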
2.1 Character classes
[]
public class RegularDemo3 {
    public static void main(String[] args) {
        String s = "ab123342asdasqwe&;123.";
        //Format: [...]
        //[] defines a character class: it matches any single character listed inside the brackets
        //Every a, b or 2 in the string is matched
        String regex = "[ab2]";
        System.out.println(s.replaceAll(regex, "_"));//__1_334__sd_sqwe&;1_3.
        //Requirement: match and replace every character except a, b and 2
        //A ^ right after the opening bracket negates the class, so it matches any character that is not a, b or 2
        regex = "[^ab2]";
        System.out.println(s.replaceAll(regex, "_"));//ab_2___2a__a_______2__
    }
}
2.2 Ranges
A range extends a character class: instead of listing every character, [a-z] stands for all characters from a to z.
public class RegularDemo4 {
    public static void main(String[] args) {
        String regex = "[ab]";
        String s = "abcdefghijklmnABCDTW1234DWFadqwr&;123=.";
        System.out.println("Before matching:" + s);//Before matching:abcdefghijklmnABCDTW1234DWFadqwr&;123=.
        System.out.println("=========================================");
        System.out.println(s.replaceAll(regex, "_"));//__cdefghijklmnABCDTW1234DWF_dqwr&;123=.
        //Requirement: match all lowercase letters in the string
        //[a-z] matches any lowercase letter from a to z
        regex = "[a-z]";
        System.out.println(s.replaceAll(regex, "_"));//______________ABCDTW1234DWF_____&;123=.
        //[A-Z] matches any uppercase letter from A to Z
        regex = "[A-Z]";
        System.out.println(s.replaceAll(regex, "_"));//abcdefghijklmn______1234___adqwr&;123=.
        //Match both cases. Avoid [A-z]: that range also covers [ \ ] ^ _ and `
        regex = "[a-zA-Z]";
        System.out.println(s.replaceAll(regex, "_"));//____________________1234________&;123=.
        //Match digits
        regex = "[0-9]";
        System.out.println(s.replaceAll(regex, "_"));//abcdefghijklmnABCDTW____DWFadqwr&;___=.
        //[0-z] spans every character from '0' to 'z': digits and letters, but also : ; < = > ? @ [ \ ] ^ _ `
        //Together with & and . it matches every character of this string
        regex = "[0-z&.]";
        System.out.println(s.replaceAll(regex, "_"));//_______________________________________
    }
}
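The [A-z] shortcut is a common trap: a range covers every character between its two endpoints in Unicode order, and six punctuation characters sit between 'Z' and 'a'. A minimal check, where RangePitfallDemo and its mask helper are illustrative names:

```java
class RangePitfallDemo {
    // Replaces every character matched by regex with '#'.
    static String mask(String regex, String input) {
        return input.replaceAll(regex, "#");
    }

    public static void main(String[] args) {
        String s = "a_B^z[";
        // [A-z] runs from 'A' (65) to 'z' (122), so it also covers [ \ ] ^ _ and `
        System.out.println(mask("[A-z]", s));    // ######
        // [a-zA-Z] covers letters only
        System.out.println(mask("[a-zA-Z]", s)); // #_#^#[
    }
}
```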
2.3 Predefined classes
\d == [0-9] digit
\D == [^0-9] non-digit
\s == [ \t\n\x0B\f\r] whitespace character
\S == [^\s] non-whitespace character
\w == [a-zA-Z_0-9] word character (letter, digit or underscore)
\W == [^\w] non-word character
. == any character (except line terminators, by default)
public class RegularDemo5 {
    public static void main(String[] args) {
        String regex = "[0-9]";
        String s = "abcde fghijklmn ABCDTW12.....34D WFadq r&;1!!!!23=.";
        System.out.println("Before matching:" + s);//Before matching:abcde fghijklmn ABCDTW12.....34D WFadq r&;1!!!!23=.
        System.out.println("=========================================");
        System.out.println(s.replaceAll(regex, "_"));//abcde fghijklmn ABCDTW__.....__D WFadq r&;_!!!!__=.
        regex = "\\d"; //Same as [0-9]: any digit
        System.out.println(s.replaceAll(regex, "_"));//abcde fghijklmn ABCDTW__.....__D WFadq r&;_!!!!__=.
        regex = "\\D"; //Matches every non-digit character
        System.out.println(s.replaceAll(regex, "_"));//______________________12_____34___________1____23__
        regex = "\\s"; //Matches every whitespace character
        System.out.println(s.replaceAll(regex, "_"));//abcde_fghijklmn_ABCDTW12.....34D_WFadq_r&;1!!!!23=.
        regex = "\\S"; //Matches every non-whitespace character
        System.out.println(s.replaceAll(regex, "_"));//_____ _________ ________________ _____ ____________
        regex = "\\w"; //Matches letters, digits and the underscore
        System.out.println(s.replaceAll(regex, "_"));//_____ _________ ________.....___ _____ _&;_!!!!__=.
        regex = "\\W"; //Matches everything except letters, digits and the underscore
        System.out.println(s.replaceAll(regex, "_"));//abcde_fghijklmn_ABCDTW12_____34D_WFadq_r__1____23__
        regex = "."; //Matches any character (except line terminators)
        System.out.println(s.replaceAll(regex, "_"));//___________________________________________________
        regex = "\\."; //Matches a literal '.' character
        System.out.println(s.replaceAll(regex, "_"));//abcde fghijklmn ABCDTW12_____34D WFadq r&;1!!!!23=_
    }
}
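Three details of the predefined classes are easy to verify in Java: \s covers tabs and line breaks as well as spaces, \w includes the underscore, and . does not match a line terminator unless the Pattern.DOTALL flag is set. A small sketch; the class and helper names are illustrative:

```java
import java.util.regex.Pattern;

class PredefinedClassDemo {
    // Replaces every character matched by regex with '#'.
    static String mark(String regex, String input) {
        return input.replaceAll(regex, "#");
    }

    // Reports whether the regex a.b matches the three characters a, newline, b under the given flags.
    static boolean dotMatchesNewline(int flags) {
        return Pattern.compile("a.b", flags).matcher("a\nb").matches();
    }

    public static void main(String[] args) {
        System.out.println(mark("\\s", "a b\tc\nd"));            // a#b#c#d
        System.out.println(mark("\\w", "snake_case-1"));         // ##########-#
        System.out.println(dotMatchesNewline(0));                // false
        System.out.println(dotMatchesNewline(Pattern.DOTALL));   // true
    }
}
```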
2.4 Boundary matchers
^: outside square brackets, matches the start of the input (the string starts with xxx)
$: matches the end of the input (the string ends with xxx)
\b: word boundary
\B: non-word boundary
public class RegularDemo6 {
    public static void main(String[] args) {
        //Outside square brackets, ^ anchors the match to the start of the string: here, a string starting with abc
        String regex = "^abc";
        String s = "abcdefg";
        System.out.println("Before matching:" + s);//Before matching:abcdefg
        System.out.println("=========================================");
        System.out.println(s.replaceAll(regex, "_"));//_defg
        //$ anchors the match to the end of the string
        regex = "fg$";
        System.out.println(s.replaceAll(regex, "_"));//abcde_
        //\b is zero-width, so replaceAll inserts _ at every word boundary
        regex = "\\b";
        s = "hello worpd 888 1 2 & ; 0 a b c d";
        System.out.println("Before matching:" + s);//Before matching:hello worpd 888 1 2 & ; 0 a b c d
        System.out.println("===========================================");
        System.out.println(s.replaceAll(regex, "_"));//_hello_ _worpd_ _888_ _1_ _2_ & ; _0_ _a_ _b_ _c_ _d_
        //\B inserts _ at every position that is not a word boundary
        regex = "\\B";
        System.out.println(s.replaceAll(regex, "_"));//h_e_l_l_o w_o_r_p_d 8_8_8 1 2 _&_ _;_ 0 a b c d
    }
}
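The metacharacter table notes that ^ and $ can also match at line breaks when multiline mode is on. In Java that mode is the Pattern.MULTILINE flag (the inline form is (?m)). A sketch with an illustrative helper name:

```java
import java.util.regex.Pattern;

class MultilineDemo {
    // Replaces an 'a' at the start of the input (or of each line, with MULTILINE) by '_'.
    static String markLineStarts(String text, int flags) {
        return Pattern.compile("^a", flags).matcher(text).replaceAll("_");
    }

    public static void main(String[] args) {
        String text = "apple\nbanana\navocado";
        // Without MULTILINE, ^ matches only at the very start of the input:
        // prints _pple, banana, avocado on three lines
        System.out.println(markLineStarts(text, 0));
        // With MULTILINE, ^ also matches right after each line terminator:
        // prints _pple, banana, _vocado on three lines
        System.out.println(markLineStarts(text, Pattern.MULTILINE));
    }
}
```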
2.5 Quantifiers
?: 0 or 1 occurrences
+: 1 or more occurrences
*: 0 or more occurrences
{n}: exactly n occurrences
{n,m}: between n and m occurrences (inclusive)
{n,}: at least n occurrences
public class RegularDemo7 {
    public static void main(String[] args) {
        //Matches an a that occurs 0 or 1 times at the start of the string
        String regex = "^a?";
        String s = "baaabcdefaaaaaag";
        System.out.println("Before matching:" + s);//Before matching:baaabcdefaaaaaag
        System.out.println("=======================================");
        //The string starts with b, so ^a? matches the empty string at position 0 and _ is inserted there
        System.out.println(s.replaceAll(regex, "_"));//_baaabcdefaaaaaag
        //^a+ needs at least one a at the start, so nothing is replaced
        regex = "^a+";
        System.out.println(s.replaceAll(regex, "_"));//baaabcdefaaaaaag
        regex = "^a*";
        System.out.println(s.replaceAll(regex, "_"));//_baaabcdefaaaaaag
        //{n}: exactly n times
        //Requirement: match exactly 6 consecutive a characters
        regex = "a{6}"; // aaaaaa
        System.out.println(s.replaceAll(regex, "*"));//baaabcdef*g
        //{n,m}: between n and m times
        regex = "a{3,4}"; //Matches runs of 3 to 4 consecutive a characters
        //Quantifiers are greedy: each match takes as many a's as allowed (here 4) before trying fewer
        System.out.println(s.replaceAll(regex, "*"));//b*bcdef*aag
        //{n,}: at least n times
        regex = "a{6,}";
        System.out.println(s.replaceAll(regex, "*"));//baaabcdef*g
    }
}
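By default every quantifier above is greedy. As the metacharacter table explains, appending ? makes it non-greedy (lazy), so it takes as few repetitions as it can. Running both forms of a{3,4} on the string from RegularDemo7 shows the difference; the star helper is an illustrative name:

```java
class LazyQuantifierDemo {
    // Replaces every match of regex in the demo string with '*'.
    static String star(String regex) {
        return "baaabcdefaaaaaag".replaceAll(regex, "*");
    }

    public static void main(String[] args) {
        // Greedy: the run of six a's is consumed as 4, leaving "aa" unmatched
        System.out.println(star("a{3,4}"));  // b*bcdef*aag
        // Lazy: each match takes only 3 a's, so the run of six becomes two matches
        System.out.println(star("a{3,4}?")); // b*bcdef**g
    }
}
```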
2.6 Grouping: ()
public class RegularDemo8 {
    public static void main(String[] args) {
        //Matches ab followed by 1 or 2 c's: the quantifier applies only to the c
        String regex = "abc{1,2}";
        String s = "abcccccABC123123ABCabcccccABC123123ABCabcccccABC123123ABCabcabcabc123";
        System.out.println("Before matching:" + s);//Before matching:abcccccABC123123ABCabcccccABC123123ABCabcccccABC123123ABCabcabcabc123
        System.out.println("===========================================================");
        System.out.println(s.replaceAll(regex, "_"));//_cccABC123123ABC_cccABC123123ABC_cccABC123123ABC___123
        //Parentheses create a group
        //Here abc as a whole occurs 1 or 2 times
        regex = "(abc){1,2}";
        System.out.println(s.replaceAll(regex, "_"));//_ccccABC123123ABC_ccccABC123123ABC_ccccABC123123ABC__123
        regex = "ABC(abc){1,}"; //e.g. ABCabcabc
        System.out.println(s.replaceAll(regex, "_"));//abcccccABC123123_ccccABC123123_ccccABC123123_123
        //matches() returns true only if the entire string matches the regex
        System.out.println(s.matches(regex));//false
    }
}
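Besides applying a quantifier to several characters at once, a group also captures the text it matched, which Matcher.group(n) returns after a successful find() or matches(). In this sketch the class and method names are illustrative; the second method shows that a quantified group such as ([a-z])+ keeps only the text of its last repetition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // For tokens like "abc123", pairs the letters (group 1) with the digits (group 2).
    static List<String> splitTokens(String input) {
        Matcher m = Pattern.compile("([a-z]+)(\\d+)").matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group(1) + "/" + m.group(2));
        }
        return out;
    }

    // Matches the whole input against ([a-z])+ and returns {whole match, group 1},
    // or null if the input does not match.
    static String[] lastRepetition(String input) {
        Matcher m = Pattern.compile("([a-z])+").matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(), m.group(1)};
    }

    public static void main(String[] args) {
        System.out.println(splitTokens("abc123 xyz789")); // [abc/123, xyz/789]
        String[] r = lastRepetition("abc");
        System.out.println(r[0] + " " + r[1]);            // abc c
    }
}
```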
2.7 Backreferences (retrieving captured values)
$n: in a replacement string, takes the value captured by group n. Groups are numbered from 1 in the order of their opening parentheses ($0 is the whole match).
Requirement: 2022-01-23 --> 01/23/2022
public class RegularDemo9 {
    public static void main(String[] args) {
        //2022-01-23
        String regex = "(\\d{4})-(\\d{2})-(\\d{2})";
        String s = "2022-01-23 2022-02-24";
        System.out.println(s.replaceAll(regex, "$2/$3/$1"));//01/23/2022 02/24/2022
        //To stop a group from getting a number, make it non-capturing with ?:
        //The month group below is not numbered, so the day is now $2
        regex = "(\\d{4})-(?:\\d{2})-(\\d{2})";
        System.out.println(s.replaceAll(regex, "$2/$1"));//23/2022 24/2022
    }
}
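A backreference can also be used inside the pattern itself, written \1 in the regex (\\1 in a Java string literal), where it must match exactly the text that group 1 captured. Separately, because $ is special in replacement strings, Matcher.quoteReplacement can be used to insert a literal dollar sign. Both are shown below; the class and method names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackreferenceDemo {
    // Finds the first word that is immediately repeated, e.g. "is is"; null if none.
    static String firstDoubledWord(String text) {
        Matcher m = Pattern.compile("\\b(\\w+)\\s+\\1\\b").matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // Prefixes every number with a literal "$". quoteReplacement("$") returns the
    // escaped form \$, so only the trailing $1 is treated as a group reference.
    static String addDollar(String text) {
        return text.replaceAll("(\\d+)", Matcher.quoteReplacement("$") + "$1");
    }

    public static void main(String[] args) {
        System.out.println(firstDoubledWord("this is is a test")); // is
        System.out.println(addDollar("cost: 10"));                 // cost: $10
    }
}
```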
3. Using regular expressions in Java
How are these operations performed in Java?
1. Searching: the Pattern and Matcher classes
2. Matching: the matches() method of String
3. Replacing: the replaceAll() and replaceFirst() methods of String
4. Splitting: the split() method of String
package com.shujia.wyh.day16;

import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegularDemo10 {
    public static void main(String[] args) {
        //Matching: the whole string must consist of 3 or more word characters
        String regex = "\\w{3,}";
        String s = "abcd123";
        System.out.println(s.matches(regex));//true
        //Replacing
        regex = "[a-z]{2,}";
        s = "abc defg hello111";
        System.out.println(s.replaceAll(regex, "_"));//_ _ _111
        System.out.println(s.replaceFirst(regex, "_"));//_ defg hello111
        //Splitting
        s = "abc sbdf 123ab sa123bddss &";
        String[] s1 = s.split(" ");
        //Arrays.toString formats the array contents for printing
        System.out.println(Arrays.toString(s1));//[abc, sbdf, 123ab, sa123bddss, &]
        s = "abc sbdf 123ab sa123bddss &";
        String[] s2 = s.split("a");
        //The delimiter at index 0 produces a leading empty string
        System.out.println(Arrays.toString(s2));//[, bc sbdf 123, b s, 123bddss &]
        //Pattern and Matcher
        regex = "\\w{3,7}";
        Pattern compile = Pattern.compile(regex);
        Matcher matcher = compile.matcher("abcd123");
        System.out.println(matcher.matches());//true
    }
}
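RegularDemo10 only calls matches() on its Matcher, which succeeds only when the entire input matches. For the searching operation in item 1, a common approach is to call find() in a loop and read group() and start() for each hit. The sketch also shows a split() pitfall: split takes a regex, so splitting on a literal dot needs "\\." in Java source. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindDemo {
    // Collects "match@index" for every substring of input that matches regex.
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group() + "@" + m.start());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("[a-z]{2,}", "abc defg hello111")); // [abc@0, defg@4, hello@9]
        // "." matches every character, so every piece is empty and split returns an empty array
        System.out.println("a.b.c".split(".").length);                   // 0
        System.out.println(Arrays.toString("a.b.c".split("\\.")));       // [a, b, c]
    }
}
```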
Example: turn a stuttered string such as "I I I...I...want want...want...to to..." into "I want to learn programming"
public class RegularDemo11 {
    public static void main(String[] args) {
        String s = "I I I I...I...want want...want...to to...learn learn learn...programming...programming programming";
        //1. Replace every run of dots with a single space
        String regex = "\\.+";
        String s1 = s.replaceAll(regex, " ");
        System.out.println(s1);//I I I I I want want want to to learn learn learn programming programming programming
        //2. Merge repeated words: \1 is a backreference to the word captured by group 1,
        //so each run of the same word is replaced by a single copy ($1)
        //(Collapsing repeated characters with (.)\1+ would also turn "programming" into "programing")
        regex = "\\b(\\w+)(?:\\s+\\1\\b)+";
        String s2 = s1.replaceAll(regex, "$1");
        System.out.println(s2);//I want to learn programming
    }
}
Enumeration types
1. When a class has only a fixed, finite number of objects, it can be defined as an enumeration (enum) class.
2. An enumeration is strongly recommended when you need to define a set of constants.
How do you define an enumeration class?
The approach depends on the JDK version.
1. Before JDK 1.5: write a custom enumeration class by hand
1. Make the constructor private so that the number of objects of the class stays fixed.
2. Declare the member variables as constants (private final).
3. Expose the enumeration objects to the outside world through public static final fields.
4. Provide only public getter methods.
5. Override the toString() method.
public class EnumDemo1 {
    public static void main(String[] args) {
        Season spring = Season.SPRING;
        System.out.println(spring);//Season{SEASON_NAME='spring', SEASON_DESC='warm spring, flowers in full bloom'}
        System.out.println(spring.getSEASON_NAME());//spring
        System.out.println(spring.getSEASON_DESC());//warm spring, flowers in full bloom
    }
}

class Season {
    //2. The member variables of Season must be declared as constants
    private final String SEASON_NAME;
    private final String SEASON_DESC;

    //1. The constructor is private so that the number of objects of the class stays fixed
    private Season(String SEASON_NAME, String SEASON_DESC) {
        this.SEASON_NAME = SEASON_NAME;
        this.SEASON_DESC = SEASON_DESC;
    }

    //3. Public static final fields give the outside world access to the enumeration objects
    public static final Season SPRING = new Season("spring", "warm spring, flowers in full bloom");
    public static final Season SUMMER = new Season("summer", "scorching sun");
    public static final Season AUTUMN = new Season("autumn", "fresh autumn weather");
    public static final Season WINTER = new Season("winter", "snow gleams white");

    //4. Only public getters are provided
    public String getSEASON_NAME() {
        return SEASON_NAME;
    }

    public String getSEASON_DESC() {
        return SEASON_DESC;
    }

    //5. Override toString()
    @Override
    public String toString() {
        return "Season{" +
                "SEASON_NAME='" + SEASON_NAME + '\'' +
                ", SEASON_DESC='" + SEASON_DESC + '\'' +
                '}';
    }
}
2. From JDK 1.5 on: Java provides the enum keyword for defining enumeration classes
1. The constructor must be private so that the number of objects of the class stays fixed (an enum constructor is private even without the modifier).
2. Declare the member variables as constants (private final).
3. The enumeration's finite set of objects is listed at the top of the enum body, separated by commas and terminated by a semicolon.
4. Provide only public getter methods.
public class EnumDemo2 {
    public static void main(String[] args) {
        Season2 spring = Season2.SPRING;
        System.out.println(spring);//SPRING (the toString() inherited from Enum returns the constant's name)
        System.out.println(Season2.class.getSuperclass());//class java.lang.Enum (every enum implicitly extends it)
    }
}

enum Season2 {
    //3. The finite set of objects is listed first, separated by commas and terminated by a semicolon
    SPRING("spring", "all things come back to life"),
    SUMMER("summer", "scorching sun"),
    AUTUMN("autumn", "fresh autumn weather"),
    WINTER("winter", "snow gleams white");

    //2. Declare the properties of Season2 as constants
    private final String SEASON_NAME;
    private final String SEASON_DESC;

    //1. To keep the number of objects fixed, the constructor must be private
    //(an enum constructor is private even without the modifier)
    private Season2(String SEASON_NAME, String SEASON_DESC) {
        this.SEASON_NAME = SEASON_NAME;
        this.SEASON_DESC = SEASON_DESC;
    }

    //4. Provide getters for SEASON_NAME and SEASON_DESC
    public String getSEASON_NAME() {
        return SEASON_NAME;
    }

    public String getSEASON_DESC() {
        return SEASON_DESC;
    }
}
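Because every enum implicitly extends java.lang.Enum, it inherits methods such as name() and ordinal(), and the compiler adds the static values() and valueOf(String) methods. The sketch uses a stripped-down, hypothetical Season3 enum (no fields) to keep it short.

```java
import java.util.Arrays;

class EnumMethodsDemo {
    // Hypothetical enum for this demo only.
    enum Season3 { SPRING, SUMMER, AUTUMN, WINTER }

    public static void main(String[] args) {
        // values() returns the constants in declaration order
        System.out.println(Arrays.toString(Season3.values())); // [SPRING, SUMMER, AUTUMN, WINTER]
        // valueOf() looks a constant up by its exact name; ordinal() is its 0-based position
        Season3 s = Season3.valueOf("AUTUMN");
        System.out.println(s.name() + " " + s.ordinal());      // AUTUMN 2
        try {
            Season3.valueOf("autumn"); // names are case-sensitive
        } catch (IllegalArgumentException e) {
            System.out.println("no constant named autumn");
        }
    }
}
```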
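Enumeration classes can implement interfaces in two ways:
1. Implement the interface's abstract method once in the enum class itself, so every constant shares the same behavior.
2. Implement the method separately in each enum constant (in a class body attached to the constant), so each constant can behave differently.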
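Both approaches can be sketched as follows. The Describable interface and the Level and Signal enums are hypothetical examples: Level implements describe() once for all of its constants, while each Signal constant supplies its own implementation in a constant-specific class body.

```java
import java.util.Locale;

class EnumInterfaceDemo {
    // Hypothetical interface for this demo.
    interface Describable {
        String describe();
    }

    // Way 1: a single implementation in the enum body, shared by every constant.
    enum Level implements Describable {
        LOW, HIGH;

        @Override
        public String describe() {
            return "level " + name().toLowerCase(Locale.ROOT);
        }
    }

    // Way 2: each constant implements the method in its own class body.
    enum Signal implements Describable {
        RED {
            @Override
            public String describe() {
                return "stop";
            }
        },
        GREEN {
            @Override
            public String describe() {
                return "go";
            }
        };
    }

    public static void main(String[] args) {
        // Prints: level low, level high, stop, go (one per line)
        for (Describable d : new Describable[] {Level.LOW, Level.HIGH, Signal.RED, Signal.GREEN}) {
            System.out.println(d.describe());
        }
    }
}
```

The first style suits behavior that is identical across constants; the second suits behavior that genuinely differs per constant, at the cost of one anonymous class body per constant.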