Java basics: regular expressions

Posted by blacklotus on Tue, 25 Jan 2022 17:18:25 +0100

Contents

1. Overview of regular expressions

2. Steps for using regular expressions:

3. Purpose of learning regular expressions:

4. Basic syntax of regular expressions:

(1) Literal characters:

(2) Metacharacter:

a. Character class:

b. Range class:

c. Predefined classes:

d. Boundary class:

e. Quantifier:

f. Grouping:

g. Back reference:

(3) Application of regular expressions in Java:

Regular expression classic exercise:

1. Overview of regular expressions

A regular expression is a single string that describes, and is used to match, a set of strings that follow certain syntax rules.

2. Steps for using regular expressions:

(1) Study a large number of sample strings, find what they have in common, and write that down as a rule

(2) Use the rule to match new strings

(3) If the match succeeds, perform the corresponding operation
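
The three steps can be sketched as follows. This is only an illustration: the rule "three capital letters, a dash, four digits" and the names RegexStepsSketch, ORDER_ID and isOrderId are assumptions made for the example, not part of the original article.

```java
import java.util.regex.Pattern;

// Hypothetical example: the rule and all names here are illustrative assumptions.
final class RegexStepsSketch {
    // Step 1: the rule derived from sample strings such as "ABC-1234".
    static final Pattern ORDER_ID = Pattern.compile("[A-Z]{3}-[0-9]{4}");

    // Step 2: apply the rule to a new string.
    static boolean isOrderId(String candidate) {
        return ORDER_ID.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        // Step 3: act on the outcome of the match.
        for (String candidate : new String[] {"XYZ-0042", "xyz-42"}) {
            if (isOrderId(candidate)) {
                System.out.println(candidate + " accepted");
            } else {
                System.out.println(candidate + " rejected");
            }
        }
    }
}
```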

3. Purpose of learning regular expressions:

Regular expressions handle complex find / replace / match / split operations on strings.

Regular expressions are a technology independent of Java: they are not tied to Java, and can be used in Java, Python, JavaScript and other languages.

4. Basic syntax of regular expressions:

(1) Literal characters:

The character itself is a regular expression
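
A minimal sketch of the difference between a literal character and a metacharacter. The class LiteralSketch and the helper replaceLiteral are illustrative assumptions; Pattern.quote is the standard library method that turns arbitrary text into a literal pattern.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class LiteralSketch {
    // Pattern.quote escapes the text so that regex metacharacters in it lose their meaning.
    // The replacement is still interpreted by replaceAll ($ and \ are special there).
    static String replaceLiteral(String input, String literal, String replacement) {
        return input.replaceAll(Pattern.quote(literal), replacement);
    }

    public static void main(String[] args) {
        String s = "a.b.c";
        System.out.println(s.replaceAll("b", "_"));      // a._.c : "b" matches itself
        System.out.println(s.replaceAll(".", "_"));      // _____ : an unescaped . matches every character
        System.out.println(replaceLiteral(s, ".", "_")); // a_b_c : a quoted . matches only a dot
    }
}
```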

Code example:

A relevant method in the String class: public String replaceAll(String regex, String replacement)

It replaces each substring of this string that matches the given regular expression with the given replacement.
 

public class RegularDemo2 {
    public static void main(String[] args) {
        String str = "ab123342asdasqwe&;123.";
        //One method in the String class is the replace function, which replaces all characters that meet the rules
        //public String replaceAll(String regex,String replacement)
        // Replace each substring of this string that matches the given regular expression with the given replacement.
        String regex = "\\.";
        System.out.println(str.replaceAll(regex,"_"));

        regex = "b";
        System.out.println(str.replaceAll(regex,"_"));
    }
}

Operation result:

(2) Metacharacter:

Common meta characters are shown in the table:

 

 

a. Character class:

Representation format: []

[]: defines a character class, which matches any single character listed inside the brackets

Example: [123] matches any one of the characters 1, 2 or 3

^: when it appears as the first character inside square brackets, it negates the class. For example: [^123] matches any character other than 1, 2 and 3
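
As a small sketch of both forms, assuming hypothetical helper names (CharClassSketch, keepDigits, maskVowels) that do not come from the original article:

```java
// Hypothetical helper class, for illustration only.
final class CharClassSketch {
    // [^0123456789] matches every character that is not one of the listed digits,
    // so removing those matches keeps only the digits.
    static String keepDigits(String input) {
        return input.replaceAll("[^0123456789]", "");
    }

    // [aeiou] matches any single one of the listed vowels.
    static String maskVowels(String input) {
        return input.replaceAll("[aeiou]", "*");
    }

    public static void main(String[] args) {
        System.out.println(keepDigits("tel: 555-0100")); // 5550100
        System.out.println(maskVowels("regular"));       // r*g*l*r
    }
}
```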

Code example:

public class RegularDemo3 {
    public static void main(String[] args) {
        String s = "ab123342asdasqwe&;123.";
        //Representation format: []
        //[] defines a character class, which matches any single character listed inside the brackets
        //Any a, b or 2 in the string will be matched
        String regex = "[ab2]";
        System.out.println(s.replaceAll(regex,"_"));

        //Requirement: match and replace every character except a, b and 2
        //^ as the first character inside the brackets negates the class, so this matches every character that is not a, b or 2
        regex = "[^ab2]";
        System.out.println(s.replaceAll(regex,"_"));
    }
}

Operation result:

b. Range class:

A range class adds a range to the character class: [x-y] matches any single character whose code lies between x and y inclusive.

Code example:

public class RegularDemo4 {
    public static void main(String[] args) {
        String regex = "[ab]";
        String s = "abcdefghijklmnABCDTW1234DWFadqwr&;123=.";
        System.out.println("Before matching:" + s);
        System.out.println("=========================================");
        System.out.println(s.replaceAll(regex, "_"));

        //Requirement: matches all lowercase letters in the string
        //[a-z] indicates matching any lowercase letter from a to z
        regex = "[a-z]";
        System.out.println(s.replaceAll(regex, "_"));

        //[A-Z] indicates matching any capital letter from A to Z
        regex = "[A-Z]";
        System.out.println(s.replaceAll(regex, "_"));

        //Want to match both uppercase and lowercase?
        //[a-zA-Z] matches letters only; [A-z] also matches [ \ ] ^ _ ` which lie between Z and a in ASCII
//        regex = "[a-zA-Z]";
        regex = "[A-z]";
        System.out.println(s.replaceAll(regex, "_"));

        //What if you want to match the numbers now?
        regex = "[0-9]";
        System.out.println(s.replaceAll(regex, "_"));

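        //[0-z&.] matches every character from 0 to z in ASCII order (digits, letters and the punctuation between them), plus & and .
        //To match only digits and letters, use [0-9a-zA-Z]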
        regex = "[0-z&.]";
        System.out.println(s.replaceAll(regex, "_"));
    }
}

Operation result:

Explanation of the two ways of matching uppercase and lowercase letters in the above code:

A range is resolved by character code, following the ASCII table. [A-z] matches a wider range than [a-zA-Z], because the uppercase and lowercase letters are not contiguous in the ASCII table: the six characters [ \ ] ^ _ ` lie between Z (90) and a (97), so they fall inside the range A-z and are matched as well. Use [a-zA-Z] when only letters should match.
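
The difference can be checked with a short sketch. The class RangeSketch and its two helper methods are illustrative assumptions, not code from the original article.

```java
// Hypothetical helper class, for illustration only.
final class RangeSketch {
    // Wide range: everything from 'A' (65) to 'z' (122).
    static boolean matchesWide(char c) {
        return String.valueOf(c).matches("[A-z]");
    }

    // Strict range: letters only.
    static boolean matchesStrict(char c) {
        return String.valueOf(c).matches("[a-zA-Z]");
    }

    public static void main(String[] args) {
        // Walk over the characters between 'Z' (90) and 'a' (97).
        for (char c = 'Z' + 1; c < 'a'; c++) {
            System.out.println(c + " wide=" + matchesWide(c) + " strict=" + matchesStrict(c));
        }
    }
}
```

Each of the six characters printed by main is accepted by the wide range and rejected by the strict one.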
 

c. Predefined classes:

With range classes alone, common requirements in real development, such as checking for a digit or for an uppercase or lowercase letter, lead to long regular expressions, and a loosely written range can match characters that the requirement does not include, as shown above. Regular expressions therefore provide predefined classes, short expressions with a special meaning:

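\d == [0-9] digit

\D == [^0-9] non-digit

\s == [ \t\n\x0B\f\r] whitespace character

\S == [^ \t\n\x0B\f\r] non-whitespace character

\w == [a-zA-Z_0-9] word character (letter, digit or underscore)

\W == [^a-zA-Z_0-9] non-word character

. == any character (by default, any character except a line terminator)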
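
A few edge cases of these classes, as a sketch under Java's default matching flags. The class PredefinedSketch and its method names are illustrative assumptions.

```java
// Hypothetical helper class, for illustration only.
final class PredefinedSketch {
    static boolean isWordChar(char c) {
        return String.valueOf(c).matches("\\w");
    }

    static boolean isWhitespace(char c) {
        return String.valueOf(c).matches("\\s");
    }

    static boolean dotMatches(String s) {
        return s.matches(".");
    }

    public static void main(String[] args) {
        System.out.println(isWordChar('_'));    // true: the underscore belongs to \w
        System.out.println(isWordChar('-'));    // false
        System.out.println(isWhitespace('\t')); // true: a tab is whitespace
        System.out.println(dotMatches("x"));    // true
        System.out.println(dotMatches("\n"));   // false: . skips line terminators by default
    }
}
```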

Code implementation:

public class RegularDemo5 {
    public static void main(String[] args) {
        String regex = "[0-9]";
        String s = "abcde fghijklmn ABCDTW12.....34D WFadq r&;1!!!!23=.";
        System.out.println("Before matching:" + s);
        System.out.println("=========================================");
        System.out.println(s.replaceAll(regex, "_"));

        regex = "\\d"; //[0-9] number
        System.out.println(s.replaceAll(regex, "_"));

        regex = "\\D"; //Indicates that all non numeric characters are matched
        System.out.println(s.replaceAll(regex, "_"));

        regex = "\\s"; //Match all white space characters
        System.out.println(s.replaceAll(regex, "_"));

        regex = "\\S"; //Matches all characters except white space
        System.out.println(s.replaceAll(regex, "_"));

        regex = "\\w"; //Match all uppercase and lowercase letters, digits and the underscore
        System.out.println(s.replaceAll(regex, "_"));

        regex = "\\W"; //Match every character that is not a letter, digit or underscore
        System.out.println(s.replaceAll(regex, "_"));

        regex = "."; // Represents matching any character
        System.out.println(s.replaceAll(regex, "_"));

        regex = "\\."; //Match a literal . character
        System.out.println(s.replaceAll(regex, "_"));

    }
}

Operation result:

 

d. Boundary class:

Boundary characters mainly include:

^: when it does not appear in brackets, anchors the match at the start of the input (starts with xxx)

$: anchors the match at the end of the input (ends with xxx)

\b: word boundary

\B: non-word boundary
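
A short sketch of typical uses of these boundaries. The class BoundarySketch, its method names and the sample inputs are illustrative assumptions.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class BoundarySketch {
    // \bcat\b matches "cat" only when it stands as a whole word.
    static String maskWord(String text) {
        return text.replaceAll("\\bcat\\b", "***");
    }

    // ^ anchors the search at the start of the input.
    static boolean startsWithDigit(String text) {
        return Pattern.compile("^\\d").matcher(text).find();
    }

    // $ anchors the search at the end of the input.
    static boolean endsWithJava(String text) {
        return Pattern.compile("\\.java$").matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(maskWord("cat concat cats cat.")); // *** concat cats ***.
        System.out.println(startsWithDigit("7 days"));        // true
        System.out.println(endsWithJava("Main.java.bak"));    // false
    }
}
```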

Code implementation:

public class RegularDemo6 {
    public static void main(String[] args) {
        //Outside brackets, ^ means "starts with xxx"; here the string must start with abc
        String regex = "^abc";
        String s = "abcdefg";
        System.out.println("Before matching:" + s);
        System.out.println("=========================================");
        System.out.println(s.replaceAll(regex, "_"));

        regex = "fg$";
        System.out.println(s.replaceAll(regex, "_"));


        regex = "\\b";
        s = "hello worpd 888 1 2 & ; 0 a b c d";
        System.out.println("Before matching:" + s);
        System.out.println("===========================================");
        System.out.println(s.replaceAll(regex, "_"));

        regex = "\\B";
        System.out.println(s.replaceAll(regex, "_"));

    }
}

Operation result:

e. Quantifier:

Quantifiers mainly include:

? : 0 or 1 occurrences

+: one or more occurrences

*: any number of occurrences (including 0)

{n}: exactly n occurrences

{n,m}: between n and m occurrences

{n,}: at least n occurrences
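
Each quantifier can be tried on a small validation rule. In this sketch the class QuantifierSketch, the method names and the rules themselves (signed integer, hex colour, PIN, token) are assumptions chosen for illustration.

```java
// Hypothetical helper class, for illustration only.
final class QuantifierSketch {
    // ? and +: an optional minus sign followed by one or more digits.
    static boolean isInteger(String s) {
        return s.matches("-?\\d+");
    }

    // *: a, then any number of b (including none), then c.
    static boolean isAbc(String s) {
        return s.matches("ab*c");
    }

    // {n}: exactly six hexadecimal digits after #.
    static boolean isHexColor(String s) {
        return s.matches("#[0-9a-fA-F]{6}");
    }

    // {n,m}: between four and six digits.
    static boolean isPin(String s) {
        return s.matches("\\d{4,6}");
    }

    // {n,}: at least eight word characters.
    static boolean isLongToken(String s) {
        return s.matches("\\w{8,}");
    }

    public static void main(String[] args) {
        System.out.println(isInteger("-42"));       // true
        System.out.println(isAbc("ac"));            // true
        System.out.println(isHexColor("#12345"));   // false
        System.out.println(isPin("12345"));         // true
        System.out.println(isLongToken("abc"));     // false
    }
}
```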

Verify QQ case:

Compare the general approach with the regular expression approach:

/*
        Requirement: verify whether a QQ number is valid
        1. It must be 5-10 digits long
        2. It cannot start with 0
        3. Every character must be a digit
 */
public class RegularDemo1 {
    public static void main(String[] args) {
        String s = "1165872335";
        //Write a method to verify whether this is legal qq
//        System.out.println(checkQQ(s));
        //Use regular processing to deal with such cases
        System.out.println(checkQQ2(s));
    }
 
    //Regular processing
    public static boolean checkQQ2(String qq){
        //Write a regular expression
        String regex = "[1-9][0-9]{4,9}";
        return qq.matches(regex);
    }
 
 
    /**
     *      Return value type: boolean
     *      Parameter list: String
     */
    public static boolean checkQQ(String qq){
        boolean flag = false;
        //1. Must be 5-10 digits
        if(qq.length() >= 5 && qq.length()<=10){
            //2. 0 cannot be the beginning of QQ number
            if(!qq.startsWith("0")){
                flag = true;
                //3. All must be numbers
                char[] chars = qq.toCharArray();
                for(int i=0;i<chars.length;i++){
//    public static boolean isDigit(char ch) determines whether the specified character is a number.
                    if(!Character.isDigit(chars[i])){
//                        return false;
                        flag = false;
                    }
                }
            }
        }
//        return true;
        return flag;
    }
}

Operation result: the two results are the same. Here is the regular result:

Examples of usage codes of quantifiers:

public class RegularDemo7 {
    public static void main(String[] args) {
        //^a? matches 0 or 1 a at the start of the string
        String regex = "^a?";
        String s = "baaabcdefaaaaaag";
        System.out.println("Before matching:" + s);
        System.out.println("=======================================");
        System.out.println("a occurs 0 or 1 times:"+s.replaceAll(regex, "_"));

        regex = "^a+";
        System.out.println("a occurs one or more times:"+s.replaceAll(regex, "_"));

        regex = "^a*";
        System.out.println("a occurs any number of times:"+s.replaceAll(regex, "_"));

        //{n}: exactly n occurrences
        //Requirement: match the character a occurring 6 times in a row
        regex = "a{6}"; // aaaaaa
        System.out.println("a occurs 6 times in a row:"+s.replaceAll(regex, "*"));

        //{n,m}: between n and m occurrences
        regex = "a{3,4}"; // Matches a occurring 3 to 4 times in a row
        System.out.println("a occurs 3-4 times in a row:"+s.replaceAll(regex, "*"));

        //{n,}: at least n occurrences
        regex = "a{6,}";
        System.out.println("a occurs at least 6 times:"+s.replaceAll(regex, "*"));

        //Verify qq
        regex = "[1-9][0-9]{4,9}";
        s = "1165872335";
        System.out.println("Verify QQ:"+s.replaceAll(regex, "Match successful"));
    }
}

Operation result:

f. Grouping:

Grouping uses () to group part of a pattern, so that the expression enclosed in () is treated as a single unit; a quantifier placed after the group then applies to the whole group.
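
Besides acting as a unit, a group also captures the text it matched, which can be read back through Matcher.group(n). A minimal sketch, assuming a hypothetical key=value format and the illustrative names GroupSketch and parsePair:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class GroupSketch {
    // Two capturing groups: the key is group 1, the numeric value is group 2.
    private static final Pattern PAIR = Pattern.compile("(\\w+)=(\\d+)");

    // Returns {key, value}, or null if the input is not a key=value pair.
    static String[] parsePair(String input) {
        Matcher m = PAIR.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2)};
    }

    public static void main(String[] args) {
        System.out.println(String.join(" / ", parsePair("width=640"))); // width / 640

        // A quantifier after a group repeats the whole group, not just the last character.
        System.out.println("ababab".matches("(ab){3}")); // true
        System.out.println("ababab".matches("ab{3}"));   // false: this is a followed by bbb
        System.out.println("abbb".matches("ab{3}"));     // true
    }
}
```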

Code example:

public class RegularDemo8 {
    public static void main(String[] args) {
        //Matches ab followed by 1-2 occurrences of c
        String regex = "abc{1,2}";
        String s = "abcccccABC123123ABCabcccccABC123123ABCabcccccABC123123ABCabcabcabc123";
        System.out.println("Before matching:\n" + s);
        System.out.println("===========================================================");
        System.out.println(s.replaceAll(regex, "_"));

        //Parentheses indicate grouping
        //abc as a whole appears 1-2 times
        regex = "(abc){1,2}";
        System.out.println(s.replaceAll(regex, "_"));

        regex = "ABC(abc){1,}";   //ABCabcabc
        System.out.println(s.replaceAll(regex, "_"));

        //matches() succeeds only if the entire string matches the pattern
        System.out.println(s.matches(regex));
    }

}

Operation result:

g. Back reference:

A back reference retrieves the text captured by a group:

$n: in the replacement string, stands for the text captured by group number n; groups are numbered from 1.

Requirement: 2022-01-23 --> 01/23/2022
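
Java also allows a back reference inside the pattern itself, written \n (so "\\1" in a Java string literal), which must match the same text that group n captured. A minimal sketch of both forms; the class BackReferenceSketch and its method names are illustrative assumptions.

```java
// Hypothetical helper class, for illustration only.
final class BackReferenceSketch {
    // $1, $2 and $3 in the replacement refer to the year, month and day groups.
    static String isoToUs(String date) {
        return date.replaceAll("(\\d{4})-(\\d{2})-(\\d{2})", "$2/$3/$1");
    }

    // \1 inside the pattern repeats whatever word group 1 captured,
    // so a run of identical words collapses to a single copy.
    static String dropRepeatedWords(String text) {
        return text.replaceAll("\\b(\\w+)(\\s+\\1\\b)+", "$1");
    }

    public static void main(String[] args) {
        System.out.println(isoToUs("2022-01-23"));                      // 01/23/2022
        System.out.println(dropRepeatedWords("this is is a a a test")); // this is a test
    }
}
```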

Code example:

public class RegularDemo9 {
    public static void main(String[] args) {
        //2022-01-23
        String regex = "(\\d{4})-(\\d{2})-(\\d{2})";
        String s = "2022-01-23  2022-02-24";
        System.out.println(s.replaceAll(regex,"$2/$3/$1"));

        //To keep a group from being numbered, make it non-capturing with ?:
        regex = "(\\d{4})-(?:\\d{2})-(\\d{2})";
//        System.out.println(s.replaceAll(regex,"$2/$3/$1"));
        System.out.println(s.replaceAll(regex,"$2/$1"));
    }
}

Operation result:

(3) Application of regular expressions in Java:

How are regular expressions used in Java?

1. Searching a string: the Pattern and Matcher classes

2. Matching a string: the matches() method of the String class

3. Replacing in a string: the replaceAll() and replaceFirst() methods of the String class

4. Splitting a string: the split() method of the String class
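
For the search operation, Matcher.find() moves from one match to the next, and Matcher.group() returns the text of the current match. A minimal sketch, assuming the illustrative names FindAllSketch and findNumbers:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class FindAllSketch {
    // Collects every run of digits found in the text, in order of appearance.
    static List<String> findNumbers(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile("\\d+").matcher(text);
        while (m.find()) {          // advance to the next match, if any
            found.add(m.group());   // the text of the current match
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findNumbers("abc sbdf 123ab sa123bddss 45")); // [123, 123, 45]

        // split() takes a regular expression too: here every run of digits is a separator.
        System.out.println(Arrays.toString("a1b22c333d".split("\\d+"))); // [a, b, c, d]
    }
}
```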

Code example:

import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegularDemo10 {
    public static void main(String[] args) {
        String regex = "\\w{3,}";
        String s = "abcd123";
        System.out.println(s.matches(regex));

        regex = "[a-z]{2,}";
        s = "abc defg hello111";
        System.out.println(s.replaceAll(regex, "_"));
        System.out.println(s.replaceFirst(regex, "_"));

        s = "abc sbdf 123ab sa123bddss &";
        String[] s1 = s.split(" ");
        //Print the array with the Arrays utility class
        System.out.println(Arrays.toString(s1));

        s = "abc sbdf 123ab sa123bddss &";
        String[] s2 = s.split("a");
        //Print the array with the Arrays utility class
        System.out.println(Arrays.toString(s2));

        //Pattern and Matcher
        regex = "\\w{3,7}";
        Pattern compile = Pattern.compile(regex);
        Matcher matcher = compile.matcher("abcd123");
        System.out.println(matcher.matches());
    }
}

Operation result:

Regular expression classic exercise:

Requirement: turn a stuttered string such as "我我我...要要...学学...编程程" into "我要学编程" ("I want to learn programming")

Analysis: 1. First remove the dots, using "\\.+".

2. Keep only one of each repeated character: in "(.)\\1+", (.) captures any single character as group 1 and \\1+ matches one or more further copies of it; replacing the match with the back reference $1 leaves a single copy.

Code implementation:

public class RegularDemo11 {
    public static void main(String[] args) {
        String s = "我我我我我..........我.......要要要要..................要要要...学学学.......编编程.......程程程程程程";
        //1. First remove the dots
        String regex = "\\.+";
        String s1 = s.replaceAll(regex, "");
        System.out.println(s1);

        //2. Collapse repeated characters
        regex = "(.)\\1+";
        String s2 = s1.replaceAll(regex, "$1");
        System.out.println(s2);

    }
}

Operation result:

 

Topics: Java Front-end regex