Java learning notes

Posted by bubatalazi on Thu, 20 Jan 2022 18:18:35 +0100

Source material: Han Shunping's Java fundamentals course on Bilibili

Regular expressions

Regular expressions let you match the content of a string quickly and conveniently; what to match is specified by a pattern written according to special rules

  • A regular expression is a formula that uses a pattern to match strings
  • Although it looks strange at first, it is not complicated to learn
  • Learning it can greatly reduce the time spent on text processing

For example, suppose you want to find, in a given text, every substring of four consecutive digits in which the first digit equals the fourth and the second equals the third, such as 1221 or 3443

With the traditional approach, you have to traverse the string, keep track of how many consecutive digits you have seen, and then compare them, which is tedious

With regular expressions, you can match the desired content quickly just by specifying a pattern

Likewise, to verify the format of an email address or a mobile phone number, you can specify a pattern and check it quickly

Underlying implementation analysis

For example, find all substrings of four consecutive digits in a piece of text:

// Needs java.util.regex.Pattern and java.util.regex.Matcher; place this method inside a class
public static void main(String[] args) {
        String content = "1998 The second generation, December 8 Java Enterprise version of the platform J2EE release. In June 1999, Sun Company release" +
                "The second generation Java Platform (referred to as Java2)3 versions of: J2ME(Java2 Micro Edition,Java2 flat" +
                "Micro version of the station), which is applied to mobile, wireless and limited resource environments; J2SE(Java 2 Standard Edition," +
                "Java 2 The standard version of the platform), which is applied to the desktop environment; J2EE(Java 2Enterprise Edition,Java 2" +
                "Platform based Enterprise Edition), applied to Java Application server. Java 2 The release of the platform is Java Most important in the development process" +
                "A milestone that marks Java The application of began to popularize.";
        // Match every run of four digits
        // 1. \d represents any single digit
        String regStr = "\\d\\d\\d\\d";
        // 2. Create a Pattern object
        Pattern pattern = Pattern.compile(regStr);
        // 3. Create a Matcher
        // Note: the matcher applies the rules in regStr to the content string
        Matcher matcher = pattern.matcher(content);
        // 4. Start matching
        // find() returns true if a match is found, false otherwise
        // The matched content is available through matcher.group(0)
        while (matcher.find()) {
            System.out.println("Found:" + matcher.group(0));
        }
    }

Here, matcher.find() performs the following tasks:

  1. Locate a substring that satisfies the rules (such as 1998)
  2. Once found, record its indexes in the int[] groups field of the matcher object. For 1998, the start index is recorded in groups[0], i.e. groups[0] = 0; the end index + 1 is recorded in groups[1], i.e. groups[1] = 4
  3. At the same time, set oldLast to the value of groups[1]; the next call to find() resumes from that position
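
The bookkeeping described above can be observed through the public API: start() and end() return the values these notes call groups[0] and groups[1]. The following is only a minimal sketch; the class name FindPositionsDemo, the helper positions and the sample text are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindPositionsDemo {
    // Hypothetical helper: returns {start, end} for every run of four digits.
    static List<int[]> positions(String text) {
        Matcher matcher = Pattern.compile("\\d\\d\\d\\d").matcher(text);
        List<int[]> result = new ArrayList<>();
        // Each call to find() resumes after the end of the previous match.
        while (matcher.find()) {
            // start() is the index of the first matched character,
            // end() is the index just past the last matched character.
            result.add(new int[] {matcher.start(), matcher.end()});
        }
        return result;
    }

    public static void main(String[] args) {
        for (int[] p : positions("1998 and 1999")) {
            System.out.println("start=" + p[0] + ", end=" + p[1]);
        }
    }
}
```

For the sample text "1998 and 1999", the first match spans indexes 0 to 4 and the second spans 9 to 13.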

matcher.group(0) analysis:
Source code:

public String group(int group) {
        if (first < 0)
            throw new IllegalStateException("No match found");
        if (group < 0 || group > groupCount())
            throw new IndexOutOfBoundsException("No group " + group);
        if ((groups[group*2] == -1) || (groups[group*2+1] == -1))
            return null;
        return getSubSequence(groups[group * 2], groups[group * 2 + 1]).toString();
    }

In short, group(0) returns the substring of the input in the half-open range [groups[0], groups[1]), much like the String.substring method

So why group(0) and what does this 0 mean?

Modify the above example slightly by adding two pairs of parentheses to the pattern, so that regStr becomes "(\\d\\d)(\\d\\d)"

Each pair of parentheses forms one group, so the expression has as many groups as it has pairs of parentheses. The tasks performed by matcher.find() are now:

  1. Locate a substring that satisfies the rules (such as 1998)
  2. Once found, record its indexes in the int[] groups field of the matcher object
  3. For 1998, the start index is recorded in groups[0], i.e. groups[0] = 0; the end index + 1 is recorded in groups[1], i.e. groups[1] = 4
  4. For group 1, which matched 19 within 1998: groups[2] = 0, groups[3] = 2
  5. For group 2, which matched 98 within 1998: groups[4] = 2, groups[5] = 4
  6. If there are more groups, and so on
  7. At the same time, set oldLast to the value of groups[1]; the next call to find() resumes from that position

That is, with grouping, groups[0] and groups[1] still hold the start and end indexes of the whole matched substring, and the indexes of each group are recorded in the positions that follow; group(n) therefore reads groups[2n] and groups[2n + 1]

For example:

public static void main(String[] args) {
        String content = "1998 December 8, second generation Java Enterprise version of the platform J2EE release. In June 1999, Sun Company release" +
                "The second generation Java Platform (referred to as Java2)3 versions of: J2ME(Java2 Micro Edition,Java2 flat" +
                "Micro version of the station), which is applied to mobile, wireless and limited resource environments; J2SE(Java 2 Standard Edition," +
                "Java 2 The standard version of the platform), which is applied to the desktop environment; J2EE(Java 2Enterprise Edition,Java 2" +
                "Platform based Enterprise Edition), applied to Java Application server. Java 2 The release of the platform is Java Most important in the development process" +
                "A milestone that marks Java The application of began to popularize.";
        // Match every run of four digits, split into two groups
        // 1. \d represents any single digit
        String regStr = "(\\d\\d)(\\d\\d)";
        // 2. Create a Pattern object
        Pattern pattern = Pattern.compile(regStr);
        // 3. Create a Matcher
        // Note: the matcher applies the rules in regStr to the content string
        Matcher matcher = pattern.matcher(content);
        // 4. Start matching
        // find() returns true if a match is found, false otherwise
        // The matched content is available through matcher.group(0)
        /**
         * Tasks performed by matcher.find():
         * 1. Locate a substring that satisfies the rules (such as 1998)
         * 2. Once found, record its indexes in the int[] groups field of the matcher object
         *    2.1 The whole match: groups[0] = 0 (start index), groups[1] = 4 (end index + 1)
         *    2.2 Group 1 matched 19: groups[2] = 0, groups[3] = 2
         *    2.3 Group 2 matched 98: groups[4] = 2, groups[5] = 4
         *    If there are more groups, and so on
         * 3. At the same time, set oldLast to the value of groups[1]; the next call to find() resumes from that position
         *
         * matcher.group(0) analysis:
         *
         * public String group(int group) {
         *         if (first < 0)
         *             throw new IllegalStateException("No match found");
         *         if (group < 0 || group > groupCount())
         *             throw new IndexOutOfBoundsException("No group " + group);
         *         if ((groups[group*2] == -1) || (groups[group*2+1] == -1))
         *             return null;
         *         return getSubSequence(groups[group * 2], groups[group * 2 + 1]).toString();
         *     }
         *
         * In short, group(0) returns the substring in the half-open range [groups[0], groups[1]), much like String.substring
         *
         */
        while (matcher.find()) {
            System.out.println("Found:" + matcher.group(0));
            System.out.println("Group 1 () Matched value:" + matcher.group(1));
            System.out.println("Group 2 () Matched value:" + matcher.group(2));
        }
    }

Output results:

Found:1998
Group 1 () Matched value:19
Group 2 () Matched value:98
Found:1999
Group 1 () Matched value:19
Group 2 () Matched value:99
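
The groups array itself is private, but the index pairs described above are exposed through start(int) and end(int). The sketch below collects them for the first match; GroupIndexDemo, groupBounds and the sample text are hypothetical names chosen for this illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupIndexDemo {
    // Returns {start(0), end(0), start(1), end(1), start(2), end(2)} for the
    // first match, or an empty array if nothing matches.
    static int[] groupBounds(String text) {
        Matcher matcher = Pattern.compile("(\\d\\d)(\\d\\d)").matcher(text);
        if (!matcher.find()) {
            return new int[0];
        }
        int[] bounds = new int[2 * (matcher.groupCount() + 1)];
        for (int g = 0; g <= matcher.groupCount(); g++) {
            bounds[2 * g] = matcher.start(g);     // the slot the notes call groups[2g]
            bounds[2 * g + 1] = matcher.end(g);   // the slot the notes call groups[2g + 1]
        }
        return bounds;
    }

    public static void main(String[] args) {
        // prints [0, 4, 0, 2, 2, 4]
        System.out.println(Arrays.toString(groupBounds("1998 and 1999")));
    }
}
```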

Regular expression syntax

Regular expressions are built from metacharacters, which can be roughly divided into:

  • Qualifiers
  • Selection matching
  • Grouping, combining, and back references
  • Special characters
  • Character matchers
  • Locators

Escape character

First, note that the escape character is \

  • To search for certain special characters with a regular expression, you must escape them; otherwise the search finds nothing or even throws an error

For example, using a bare ( as the pattern, say to find the parentheses in ABC ($ABC) (123), throws an error

The fix is to put \ before the parenthesis: \(

Note: in a Java string literal you write two backslashes (\\) where other languages' regular expressions use one \

Characters that need to be escaped include:

  • . * + ( ) $ \ ? [ ] ^ { } |

The forward slash / has no special meaning in Java regular expressions and does not need escaping
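
As a rough illustration of the escaping rules above, the sketch below counts matches with and without escaping. The class EscapeDemo, its helpers and the sample strings are invented for this example; Pattern.quote is the standard library method that escapes a whole literal at once.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class EscapeDemo {
    // Counts non-overlapping matches of regex in text.
    static int count(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        int n = 0;
        while (matcher.find()) {
            n++;
        }
        return n;
    }

    // Reports whether the regex compiles at all.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "abc$(abc(123("; // made-up sample text
        System.out.println(compiles("("));                  // a bare ( is a syntax error
        System.out.println(count("\\(", text));             // escaped: counts literal (
        System.out.println(count(Pattern.quote("$("), text)); // quote() escapes a whole literal
        System.out.println(count(".", "a.b.c"));            // unescaped . matches any character
        System.out.println(count("\\.", "a.b.c"));          // escaped . matches only the dot
    }
}
```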

Character matchers



Note that:

  • \\d{3} is equivalent to \\d\\d\\d

In addition, there are:

  • \\s matches any whitespace character (space, tab, newline, and so on)
  • \\S matches any non-whitespace character, the opposite of \\s
  • [abcd] matches any one of the characters a, b, c, d
  • [^abcd] matches any one character that is not a, b, c or d
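
A small sketch of these character matchers, assuming a made-up sample string; CharClassDemo and keepMatches are hypothetical names. The helper concatenates everything a pattern matches, which makes it easy to see what each class accepts.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharClassDemo {
    // Concatenates every match of regex in text.
    static String keepMatches(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        StringBuilder sb = new StringBuilder();
        while (matcher.find()) {
            sb.append(matcher.group(0));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "a1 b2\tz9"; // letters, digits, one space and one tab
        System.out.println(keepMatches("\\d", text));      // digits only
        System.out.println(keepMatches("\\S", text));      // everything except whitespace
        System.out.println(keepMatches("[abcd]", text));   // only a, b, c or d
        System.out.println(keepMatches("[^abcd]", text));  // everything except a, b, c, d
    }
}
```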

Java regular expressions are case-sensitive by default. How do you make matching case-insensitive?

  • (?i)abc means abc is matched case-insensitively
  • a(?i)bc means bc is matched case-insensitively
  • a((?i)b)c means only b is matched case-insensitively
  • You can also pass the flag Pattern.CASE_INSENSITIVE to the compile method of Pattern, for example Pattern pattern = Pattern.compile(regEx, Pattern.CASE_INSENSITIVE);
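
The four variants above can be compared side by side. This is only a sketch; CaseInsensitiveDemo and its two helpers are hypothetical names, and the inputs are chosen to show where each flag starts and stops applying.

```java
import java.util.regex.Pattern;

class CaseInsensitiveDemo {
    // True if regex matches somewhere in text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    // Same check, but with the CASE_INSENSITIVE flag instead of an inline (?i).
    static boolean foundIgnoringCase(String regex, String text) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(found("(?i)abc", "ABC"));      // whole pattern ignores case
        System.out.println(found("a(?i)bc", "aBC"));      // only bc ignores case
        System.out.println(found("a(?i)bc", "ABC"));      // the leading a is still case-sensitive
        System.out.println(found("a((?i)b)c", "aBc"));    // only b ignores case
        System.out.println(found("a((?i)b)c", "aBC"));    // the trailing c is still case-sensitive
        System.out.println(foundIgnoringCase("abc", "AbC"));
    }
}
```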

Selection matching

When matching, you can allow several alternatives, so that either this or that is matched. It works like the OR operation in a logical expression, and the symbol is |
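
A minimal sketch of selection matching follows; AlternationDemo, findAll and the sample sentences are made up for illustration. The second call shows that parentheses limit how far the | reaches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationDemo {
    // Collects every match of regex in text, in order of appearance.
    static List<String> findAll(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        List<String> matches = new ArrayList<>();
        while (matcher.find()) {
            matches.add(matcher.group(0));
        }
        return matches;
    }

    public static void main(String[] args) {
        // Without parentheses, | separates the whole alternatives.
        System.out.println(findAll("cat|dog|bird", "a dog chased a cat"));
        // With parentheses, only the part inside the group is optional between a and e.
        System.out.println(findAll("gr(a|e)y", "gray grey groy"));
    }
}
```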

Qualifiers

Qualifiers specify how many times in a row the preceding character or group must occur. For example, \\d{3} above is equivalent to \\d\\d\\d


Note:

  • Java matching is greedy by default. If a{3,4} is specified and the text is aaaa456, it matches four a's (aaaa) rather than three
  • Similarly, with a{3,4} and the text aaaaaaa789 (seven a's), it first matches four a's (aaaa) and then three a's (aaa); with the text aaaaaa678 (six a's), only one match of four a's (aaaa) is found, because the two remaining a's are too few
  • + means one or more occurrences. With 1+ and the text 1111456, it matches 1111 in one go
  • * and ? behave similarly
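
The greedy behaviour in the notes above can be checked with a short sketch. GreedyDemo and findAll are hypothetical names, and the input strings with seven and six a's are assumptions chosen to make the two cases visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyDemo {
    // Collects every match of regex in text, in order of appearance.
    static List<String> findAll(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        List<String> matches = new ArrayList<>();
        while (matcher.find()) {
            matches.add(matcher.group(0));
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(findAll("a{3,4}", "aaaa456"));     // takes four a's, not three
        System.out.println(findAll("a{3,4}", "aaaaaaa789"));  // seven a's: four, then three
        System.out.println(findAll("a{3,4}", "aaaaaa678"));   // six a's: four, two left over
        System.out.println(findAll("1+", "1111456"));         // + takes as many as it can
    }
}
```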

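Locators

Locators specify where a match must occur; for example, ^ marks the start and $ the end of the input, as in the application examples later in these notes

Grouping

(pattern) is a capturing group, numbered from 1; (?<name>pattern) also gives the group a name, so its content can be retrieved by number or by name

Example: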

public static void main(String[] args) {
        String content = "jieruigou NN GGG1237gou 9987gou";
        String regStr = "(?<g1>\\d\\d)(?<g2>\\d\\d)";

        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);

        while (matcher.find()) {
            System.out.println("Found:" + matcher.group(0));
            System.out.println("Group 1 content:" + matcher.group(1));
            System.out.println("Group 1 content(By group name): " + matcher.group("g1"));
            System.out.println("Group 2:" + matcher.group(2));
            System.out.println("2nd group content(By group name): " + matcher.group("g2"));
        }
    }

Output results:

Found:1237
Group 1 content:12
Group 1 content(By group name): 12
Group 2:37
2nd group content(By group name): 37
Found:9987
Group 1 content:99
Group 1 content(By group name): 99
Group 2:87
2nd group content(By group name): 87

Non-capturing groups


Example 1:
Use of (?:pattern)

public static void main(String[] args) {
        String content = "hi Jerrydog jerry Jerrypuppy Jerrycaptain hello";

        // The following two expressions are equivalent
//        String regStr = "Jerrydog|Jerrypuppy|Jerrycaptain";
        String regStr = "Jerry(?:dog|puppy|captain)";


        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);

        while (matcher.find()) {
            System.out.println("Found:" + matcher.group(0));
        }
    }

Output results:

Found:Jerrydog
Found:Jerrypuppy
Found:Jerrycaptain

Example 2:
Use of (?=pattern)

public static void main(String[] args) {
        String content = "hi Jerrydog jerry Jerrypuppy Jerrycaptain hello";

        // (?=pattern) is a lookahead: Jerry is matched only when dog or captain follows,
        // and the text checked by the lookahead is not part of the result
        String regStr = "Jerry(?=dog|captain)";


        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);

        while (matcher.find()) {
            System.out.println("Found:" + matcher.group(0));
        }
    }

Output results:

Found:Jerry
Found:Jerry

(?!pattern) is the opposite of (?=pattern): it matches exactly at the positions where (?=pattern) fails to match

Note:

  • matcher.group(1) cannot be used with a non-capturing group, because such a group captures nothing
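
To make the difference between the three forms concrete, here is a small sketch. LookaroundDemo, its helpers and the sample text are invented for this illustration; groupCount() is used to show that a non-capturing group adds no numbered group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    // Returns the start index of every match of regex in text.
    static List<Integer> matchStarts(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        List<Integer> starts = new ArrayList<>();
        while (matcher.find()) {
            starts.add(matcher.start());
        }
        return starts;
    }

    // Number of capturing groups the pattern defines (group 0 is not counted).
    static int captureGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        String text = "Jerrydog Jerrycaptain Jerrycat"; // made-up sample text
        System.out.println(matchStarts("Jerry(?=dog|captain)", text)); // Jerry followed by dog or captain
        System.out.println(matchStarts("Jerry(?!dog|captain)", text)); // Jerry followed by anything else
        System.out.println(captureGroups("Jerry(?:dog|captain)"));     // non-capturing: no group 1
        System.out.println(captureGroups("Jerry(dog|captain)"));       // capturing: group 1 exists
    }
}
```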

Non-greedy matching

As mentioned earlier, Java matching is greedy by default; non-greedy matching is achieved by appending ? to the qualifier, for example \\d+? instead of \\d+
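
A short sketch comparing greedy and non-greedy qualifiers; LazyDemo, firstMatch and the sample inputs are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyDemo {
    // Returns the first match of regex in text, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        return matcher.find() ? matcher.group(0) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("\\d+", "hello111111 ok"));   // greedy: all six digits
        System.out.println(firstMatch("\\d+?", "hello111111 ok"));  // non-greedy: a single digit
        System.out.println(firstMatch("<.+>", "<a><b>"));           // greedy: runs to the last >
        System.out.println(firstMatch("<.+?>", "<a><b>"));          // non-greedy: stops at the first >
    }
}
```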

Application examples

Chinese character verification

 public void isCharacter() {
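        String content = "汉字"; // sample Chinese text
        // Unicode range used here for Chinese characters; it is broad and also covers other scripts
        // (U+4E00 to U+9FA5 is a narrower range commonly used for CJK ideographs)
        String regStr = "^[\u0391-\uffe5]+$";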


        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);

        if (matcher.find()) {
            System.out.println("Satisfy format");
        } else {
            System.out.println("Format not satisfied");
        }
    }

Zip code verification (incomplete)

// Requirement: six digits, the first of which is 1-9
    public void isMailCode() {
        String content = "411320";
        // One digit from 1-9 followed by exactly five more digits
        String regStr = "^[1-9]\\d{5}$";


        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);

        if (matcher.find()) {
            System.out.println("Satisfy format");
        } else {
            System.out.println("Format not satisfied");
        }
    }

QQ number verification

// Requirement: a number of 5-10 digits whose first digit is 1-9
    public void isQQId() {
        String content = "411320";
        // One digit from 1-9 followed by four to nine more digits
        String regStr = "^[1-9]\\d{4,9}$";


        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);

        if (matcher.find()) {
            System.out.println("Satisfy format");
        } else {
            System.out.println("Format not satisfied");
        }
    }

Mobile number verification

// Requirement: starts with 1, the second digit is one of 3, 4, 5, 8, and there are 11 digits in total
    public void isPhoneNumber() {
        String content = "15966667777";
        // Inside [...] the | is a literal character, so the class is written [3458] rather than [3|4|5|8]
        String regStr = "^1[3458]\\d{9}$";


        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);

        if (matcher.find()) {
            System.out.println("Satisfy format");
        } else {
            System.out.println("Format not satisfied");
        }
    }

URL validation

public void isURL() {
        String content = "https://www.bilibili.com/video/BV1fh411y7R8?p=894&spm_id_from=pageDriver";

        /**
         * Analysis
         * 1. The URL may start with https:// or http://
         * 2. The domain name consists of digits, letters, underscores and -
         * 3. The path that follows starts with /, followed by letters, digits and some other characters
         */
        // Inside [...], characters such as . ? * stand for themselves
        String regStr = "^((http|https)://)?([\\w-]+\\.)+[\\w-]+(\\/[\\w-?=&/.%#]*)?$";

        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);

        if (matcher.find()) {
            System.out.println("Satisfy format");
        } else {
            System.out.println("Format not satisfied");
        }
    }

Common classes

Pattern

A Pattern object represents a regular expression; the class has no public constructor

To create a Pattern object, call the static method compile(), which returns a Pattern object

This method takes a regular expression as its first parameter, for example:

Pattern pattern = Pattern.compile("^1[3458]\\d{9}$");

The Pattern class also has other methods, such as

  1. matches, which checks whether the whole input string matches the given regular expression
public void testMatches() {
        String content = "hello jerry hello, gougougou";
        String regStr = "hello";

        // If the regular expression can match the given text as a whole, it returns true; otherwise, it returns false
        boolean matches = Pattern.matches(regStr, content);
        System.out.println(matches ? "Overall matching successful" : "Overall matching failed");
    }

The application examples above could equally be written with the matches method

Under the hood, this method simply calls the matches method of the Matcher class
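
As a sketch of how a validation could use whole-input matching, the example below contrasts matches with find. MatchesDemo and its helpers are hypothetical, and the mobile number rule is the one from the application examples (1, then one of 3 4 5 8, then nine digits). The pattern is compiled once and reused, which is a design choice of this sketch rather than something the notes prescribe.

```java
import java.util.regex.Pattern;

class MatchesDemo {
    // Compiled once and reused; ^ and $ are unnecessary because matches()
    // always requires the entire input to match.
    private static final Pattern MOBILE = Pattern.compile("1[3458]\\d{9}");

    static boolean isMobile(String s) {
        return MOBILE.matcher(s).matches();
    }

    // Whole-input match via the static convenience method.
    static boolean wholeMatch(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    // Match anywhere in the input.
    static boolean partialMatch(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(isMobile("15966667777"));
        System.out.println(isMobile("12966667777"));
        System.out.println(wholeMatch("hello", "hello jerry"));
        System.out.println(partialMatch("hello", "hello jerry"));
    }
}
```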

Matcher

A Matcher object is the engine that interprets the pattern and matches it against the input string. Like Pattern, Matcher has no public constructor; you obtain a Matcher by calling the matcher method of a Pattern object

Matcher matcher = pattern.matcher(content);

Common methods of the Matcher class are shown in the examples below:

Example 1:

public void testMethod() {
        String content = "hello jerry dog jack jessi hello jim hello";
        String regStr = "hello";

        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);
        while (matcher.find()) {
            System.out.println("=========");
            // The start index of the currently matched substring, which is equivalent to groups[0]
            System.out.println(matcher.start());
            // The ending index of the currently matched substring + 1, which is equivalent to groups[1]
            System.out.println(matcher.end());
            System.out.println("Found:" + content.substring(matcher.start(), matcher.end()));
        }
        // The overall matching method verifies whether a string meets a rule
        System.out.println("Overall matching=" + matcher.matches());
    }

Output results:

=========
0
5
Found:hello
=========
27
32
Found:hello
=========
37
42
Found:hello
Overall matching=false

Example 2:
Replace every jerry in the string "hello jerry dog jack jessi hello jerry jim hello" with "Jerry dog"

public void testExchange() {
        String content = "hello jerry dog jack jessi hello jerry jim hello";
        String regStr = "jerry";

        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);

        // The replaceAll method returns the replaced new string without modifying the original string
        String newContent = matcher.replaceAll("Jerry dog");
        System.out.println("newContent = " + newContent);
    }

Output results:

newContent = hello Jerry dog dog jack jessi hello Jerry dog jim hello

PatternSyntaxException

PatternSyntaxException is an unchecked exception class that indicates a syntax error in a regular expression pattern
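
Because the exception is unchecked, the compiler does not force you to catch it, but catching it is useful when the pattern comes from user input. The sketch below assumes hypothetical names (SyntaxCheckDemo, check); getIndex() and getDescription() are methods of PatternSyntaxException.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SyntaxCheckDemo {
    // Returns "ok" if the regex compiles, otherwise a short error description.
    static String check(String regex) {
        try {
            Pattern.compile(regex);
            return "ok";
        } catch (PatternSyntaxException e) {
            // getIndex() is the approximate position of the error in the pattern
            return "invalid near index " + e.getIndex() + ": " + e.getDescription();
        }
    }

    public static void main(String[] args) {
        System.out.println(check("\\d{3}"));
        System.out.println(check("(abc"));
        System.out.println(check("[a-z"));
    }
}
```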

Back reference

Go back to the question raised at the beginning:

  • Find, in a given text, every substring of four consecutive digits in which the first digit equals the fourth and the second equals the third, such as 1221 or 3443

The techniques described so far cannot do this, so a new one is needed: back references

First, three concepts need to be made clear

  • Grouping:
    Part of a regular expression can be wrapped in parentheses, and each parenthesized part is regarded as a group
  • Capture:
    The content matched by a sub-expression / group is saved in memory. Groups are numbered by default and can also be named explicitly. Group 0 is the match of the entire regular expression; the remaining groups are numbered 1, 2, and so on by the position of their opening parenthesis, from left to right
  • Back reference:
    Once the content of a pair of parentheses has been captured, it can be used later (to the right of the parentheses), which makes it possible to write more powerful regular expressions; this is called a back reference. A back reference can be used inside the regular expression (as \\group number) or outside it (as $group number)

With back references, the problem raised above can now be solved:

public static void main(String[] args) {
        String content = "jerry dog3443 dog1234 jerry 1221 hello";
        String regStr = "(\\d)(\\d)\\2\\1";

        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);
        while (matcher.find()) {
            System.out.println("find=" + matcher.group(0));
        }
    }

Another example:
Find numbers of the form 12321-333444111 in a string: five digits, then a -, then nine digits in which each consecutive group of three digits is the same digit

public void testNum() {
        String content = "jerry12321-444555999 dog3443 dog1234 jerry 1221 hello";
        String regStr = "\\d{5}-(\\d)\\1{2}(\\d)\\2{2}(\\d)\\3{2}";

        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);
        while (matcher.find()) {
            System.out.println("find=" + matcher.group(0));
        }
    }
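
Both kinds of back reference mentioned above, inside the pattern (\\1, \\2) and outside it ($1, $2 in a replacement string), can be sketched as follows. BackReferenceDemo, isMirrored and reorderDate are made-up names, and the date example is an assumption added purely for illustration.

```java
class BackReferenceDemo {
    // Internal back reference: the third digit must equal group 2 and the fourth group 1.
    static boolean isMirrored(String fourDigits) {
        return fourDigits.matches("(\\d)(\\d)\\2\\1");
    }

    // External back reference: $1, $2, $3 in the replacement refer to the captured groups.
    static String reorderDate(String isoDate) {
        return isoDate.replaceAll("(\\d{4})-(\\d{2})-(\\d{2})", "$3/$2/$1");
    }

    public static void main(String[] args) {
        System.out.println(isMirrored("1221"));
        System.out.println(isMirrored("1234"));
        System.out.println(reorderDate("2022-01-20")); // prints 20/01/2022
    }
}
```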

Stuttering and de-duplication case

Use regular expressions to turn a stuttering sentence such as "I....II waaant.... to leeearn.... Jaaava!"
into "I want to learn Java!"

public static void main(String[] args) {
        String content = "I....II waaant.... to leeearn.... Jaaava!";

        // Remove all dots
        Pattern pattern = Pattern.compile("\\.");
        Matcher matcher = pattern.matcher(content);
        content = matcher.replaceAll("");

        System.out.println(content);

        // Remove repeated characters, method 1
        // First use (.)\\1+ to match runs of the same character
        pattern = Pattern.compile("(.)\\1+");
        matcher = pattern.matcher(content);
        while (matcher.find()) {
            System.out.println("find=" + matcher.group(0));
        }
        // Replace each run with the back reference $1 (replaceAll resets the matcher first)
        String newContent = matcher.replaceAll("$1");
        System.out.println("newContent = " + newContent);

        // Remove repeated characters, method 2
        content = Pattern.compile("(.)\\1+").matcher(content).replaceAll("$1");
        System.out.println("content = " + content);
    }

Output results:

III waaant to leeearn Jaaava!
find=III
find=aaa
find=eee
find=aaa
newContent = I want to learn Java!
content = I want to learn Java!

Using regular expressions in the String class

Replace

The replaceAll(String regex, String replacement) method of the String class performs replacement directly with a regular expression

Example:

public static void main(String[] args) {
        // Replace JDK1.3 and JDK1.4 in the following text with JDK
        String content = "2000 In May, JDK1.3,JDK1.4 and J2SE1.3 It was released one after another, and a few weeks later it won" +
                "Apple company Mac OS X Support for industry standards. On 24 September 2001, J2EE1.3 release. two thousand and two" +
                "On February 26, J2SE1.4 release. since then Java The computing power of has been greatly improved, and J2SE1.3 comparison," +
                "There are nearly 62 more%Classes and interfaces. Among these new features, a wide range of XML Support, secure sockets" +
                "(Socket)Support (via SSL And TLS Agreement), brand new I/OAPI,Regular expressions, logs, and assertions" +
                ". 2004 On September 30, J2SE1.5 Publish, become Java Another milestone in the history of language development. To show" +
                "The importance of this version, J2SE 1.5 Renamed Java SE 5.0(Build number 1.5.0),Code name“ Ti" +
                "ger",Tiger Contains 1 published since 1996.0 The most significant updates since the release, including generic support" +
                "Automatic boxing of basic types, improved loops, enumeration types, formatting I/O And variable parameters.";
        // Methods using the String class
        content = content.replaceAll("JDK1\\.3|JDK1\\.4", "JDK");
        System.out.println(content);
    }

Judge

The matches(String regex) method of the String class checks whether the whole string matches the regular expression

Example:
Check whether the given mobile phone number starts with 138 or 139

public void testMatches() {
        String content = "13866666666";
        if (content.matches("13(8|9)\\d{8}")) {
            System.out.println("Meet the requirements!");
        } else {
            System.out.println("Does not meet the requirements!");
        }
    }

Split

The split(String regex) method of the String class splits the string around matches of the regular expression

Example:
Split the string on -, +, # and ~

public void testSplit() {
        String content = "I+yes-one~individual#Word character + string ";
        String[] split = content.split("~|\\+|-|#");
        for (String s : split) {
            System.out.println("s = " + s);
        }
    }
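
The same split can also be written with a character class instead of alternation, which avoids escaping + (a - placed first inside [...] is taken literally). This is only a sketch with hypothetical names (SplitDemo, splitOnSymbols, splitOnce); the two-argument split(regex, limit) overload caps the number of resulting pieces.

```java
import java.util.Arrays;

class SplitDemo {
    // Splits on any single -, +, ~ or # character.
    static String[] splitOnSymbols(String s) {
        return s.split("[-+~#]");
    }

    // Splits at the first separator only: at most two pieces are returned.
    static String[] splitOnce(String s) {
        return s.split("[-+~#]", 2);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnSymbols("a+b-c~d#e"))); // prints [a, b, c, d, e]
        System.out.println(Arrays.toString(splitOnce("a+b-c~d#e")));      // prints [a, b-c~d#e]
    }
}
```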

Topics: Java