Replacement and decomposition of character sequences in Java

Posted by sobbayi on Sat, 19 Feb 2022 20:49:06 +0100

1, Using the String class

Calling the public String replaceAll(String regex,String replacement) method on a String object returns a new String object. Its character sequence is that of the current String object with every subsequence matching the regular expression regex replaced by the character sequence replacement. The current String object itself is not modified.
For example:

String s1="123hello456";
String s2=s1.replaceAll("\\d+","Hello."); //"\\d+" is a regular expression that matches one or more digits between 0 and 9
System.out.println(s1);//Prints: 123hello456 (s1 has not been changed)
System.out.println(s2);//Prints: Hello.helloHello.

Another example:

String regex="-?[0-9][0-9]*[.]?[0-9]*";
String s1="999 hello everyone,-123.459804 It's a holiday tomorrow";
String s2=s1.replaceAll(regex,"");
System.out.println("After removing the numbers from \""+s1+"\" the character sequence is:"+s2);
//Prints: After removing the numbers from "999 hello everyone,-123.459804 It's a holiday tomorrow" the character sequence is: hello everyone, It's a holiday tomorrow

The String class also provides a practical method for decomposing a character sequence:

public String[] split(String regex)

When a String object calls this method, the regular expression regex is used as the separator to decompose the character sequence of the String object into words, and the words are returned in a String array.
For example:

//Requirement: extract every word made up of digit characters from a character sequence.
String s1="1931-09-18, evening: Japan launched the war of aggression against China. Please remember this day!";
String regex="\\D+";//One or more non-digit characters
String s2[]=s1.split(regex);
for(String s:s2)
    System.out.println(s);//Prints 1931, 09 and 18 in turn; s2.length is 3

Note that the split method treats the text on both sides of each separator as a word. If the text to the left of a separator is empty, which happens when the character sequence begins with a separator, that empty character sequence still counts as a word. An empty character sequence at the end, to the right of the last separator, is discarded.
For example:

String s1="AD 2022-02-18";
String regex="\\D+";
String s2[]=s1.split(regex);
System.out.println(s2.length);//Prints 4. An array exposes the field length; writing s2.length() does not compile
for(String s:s2)
    System.out.println(s);
//s2[0]="" s2[1]="2022" s2[2]="02" s2[3]="18". s2[0] is an empty string, so its line is blank.
//The length of s2 is therefore 4 rather than 3: the separator "AD " has a word on its left by default, and that word is empty.
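
The difference between a leading and a trailing separator can be checked with a small sketch. The class name SplitEdgeCases, the helper digitRuns and the second input string are hypothetical, chosen only for this illustration.

```java
import java.util.Arrays;

class SplitEdgeCases {
    // Hypothetical helper: splits on runs of non-digit characters.
    static String[] digitRuns(String text) {
        return text.split("\\D+");
    }

    public static void main(String[] args) {
        // The input starts with a separator: a leading empty string is kept.
        System.out.println(Arrays.toString(digitRuns("AD 2022-02-18")));
        // The input ends with a separator: the trailing empty string is dropped.
        System.out.println(Arrays.toString(digitRuns("2022-02-18 AD")));
    }
}
```

The first call yields four elements (the first one empty), the second only three.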

2, Using the StringTokenizer class

1. Unlike the split() method, a StringTokenizer object does not use a regular expression as the separator.
2. To analyze a character sequence and decompose it into words that can be used independently, you can use the StringTokenizer class in the java.util package. An object of this class is called a character sequence analyzer. The class has two commonly used construction methods.
Construction method 1: StringTokenizer(String s): constructs a StringTokenizer object, say fenxi. fenxi uses the default separators (space, line feed, carriage return, tab and form feed (\f)) to decompose the character sequence of parameter s into words, and these words become the data in fenxi.
Construction method 2: StringTokenizer(String s,String delim): constructs a StringTokenizer object, say fenxi. fenxi treats every character in the character sequence of parameter delim as a separator to decompose the character sequence of parameter s into words, and these words become the data in fenxi.
Note: any run of separator characters, in any order, still counts as a separator.
3. fenxi can call the String nextToken() method to get the words in fenxi one by one. Each time nextToken() returns a word, fenxi moves past it, so that word no longer counts among the remaining words.
4. fenxi can call the boolean hasMoreTokens() method. As long as there are words left in fenxi, the method returns true; otherwise it returns false.
5. fenxi can call the int countTokens() method to return the number of words currently remaining in fenxi.
Specific example 1:

String s="we are stud,ents";
StringTokenizer fenxi=new StringTokenizer(s," ,");//Any combination of spaces and commas acts as a separator
int number=fenxi.countTokens();
while(fenxi.hasMoreTokens()){
    String str=fenxi.nextToken();
    System.out.println(str);
    System.out.println("Words remaining: "+fenxi.countTokens());
}
System.out.println("s contains "+number+" words in total");
//Output result:
we
Words remaining: 3
are
Words remaining: 2
stud
Words remaining: 1
ents
Words remaining: 0
s contains 4 words in total
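
Construction method 1 has no example above, so here is a minimal sketch of it. The class name DefaultDelimiterDemo, the helper words and the input string are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class DefaultDelimiterDemo {
    // Hypothetical helper: collects every word produced with the default separators.
    static List<String> words(String text) {
        StringTokenizer fenxi = new StringTokenizer(text); // construction method 1
        List<String> result = new ArrayList<>();
        while (fenxi.hasMoreTokens()) {
            result.add(fenxi.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        // The space, the tab and the line feed act as separators; the comma does not.
        System.out.println(words("we are\tstud,ents\nhere"));
    }
}
```

Because the comma is not among the default separators, "stud,ents" stays a single word here, whereas in example 1 it was split in two.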

Specific example 2:

String s="Local call fee: 28.39 Yuan, long-distance call fee: 49.15 Yuan, Internet fee: 352 yuan";
String delim="[^0-9.]+";//Any run of characters other than digits and '.' matches delim
s=s.replaceAll(delim,"#");
StringTokenizer fenxi=new StringTokenizer(s,"#");
double totalMoney=0;
while(fenxi.hasMoreTokens()){
    double money=Double.parseDouble(fenxi.nextToken());
    System.out.println(money);
    totalMoney+=money;
}
System.out.println("Total cost: "+totalMoney+" yuan");
//Output result:
28.39
49.15
352.0
Total cost: 429.53999999999996 yuan

3, Using the Scanner class

To parse a character sequence with the Scanner class, pass a String object to the Scanner constructor. For example, given:

String s="telephone cost 876 dollar.Computer cost 2398.89 dollar.";

In order to parse the numeric words in the character sequence of s, a Scanner object can be constructed as follows:

Scanner scanner=new Scanner(s);

By default, scanner uses whitespace as the separator to parse the words in the character sequence of s. You can also have the scanner object call the method:

scanner.useDelimiter(regex);

The regular expression regex is then used as the separator, that is, when the Scanner object parses the character sequence of s, every subsequence matching regex is treated as a separator.
Parsing a character sequence with a Scanner object has the following characteristics:

  1. The scanner object calls the next() method to return the words in the character sequence of s in turn. Once the last word has been returned by next(), hasNext() returns false; otherwise it returns true.
  2. For numeric words in the character sequence of s, such as 876 or 12.34, scanner can call nextInt() or nextDouble() instead of next(). These methods convert the numeric word to int or double data and return it.
  3. If the next word is not numeric, calling nextInt() or nextDouble() throws an InputMismatchException. When handling the exception, you can call next() to return, and thereby skip, the non-numeric word.
Specific example:
String cost="Local call fee: 28.39 Yuan, long-distance call fee: 49.15 Yuan, Internet fee: 352 yuan";
Scanner scanner=new Scanner(cost);//Needs java.util.Scanner and java.util.InputMismatchException
scanner.useDelimiter("[^0-9.]+");
double sum=0;
while(scanner.hasNext()){
    try{
        double price=scanner.nextDouble();
        sum+=price;
        System.out.println(price);
    }catch(InputMismatchException e){
        scanner.next();//Skip the non-numeric word
    }
}
System.out.println("Total cost: "+sum+" yuan");
//Output result:
28.39
49.15
352.0
Total cost: 429.53999999999996 yuan
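
The character sequence s introduced at the start of this section can be parsed in the same way. The sketch below is one possible variant, not the only approach: it checks hasNextDouble() before reading instead of catching the exception, and it fixes the locale to Locale.US on the assumption that '.' is the decimal separator. The class name ScannerNumbersDemo and the helper numbers are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

class ScannerNumbersDemo {
    // Hypothetical helper: returns every word that can be read as a double.
    static List<Double> numbers(String text) {
        List<Double> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            scanner.useLocale(Locale.US);      // assumption: '.' is the decimal separator
            scanner.useDelimiter("[^0-9.]+");  // same separator as in the example above
            while (scanner.hasNext()) {
                if (scanner.hasNextDouble()) {
                    result.add(scanner.nextDouble());
                } else {
                    scanner.next();            // skip a non-numeric word such as "."
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "telephone cost 876 dollar.Computer cost 2398.89 dollar.";
        System.out.println(numbers(s));
    }
}
```

In this input the full stops that end each sentence are not matched by the separator, so they arrive as words of their own; they are exactly the non-numeric words that item 3 describes and that the else branch skips.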

Comparison:
1. Both the StringTokenizer class and the Scanner class can be used to decompose a character sequence into words, but they work differently.
2. A StringTokenizer object treats each character of delim as a separator and does not use regular expressions. Obtaining words is therefore simple and fast, and countTokens() can report at any time how many words remain.
3. Unlike the StringTokenizer class, the Scanner class uses a regular expression as the separator and can convert words directly into int or double data. This is more flexible, but the regular-expression matching makes a Scanner object relatively slow at obtaining words.
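
To make the comparison concrete, the following sketch decomposes the same character sequence with split, StringTokenizer and Scanner. The class name TokenizeThreeWays and its three helper methods are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;
import java.util.StringTokenizer;

class TokenizeThreeWays {
    // split: the separator is a regular expression.
    static List<String> withSplit(String text) {
        return Arrays.asList(text.split("[ ,]+"));
    }

    // StringTokenizer: every character of " ," is a separator, no regular expression.
    static List<String> withTokenizer(String text) {
        List<String> out = new ArrayList<>();
        StringTokenizer fenxi = new StringTokenizer(text, " ,");
        while (fenxi.hasMoreTokens()) {
            out.add(fenxi.nextToken());
        }
        return out;
    }

    // Scanner: the separator is again a regular expression.
    static List<String> withScanner(String text) {
        List<String> out = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            scanner.useDelimiter("[ ,]+");
            while (scanner.hasNext()) {
                out.add(scanner.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "we are stud,ents";
        System.out.println(withSplit(s));
        System.out.println(withTokenizer(s));
        System.out.println(withScanner(s));
    }
}
```

For this input all three approaches produce the same four words. They differ when the character sequence begins with a separator: split then returns an extra empty word at the front, while StringTokenizer and Scanner skip the leading separator.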

4, Using the Pattern class and the Matcher class

The steps for using the Pattern class and the Matcher class are as follows:
1. Pass the regular expression regex to Pattern.compile to get an instance of the Pattern class, called the pattern object. For example:

String regex="-?[0-9][0-9]*[.]?[0-9]*";
Pattern pattern=Pattern.compile(regex);

2. The pattern object pattern calls the matcher(CharSequence s) method to return a Matcher object, called the matching object. The parameter s is the String object that the matcher will search.

Matcher matcher=pattern.matcher(s);

3. After these two steps, the matching object matcher can call various methods to search s.
Specific methods include:
(1)public boolean find(): looks for the next subsequence in the character sequence of s that matches regex. It returns true if one is found and false otherwise. The first call looks for the first subsequence in s that matches regex. After a call that returns true, the next call starts searching after the end of the previous match. When find returns true, matcher can call start() and end() to get the start position and the end position (exclusive) of the matched subsequence in s, and group() to get the matched subsequence itself.
(2)public boolean matches(): the matcher calls this method to determine whether the entire character sequence of s matches regex.
(3)public boolean lookingAt(): the matcher calls this method to determine whether a subsequence matching regex begins at the very start of the character sequence of s. Unlike matches(), the match need not extend to the end of s.
(4)public boolean find(int start): the matcher calls this method to reset itself and look for a subsequence matching regex starting from the position start. Unlike lookingAt(), the match may begin at start or anywhere after it, so find(0) searches all of s rather than only testing its beginning.
(5)public String replaceAll(String replacement): the matcher calls this method to return a String object whose character sequence is obtained by replacing every subsequence of s that matches regex with the character sequence replacement (note that s itself is not changed).
(6)public String replaceFirst(String replacement): the matcher calls this method to return a String object whose character sequence is obtained by replacing the first subsequence of s that matches regex with the character sequence replacement (note that s itself is not changed).
(7)public String group(): returns a String object whose character sequence is the subsequence of s matched by the most recent successful match, for example the last successful call to find.
Specific examples:

String regex="-?[0-9][0-9]*[.]?[0-9]*";//Regular expression that matches integers or floating-point numbers
Pattern pattern=Pattern.compile(regex);//Initialize the pattern object
String s="Local call fee: 28.39 Yuan, long-distance call fee: 49.15 Yuan, Internet fee: 352 yuan";
Matcher matcher=pattern.matcher(s);//Initialize the matching object used to search s
double sum=0;
while(matcher.find()){
    String str=matcher.group();
    sum+=Double.parseDouble(str);
    System.out.println("Subsequence matched from "+matcher.start()+" to "+matcher.end()+":");
    System.out.println(str);
}
System.out.println("Total cost: "+sum+" yuan");
String weatherForecast[]={"Beijing:-9 degrees to 7 degrees","Guangzhou: 10 to 21 degrees","Harbin:-29 degrees to -7 degrees"};//Temperatures of three places
double averTemperature[]=new double[weatherForecast.length];//Average temperature of each of the three places
for(int i=0;i<weatherForecast.length;i++){
    Matcher matcher1=pattern.matcher(weatherForecast[i]);//New matching object; the pattern is reused unchanged
    double sum1=0;
    int count=0;
    while(matcher1.find()){
        count++;//One increment per temperature found for this place
        sum1+=Double.parseDouble(matcher1.group());//sum1 is the sum of the lowest and highest temperature of this place
    }
    averTemperature[i]=sum1/count;//Each pass of the for loop computes the average temperature of one place
}
System.out.println("Average temperature in the three places: "+Arrays.toString(averTemperature));
//The output is:
Subsequence matched from 16 to 21:
28.39
Subsequence matched from 52 to 57:
49.15
Subsequence matched from 78 to 81:
352
Total cost: 429.53999999999996 yuan
Average temperature in the three places: [-1.0, 15.5, -18.0]
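
The example above only exercises find(), start(), end() and group(). The remaining methods from the list can be tried with a short sketch like the following; the class name MatcherMethodsDemo, the constant NUMBER and both input strings are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherMethodsDemo {
    // Same regular expression as in the example above.
    static final Pattern NUMBER = Pattern.compile("-?[0-9][0-9]*[.]?[0-9]*");

    public static void main(String[] args) {
        String s = "28.39 yuan and 49.15 yuan";
        Matcher matcher = NUMBER.matcher(s);

        System.out.println(matcher.matches());         // false: s as a whole is not a number
        System.out.println(matcher.lookingAt());       // true: s begins with 28.39
        System.out.println(matcher.find(5));           // true: searches from index 5 onward
        System.out.println(matcher.group());           // 49.15
        System.out.println(matcher.replaceFirst("#")); // # yuan and 49.15 yuan
        System.out.println(matcher.replaceAll("#"));   // # yuan and # yuan
        System.out.println(s);                         // s itself is unchanged

        // lookingAt() only tests the beginning, find(0) searches the whole sequence.
        Matcher other = NUMBER.matcher("fee: 28.39");
        System.out.println(other.lookingAt());         // false
        System.out.println(other.find(0));             // true
    }
}
```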
