In-depth understanding of String

Posted by firecircle on Thu, 17 Feb 2022 17:12:36 +0100

Preface

***
Reference material: Deep understanding of String in Java


1, Analysis of the underlying source code of String class

Let's take a look at the source code of the String class (the excerpt below matches JDK 8; from JDK 9 on, value is a byte[] instead of a char[]):

public final class String implements java.io.Serializable, Comparable<String>, CharSequence {
    /** The value is used for character storage. */
    private final char value[];

    /** Cache the hash code for the string */
    private int hash; // Default to 0

    /** use serialVersionUID from JDK 1.0.2 for interoperability */
    private static final long serialVersionUID = -6849794470754667710L;
	........
}

The following points can be seen from the above:
1. The String class is declared final, so it cannot be subclassed, and it represents an immutable character sequence.

2. The String class implements the Serializable, Comparable and CharSequence interfaces.

  • It implements the Serializable interface so that strings can be serialized (for example, to be transmitted over a network);
  • It implements the Comparable interface so that strings can be compared;
  • It implements the CharSequence interface mainly to provide a common set of methods (personal understanding: CharSequence is only a specification here; StringBuilder and StringBuffer implement it too, so each of these classes supplies its own implementation of its methods).

    ps: the picture (not reproduced here) lists the methods declared by the CharSequence interface, such as length(), charAt(int), subSequence(int, int) and toString(). String, StringBuilder and StringBuffer all implement these methods.
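To make the role of the two interfaces more concrete, here is a minimal sketch. The class name CharSequenceDemo and the helper countChar are made up for illustration; only length(), charAt() and compareTo() come from the JDK.

```java
class CharSequenceDemo {
    // Hypothetical helper: works for String, StringBuilder and StringBuffer,
    // because all three implement CharSequence.
    static int countChar(CharSequence seq, char target) {
        int count = 0;
        for (int i = 0; i < seq.length(); i++) {
            if (seq.charAt(i) == target) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countChar("hello", 'l'));                    // 2
        System.out.println(countChar(new StringBuilder("hello"), 'l')); // 2
        // Comparable<String>: a negative result means "apple" sorts before "banana".
        System.out.println("apple".compareTo("banana") < 0);            // true
    }
}
```

Declaring the parameter as CharSequence rather than String is what lets the same method accept all three classes.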


3. The member array value[] in the String class stores the characters of the string, which also explains why charAt() can retrieve the character at any index. Note: value is declared final. This means that value cannot be made to point to a new array, but the individual characters in the array can still be changed. This is a little hard to grasp, so let's illustrate it with an example:
See the code below:

public class TestString {
    public static void main(String[] args) {
        final char[] value = {'h','e','l','l','o'};
        // 1. Modify the content of value: change 'h' to 'H'
        value[0] = 'H';
        System.out.println(value);  // Output: Hello
        // 2. Try to make value point to the array ch
        char[] ch = {'H','E','L','L','O'};
        // value = ch;  // Compilation error if uncommented: a final variable cannot be reassigned
    }
}

The figure (not reproduced here) illustrates the code above; the address values in it are arbitrary.
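Since a final array can still have its elements changed, it is fair to ask why a String stays immutable. One part of the answer is that the String(char[]) constructor copies the array it receives. The sketch below shows this; the class name DefensiveCopyDemo and the helper buildThenMutate are hypothetical.

```java
class DefensiveCopyDemo {
    // Hypothetical helper: builds a String from the array, then mutates the array.
    static String buildThenMutate(char[] chars) {
        String s = new String(chars); // the constructor copies the array contents
        chars[0] = 'H';               // changes the caller's array only
        return s;
    }

    public static void main(String[] args) {
        char[] chars = {'h', 'e', 'l', 'l', 'o'};
        String s = buildThenMutate(chars);
        System.out.println(s);                 // hello
        System.out.println(new String(chars)); // Hello
    }
}
```

Because the String holds its own private copy, code outside the class has no reference through which it could change the characters.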


2, Constant pool

  • Reference: allocating a String, like allocating any other object, costs time and memory, and programs use a great many strings. To improve performance and reduce memory overhead, the JVM optimizes string literals with a String constant pool. Whenever a String literal is used, the JVM first checks the pool. If an equal String is already there, the JVM returns a reference to that pooled instance. If not, the String is instantiated and placed in the pool. Because String is immutable, sharing one instance is safe, and the pool never holds two Strings with the same content (this is very important for understanding what follows).
    Constant pools in Java come in two forms: the static constant pool and the runtime constant pool.
    1. Static constant pool: the constant pool in a *.class file. It contains not only string and numeric literals but also class and method information, and it occupies most of the space of the class file.
    2. Runtime constant pool: after the JVM finishes loading a class, it loads the constant pool of the class file into memory and keeps it in the method area. When people say "the constant pool", they usually mean the runtime constant pool in the method area.
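The sharing of pooled literals can be observed with ==, which compares references. This is a small sketch with a made-up class name (LiteralPoolDemo); it relies only on the language rule that equal string literals and compile-time constant expressions refer to the same pooled instance.

```java
class LiteralPoolDemo {
    // Two identical literals resolve to the same pooled instance.
    static boolean literalsShared() {
        String a = "hello";
        String b = "hello";
        return a == b;
    }

    // "hel" + "lo" is a compile-time constant, so the compiler folds it into "hello".
    static boolean constantExpressionShared() {
        String c = "hel" + "lo";
        return c == "hello";
    }

    public static void main(String[] args) {
        System.out.println(literalsShared());           // true
        System.out.println(constantExpressionShared()); // true
    }
}
```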


3, Creation of String object

Once the concept of the constant pool is clear, the creation process of a String is easier to follow. There are two ways to create a String.

1. Creation method 1:

Code of creation method 1:

String s = "hello,world";

This way first checks whether the string "hello,world" already exists in the constant pool. If it does, s points directly to that string. If not, the string is created in the pool and s points to it.


Analyze the following code according to the explanation above:

public class TestOfString {
    public static void main(String[] args) {
        String s1 = "hello";
        s1 = "world";
    }
}

The diagram for the code above (not reproduced here) is described in the following note:

Note: the blue arrow in the figure shows the first statement. Because the constant pool does not yet contain "hello", the string is created there and s1 points to it. The red arrow and red cross show the second statement: it creates "world" in the constant pool and makes s1 point to it. Only the reference held by s1 changes; "hello" still exists in the constant pool and its content is not modified. It helps to think of s1 as a pointer in C: the assignment changes the pointer, not the data it pointed to.
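The pointer analogy can be checked by keeping a second reference to the first string before reassigning. This is only a sketch; the class name ReassignDemo and the variable alias are made up.

```java
class ReassignDemo {
    // Returns { value of s1 after reassignment, value still seen through alias }.
    static String[] reassign() {
        String s1 = "hello";
        String alias = s1; // second reference to the same "hello" object
        s1 = "world";      // s1 now refers to a different object; "hello" is untouched
        return new String[] { s1, alias };
    }

    public static void main(String[] args) {
        String[] result = reassign();
        System.out.println(result[0]); // world
        System.out.println(result[1]); // hello
    }
}
```

If the assignment had modified the "hello" object itself, alias would print world as well.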


2. Creation method 2:

Code of creation method 2:

String s = new String("hello,world");

This method works differently from the first one:

The literal "hello,world" is handled exactly as in method 1: the JVM makes sure that it exists in the constant pool. The new operator then always creates a separate String object in the heap, and s points to that heap object, not to the pooled one. In the JDK 8 source, the String(String original) constructor simply copies the reference original.value, so the value field of the heap object points to the same character array as the pooled "hello,world".
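The difference between the two creation methods shows up when references are compared. The following is a minimal sketch; the class name NewStringDemo and its helper names are hypothetical.

```java
class NewStringDemo {
    // new always creates a distinct heap object, so the references differ.
    static boolean sameReference() {
        String literal = "hello,world";
        String created = new String("hello,world");
        return literal == created;
    }

    // The contents are still equal.
    static boolean sameContent() {
        String literal = "hello,world";
        String created = new String("hello,world");
        return literal.equals(created);
    }

    // intern() on the heap object yields the pooled literal.
    static boolean internMatchesLiteral() {
        String literal = "hello,world";
        String created = new String("hello,world");
        return created.intern() == literal;
    }

    public static void main(String[] args) {
        System.out.println(sameReference());        // false
        System.out.println(sameContent());          // true
        System.out.println(internMatchesLiteral()); // true
    }
}
```

This is why string contents should be compared with equals() rather than ==.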


4, Some methods of String class

1.replace(), replaceAll() and replaceFirst() methods:

Note: the replace(), replaceAll() and replaceFirst() methods do not modify the original object; they produce their result as a new String object instead.
Let's first look at the source code of the replace() method:

    public String replace(char oldChar, char newChar) {
        if (oldChar != newChar) {
            int len = value.length;
            int i = -1;
            char[] val = value; /* avoid getfield opcode */

            while (++i < len) {
                if (val[i] == oldChar) {
                    break;
                }
            }
            if (i < len) {
                char buf[] = new char[len];
                for (int j = 0; j < i; j++) {
                    buf[j] = val[j];
                }
                while (i < len) {
                    char c = val[i];
                    buf[i] = (c == oldChar) ? newChar : c;
                    i++;
                }
                return new String(buf, true);
            }
        }
        return this;
    }

The source code shows that replace() never modifies the value array. If oldChar occurs in the string, the method copies the characters into a new array buf and returns a new String object built from it (return new String(buf, true)); if there is nothing to replace, it returns this unchanged. This confirms the statement above.
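The behavior can be observed from the caller's side. This is a sketch with made-up names (ReplaceDemo, upperCaseEls); the last line depends on the JDK 8 source quoted above rather than on a documented guarantee.

```java
class ReplaceDemo {
    // Hypothetical helper: returns a copy with every 'l' replaced by 'L'.
    static String upperCaseEls(String s) {
        return s.replace('l', 'L');
    }

    public static void main(String[] args) {
        String original = "hello";
        String replaced = upperCaseEls(original);
        System.out.println(original); // hello
        System.out.println(replaced); // heLLo

        // No 'x' in the string, so there is nothing to replace.
        String untouched = original.replace('x', 'y');
        System.out.println(untouched.equals(original)); // true
        // Under the JDK 8 source quoted above this prints true as well ("return this"),
        // but that is an implementation detail, not a documented guarantee.
        System.out.println(untouched == original);
    }
}
```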


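Now look at replaceAll(). String.replaceAll(regex, replacement) delegates to Pattern.compile(regex).matcher(this).replaceAll(replacement), so the code that does the work is the replaceAll() method of java.util.regex.Matcher: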

    public String replaceAll(String replacement) {
        reset();
        boolean result = find();
        if (result) {
            StringBuffer sb = new StringBuffer();
            do {
                appendReplacement(sb, replacement);
                result = find();
            } while (result);
            appendTail(sb);
            return sb.toString();
        }
        return text.toString();
    }

The source code shows that the result comes from a toString() call. When there is a match, sb.toString() builds a new String from the buffer; when there is none, text.toString() returns the original text. In both cases the value array of the original String is not modified. The replaceFirst() method is similar, so I won't repeat it.
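A short sketch of the regex-based methods in use follows. The class name RegexReplaceDemo and the helper viaMatcher are made up; viaMatcher spells out the Pattern/Matcher route explicitly so that it can be compared with String.replaceAll().

```java
import java.util.regex.Pattern;

class RegexReplaceDemo {
    // Hypothetical helper: the explicit Pattern/Matcher form of a regex replacement.
    static String viaMatcher(String input, String regex, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String original = "a1b22c333";
        System.out.println(original.replaceAll("\\d+", "#"));   // a#b#c#
        System.out.println(original.replaceFirst("\\d+", "#")); // a#b22c333
        System.out.println(original.replace("22", "#"));        // a1b#c333 (literal text, no regex)
        System.out.println(viaMatcher(original, "\\d+", "#"));  // a#b#c#
        System.out.println(original);                           // a1b22c333, unchanged
    }
}
```

Note that replace() treats its arguments as literal text, while replaceAll() and replaceFirst() interpret the first argument as a regular expression.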


2.String.intern() method:

The API documentation says this:

When the intern method is invoked, if the pool already contains a String equal to this String object as determined by the equals(Object) method, then the String from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned.
It follows that for any two strings s and t, s.intern() == t.intern() is true if and only if s.equals(t) is true.

  • Personal understanding: when a String calls this method, the JVM first checks whether the constant pool already holds a string with the same content. If so, the reference to that pooled string is returned. Otherwise, as the documentation says, this string is added to the pool and a reference to it is returned.

String.intern() method exercise:

public class TestString {
    public static void main(String[] args) {
        String a = "hello";
        String b = "world";
        String c = a + b;   // concatenated at run time, so c is a new heap object
        System.out.println(c == "helloworld");          // Output: false
        System.out.println(c.intern() == "helloworld"); // Output: true
    }
}

A picture (not reproduced here) illustrates this:

  • Note: c is an object in the heap, because a + b is evaluated at run time. The address of c in the heap differs from the address of the pooled "helloworld", so the first statement outputs false. c.intern() returns the address of the string in the constant pool (that is, the address of "helloworld"), so the second statement outputs true.
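The rule that s.intern() == t.intern() holds exactly when s.equals(t) can be tried out with two strings built at run time. The sketch below uses made-up names (InternDemo, build, finalConcatIsPooled). As a contrast to the exercise above, it also shows that concatenating final variables initialized with literals is a compile-time constant and therefore refers to the pooled string.

```java
class InternDemo {
    // Hypothetical helper: each call creates a new String object in the heap.
    static String build() {
        return new StringBuilder("hello").append("world").toString();
    }

    // With final variables, a + b is a compile-time constant and is pooled.
    static boolean finalConcatIsPooled() {
        final String a = "hello";
        final String b = "world";
        return (a + b) == "helloworld";
    }

    public static void main(String[] args) {
        String x = build();
        String y = build();
        System.out.println(x == y);                   // false: two distinct heap objects
        System.out.println(x.equals(y));              // true: same content
        System.out.println(x.intern() == y.intern()); // true: both map to one pooled string
        System.out.println(finalConcatIsPooled());    // true
    }
}
```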

Other methods will be added later when I have time.
