Java Foundation Series 2: An In-Depth Understanding of the String Class
String is one of the most commonly used data types in Java and a basic topic that often comes up in interviews. This article discusses String in Java and covers the following five topics:
- String overview
- "+" connector resolution
- String constant pool
- String.intern() method parsing
- String, StringBuffer and StringBuilder
String overview
In Java, every literal such as "ABCabc" is an instance of String. The String class lives in the java.lang package and is a core class of the Java language; it provides operations for comparing, searching, extracting substrings, converting case, and so on. The Java language provides special support for the "+" connector and for converting objects to strings, so a String can be joined to other objects with "+". Part of the String source code is shown below:
    public final class String
            implements java.io.Serializable, Comparable<String>, CharSequence {
        /** The value is used for character storage. */
        private final char value[];

        /** Cache the hash code for the string */
        private int hash; // Default to 0
        ...
    }
As can be seen from the above source code:
- The String class is declared final, so it cannot be subclassed. Its value field is private and final, and no method modifies the underlying array, which is what makes String immutable: once a String is created, its contents cannot be changed;
- String class implements Serializable, CharSequence and Comparable interfaces;
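- The value of a String instance is stored in a character array (a char[] in JDK 8 and earlier, as shown above; JDK 9 and later store it in a byte[] together with an encoding flag).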
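As a small illustration of the immutability described in the list above, the following sketch shows that methods such as toUpperCase and concat return new String objects and leave the receiver untouched. The class name ImmutabilityDemo and its helper method are hypothetical, chosen only for this example.

```java
// Illustrative sketch; ImmutabilityDemo and originalUnchanged are hypothetical names.
class ImmutabilityDemo {

    /** Returns true when calling toUpperCase() leaves the original string unchanged. */
    static boolean originalUnchanged() {
        String original = "abc";
        String upper = original.toUpperCase(); // builds and returns a new String
        return original.equals("abc") && upper.equals("ABC");
    }

    public static void main(String[] args) {
        String original = "abc";
        String joined = original.concat("def"); // also returns a new String
        System.out.println(original);            // abc
        System.out.println(joined);              // abcdef
        System.out.println(originalUnchanged()); // true
    }
}
```

Because no method can change the contents, a String can be shared freely between callers, which is what makes the constant pool discussed later safe.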
"+" connector resolution
Implementation principle of "+" connector
The Java language provides special support for the "+" connector and for converting objects to strings. String concatenation is implemented with StringBuilder and its append method; conversion to a string is implemented by the toString method, which is defined in the Object class and inherited by every class in Java. A simple example verifies how the "+" connector is implemented:
    // Test code
    public class Test {
        public static void main(String[] args) {
            int i = 10;
            String str = "abc";
            System.out.println(str + i);
        }
    }

    // After decompilation
    public class Test {
        public static void main(String args[]) {
            byte byte0 = 10;
            String s = "abc";
            System.out.println((new StringBuilder()).append(s).append(byte0).toString());
        }
    }
The decompiled code shows that when "+" joins strings, the compiler (javac on JDK 8 and earlier) generates code that creates a StringBuilder object, calls its append method for each operand, and finally calls toString to return the concatenated string. In other words, concatenating strings with "+" is equivalent to concatenating them with the append method of a StringBuilder. (Since JDK 9, javac by default emits an invokedynamic call to StringConcatFactory instead, but the observable result is the same.)
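The equivalence can be checked directly. The sketch below writes the same concatenation twice, once with "+" and once with an explicit StringBuilder chain like the one in the decompiled output. The class and method names are hypothetical; the second operand is typed as Object to show that conversion goes through toString (or yields "null" for a null reference).

```java
// Illustrative sketch; ConcatEquivalenceDemo and its methods are hypothetical names.
class ConcatEquivalenceDemo {

    /** Concatenation written with the "+" connector. */
    static String withPlus(String s, Object o) {
        return s + o;
    }

    /** The same concatenation written out by hand with StringBuilder. */
    static String withBuilder(String s, Object o) {
        return new StringBuilder().append(s).append(o).toString();
    }

    public static void main(String[] args) {
        System.out.println(withPlus("abc", 10));      // abc10
        System.out.println(withBuilder("abc", 10));   // abc10
        System.out.println(withPlus("abc", null));    // abcnull
        System.out.println(withBuilder("abc", null)); // abcnull
    }
}
```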
Precautions for "+" connector
"+" efficiency
When the "+" connector is used, a StringBuilder object is created implicitly. In most cases this costs nothing noticeable, but be careful when concatenating inside a loop that runs many times: a new StringBuilder is created in heap memory on every iteration, which wastes both time and memory. In that case, create one StringBuilder outside the loop and call its append method manually.
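A minimal sketch of the two styles follows. Both methods produce the same string; the difference is only in how many temporary objects are allocated along the way. The names LoopConcatDemo, joinWithPlus, and joinWithBuilder are hypothetical.

```java
// Illustrative sketch; all names here are hypothetical.
class LoopConcatDemo {

    /** Uses "+=" in the loop: every iteration builds a new intermediate String. */
    static String joinWithPlus(int n) {
        String result = "";
        for (int i = 0; i < n; i++) {
            result += i;
        }
        return result;
    }

    /** Creates one StringBuilder outside the loop and appends to it. */
    static String joinWithBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinWithPlus(5));    // 01234
        System.out.println(joinWithBuilder(5)); // 01234
    }
}
```

For a handful of iterations the difference is negligible; the builder version pays off when the loop count is large.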
Optimization of string constants
There is one special case: when both operands of "+" are string constants known at compile time, the compiler folds them into a single string. For example:
    String s = "hello" + "world!";

    // After decompilation
    String s = "helloworld!";
Can be resolved to a constant value at compile time
    /**
     * Determined at compile time.
     * A final variable initialized with a constant is treated as a constant value at compile time;
     * a local copy is stored in the class's own constant pool or embedded in its bytecode stream.
     * Therefore "a" + s1 and "a" + "b" have the same effect here, so the result is true.
     */
    String s0 = "ab";
    final String s1 = "b";
    String s2 = "a" + s1;
    System.out.println((s0 == s2)); // true
Cannot be resolved to a constant value at compile time
    /**
     * Cannot be determined at compile time.
     * Although s1 is declared final, its value is returned by a method call
     * and is therefore only known at run time.
     * As a result, s0 and s2 do not refer to the same object, and the result is false.
     */
    String s0 = "ab";
    final String s1 = getS1();
    String s2 = "a" + s1;
    System.out.println((s0 == s2)); // false

    public String getS1() {
        return "b";
    }
To sum up, the "+" connector is very efficient for string constants that are added directly, because the result is determined at compile time: an expression such as "I" + "love" + "java" is folded into "Ilovejava" during compilation. Indirect addition (operands that are string references whose values cannot be determined at compile time), such as s1 + s2 + s3, is slower than direct addition, because the compiler does not optimize reference variables.
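The contrast between direct and indirect addition can also be seen with an ordinary (non-final) local variable, a case the examples above do not cover. In the sketch below, which uses hypothetical names, the only difference between the two methods is the final modifier on the local variable.

```java
// Illustrative sketch; ConstantFoldingDemo and its methods are hypothetical names.
class ConstantFoldingDemo {

    /** The final local is a compile-time constant, so "a" + b is folded to "ab". */
    static boolean finalLocalIsFolded() {
        String expected = "ab";
        final String b = "b";
        String joined = "a" + b; // same pooled object as the literal "ab"
        return expected == joined;
    }

    /** The non-final local is an ordinary reference, so "a" + b is built at run time. */
    static boolean plainLocalIsFolded() {
        String expected = "ab";
        String b = "b";
        String joined = "a" + b; // new String object on the heap
        return expected == joined;
    }

    public static void main(String[] args) {
        System.out.println(finalLocalIsFolded()); // true
        System.out.println(plainLocalIsFolded()); // false
    }
}
```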
String constant pool
Introduction to string constant pool
For the eight primitive types and the String type, the JVM provides the notion of a constant pool, which acts like a cache at the Java system level. The constant pools of the eight primitive types are managed by the system. The String constant pool is special, and there are two main ways to use it:
- String objects declared directly in double quotation marks will be directly stored in the constant pool;
- A String object that is not declared in double quotation marks can be added with the intern method provided by String. intern is a native method: it checks whether an equal string already exists in the String constant pool and, if not, puts the current string into the pool.
Because strings are immutable, they can be shared safely, and the constant pool never holds two strings with the same contents.
Memory area
In HotSpot VM, the String constant pool is implemented by a StringTable class, which is a hash table with a default size of 1009. Each HotSpot VM instance has exactly one StringTable, shared by all classes, and string constants are stored in it. Note that if too many strings are put into the String Pool, hash collisions become severe and the bucket chains grow long. The direct effect of long chains is that String.intern() becomes much slower, because each lookup has to walk the chain entry by entry. In JDK 6 and earlier, the String constant pool lives in the Perm Gen area (the method area) and the StringTable size is fixed at 1009. In JDK 7, the String constant pool was moved to the heap, and the StringTable size can be set with the -XX:StringTableSize=66666 parameter. As for why JDK 7 moved the constant pool to the heap, the likely reason is that the method area is small and awkward to expand, whereas the heap is comparatively large and easy to expand.
Memory allocation
In JDK 6 and earlier, string constants are stored in the String Pool. In JDK 7, because String.intern() changed, the String Pool can also hold references to String objects that live in the heap. Consider the following code:
    String s1 = "ABC";
    String s2 = "ABC";
    String s3 = new String("ABC");
    System.out.println(s1 == s2);                   // true
    System.out.println(s1 == s3);                   // false
    System.out.println(s1.intern() == s3.intern()); // true
Because the constant pool never holds two equal objects, s1 and s2 both point to the same "ABC" object in the String constant pool. The new keyword always creates an object, and that object lives in the heap, so String s3 = new String("ABC"); involves two things: the reference s3, stored on the stack, and a new String object, stored in the heap. When String s1 = "ABC" is executed, the JVM first checks whether an "ABC" object exists in the String constant pool. If it does not, the JVM creates the "ABC" object in the pool and assigns its address to s1; if it does, no object is created and the address of the existing "ABC" object is assigned to s1. Since s1, s2, and s3 have equal contents, intern() returns the same pooled reference for each of them, so the return values of intern() are equal.
String.intern() method parsing
Let's look at the code and comments for the String.intern() method:
    /**
     * Returns a canonical representation for the string object.
     * <p>
     * A pool of strings, initially empty, is maintained privately by the
     * class {@code String}.
     * <p>
     * When the intern method is invoked, if the pool already contains a
     * string equal to this {@code String} object as determined by
     * the {@link #equals(Object)} method, then the string from the pool is
     * returned. Otherwise, this {@code String} object is added to the
     * pool and a reference to this {@code String} object is returned.
     * <p>
     * It follows that for any two strings {@code s} and {@code t},
     * {@code s.intern() == t.intern()} is {@code true}
     * if and only if {@code s.equals(t)} is {@code true}.
     * <p>
     * All literal strings and string-valued constant expressions are
     * interned. String literals are defined in section 3.10.5 of the
     * <cite>The Java™ Language Specification</cite>.
     *
     * @return  a string that has the same contents as this string, but is
     *          guaranteed to be from a pool of unique strings.
     */
    public native String intern();
A String object declared directly in double quotation marks is stored in the String constant pool. A String object that is not declared that way can use the intern method provided by String. intern is a native method: it checks whether an equal string already exists in the String constant pool. If so, it returns the string from the pool; if not, it adds the current string to the pool and then returns it. In JDK 1.7, the String constant pool was moved from the Perm area to the Java heap. From then on, when intern() is called on a string whose object already exists in the heap, the pool stores a reference to that object instead of creating a new one.
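The JDK 7 behavior described above can be observed with a string that is certain not to be in the pool yet. The sketch below assumes HotSpot on JDK 7 or later; the class name, the method names, and the "demo-" prefix are hypothetical. The first method builds a string at run time from a random UUID, so intern() should add a reference to that very object; the second starts from a literal, which is already pooled, so intern() returns the pooled literal rather than the heap copy.

```java
import java.util.UUID;

// Illustrative sketch, assuming HotSpot on JDK 7 or later; all names are hypothetical.
class InternReferenceDemo {

    /** A run-time string that is not pooled yet: intern() returns the same object. */
    static boolean internReturnsSameObject() {
        String heapString = new StringBuilder("demo-")
                .append(UUID.randomUUID())
                .toString();
        return heapString.intern() == heapString;
    }

    /** A copy of an already pooled literal: intern() returns the literal, not the copy. */
    static boolean internReturnsPooledLiteral() {
        String heapCopy = new String("pooled-literal");
        String interned = heapCopy.intern();
        return interned != heapCopy && interned == "pooled-literal";
    }

    public static void main(String[] args) {
        System.out.println(internReturnsSameObject());
        System.out.println(internReturnsPooledLiteral());
    }
}
```

On JDK 6, where the pool lived in Perm Gen and intern() copied the string, the first comparison would be expected to yield false instead.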
Use of String.intern()
Let's compare what happens with and without intern(). Suppose a String object is instantiated with new String("ABC"). If intern() is then called, the String constant pool is first searched for a string with the value "ABC"; if one is found, it is reused, and if none is found, one is added to the pool. If intern() is not called, the constant pool is not searched and every new String("ABC") yields a separate "ABC" object. The difference between the two is:
- With intern(), fewer objects are actually kept than were requested, because strings are shared through the constant pool. The price is the extra lookup in the constant pool, which costs time. This approach is space-friendly and needs less GC work to reclaim space;
- Without intern(), as many objects are created as are requested, so a large number of String objects with duplicate values appear. In return, there is no lookup cost, so less time is spent. This approach is time-friendly.
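The space side of this trade-off can be made visible by counting how many distinct objects remain referenced in each case. The sketch below is only an illustration with hypothetical names; it uses an IdentityHashMap so that strings are counted by object identity rather than by equals.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Illustrative sketch; InternSharingDemo and distinctObjects are hypothetical names.
class InternSharingDemo {

    /** Counts the distinct String objects retained after creating the given number of copies. */
    static int distinctObjects(int copies, boolean useIntern) {
        Map<String, Boolean> identities = new IdentityHashMap<>();
        for (int i = 0; i < copies; i++) {
            String s = new String("ABC"); // always a new heap object
            if (useIntern) {
                s = s.intern();           // replace it with the shared pooled object
            }
            identities.put(s, Boolean.TRUE);
        }
        return identities.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctObjects(1000, false)); // 1000
        System.out.println(distinctObjects(1000, true));  // 1
    }
}
```

Note that the temporary copies are still allocated when intern() is used; they simply become garbage immediately, while each intern() call pays for a lookup in the pool.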
String, StringBuffer and StringBuilder
Class diagram
Main differences
- String is an immutable character sequence, while StringBuilder and StringBuffer are mutable character sequences;
- StringBuilder is not thread-safe and StringBuffer is thread-safe. StringBuffer achieves thread safety by adding the synchronized keyword to its member methods;
- In terms of execution efficiency for repeated modification, generally StringBuilder > StringBuffer > String.
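The thread-safety difference in the list above can be sketched as follows. Several threads append to one shared StringBuffer, and because append is synchronized, no append is lost. The class and method names are hypothetical. StringBuilder is shown only in single-threaded use, since sharing it across threads without external locking gives unpredictable results.

```java
// Illustrative sketch; BuilderVsBufferDemo and its methods are hypothetical names.
class BuilderVsBufferDemo {

    /** Appends from several threads to one StringBuffer and returns the final length. */
    static int appendConcurrently(int threads, int appendsPerThread) {
        StringBuffer buffer = new StringBuffer();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < appendsPerThread; i++) {
                    buffer.append('x'); // synchronized, so no update is lost
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return buffer.length();
    }

    /** Single-threaded use, where the unsynchronized StringBuilder is the usual choice. */
    static String repeat(char c, int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(appendConcurrently(4, 10000)); // 40000
        System.out.println(repeat('x', 3));               // xxx
    }
}
```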
Summary
To sum up, let's test what we have learned with one more example:
    String s1 = "AB";
    String s2 = new String("AB");
    String s3 = "A";
    String s4 = "B";
    String s5 = "A" + "B";
    String s6 = s3 + s4;
    System.out.println(s1 == s2);          // false
    System.out.println(s1 == s2.intern()); // true
    System.out.println(s1 == s5);          // true
    System.out.println(s1 == s6);          // false
    System.out.println(s1 == s6.intern()); // true
The following three points need to be understood:
- String objects declared directly in double quotation marks will be directly stored in the constant pool;
- The intern method of a String object returns the reference to the equal string in the constant pool. If the pool has no such string, the string is added to the pool and the pooled reference is returned;
- The essence of the String "+" operation is to create a StringBuilder object, call append on it, and then turn the result into a String object with the toString method.
Take a look at the distribution of the above six String objects in memory: