Chapter 13 - StringTable
1. Basic characteristics of string
- A String is a sequence of characters written between a pair of double quotes ("").

```java
String s1 = "baidu";              // defined as a literal
String s2 = new String("hello");  // created with new
```

- String is declared final and cannot be subclassed.
- String implements the Serializable interface, so strings support serialization.
- String implements the Comparable interface, so strings can be compared and ordered.
- In JDK 8 and earlier, String stores its data in a field declared as final char[] value. JDK 9 changed this field to byte[].
1.1 String storage structure change in jdk9
Official website address: JEP 254: Compact Strings (java.net)
Motivation
The current implementation of the String class stores characters in a char array, using two bytes (sixteen bits) for each character. Data gathered from many different applications indicates that strings are a major component of heap usage and, moreover, that most String objects contain only Latin-1 characters. Such characters require only one byte of storage, hence half of the space in the internal char arrays of such String objects is going unused.
Description
We propose to change the internal representation of the String class from a UTF-16 char array to a byte array plus an encoding-flag field. The new String class will store characters encoded either as ISO-8859-1/Latin-1 (one byte per character), or as UTF-16 (two bytes per character), based upon the contents of the string. The encoding flag will indicate which encoding is used.
String-related classes such as AbstractStringBuilder, StringBuilder, and StringBuffer will be updated to use the same representation, as will the HotSpot VM's intrinsic string operations.
This is purely an implementation change, with no changes to existing public interfaces. There are no plans to add any new public APIs or other interfaces.
The prototyping work done to date confirms the expected reduction in memory footprint, substantial reductions of GC activity, and minor performance regressions in some corner cases.
Conclusion: since JDK 9, String no longer stores its data in a char[] but in a byte[] plus an encoding flag, which saves space for strings that contain only Latin-1 characters.
```java
public final class String
        implements java.io.Serializable, Comparable<String>, CharSequence {
    @Stable
    private final byte[] value;
}
```
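To make the one-byte versus two-byte choice concrete, here is a minimal sketch of how such a decision could be made from a string's contents. It is not the JDK's implementation: the class name CompactStringSketch, the constants, and both methods are hypothetical and only mirror the idea of the encoding flag described in JEP 254.

```java
public class CompactStringSketch {
    // Hypothetical flag values standing in for the encoding-flag field.
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    /** Returns LATIN1 if every char fits in one byte (<= 0xFF), otherwise UTF16. */
    static byte chooseCoder(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return UTF16;
            }
        }
        return LATIN1;
    }

    /** Bytes needed for the character data under the chosen encoding. */
    static int payloadBytes(String s) {
        return chooseCoder(s) == LATIN1 ? s.length() : s.length() * 2;
    }

    public static void main(String[] args) {
        System.out.println(payloadBytes("hello"));        // 5
        System.out.println(payloadBytes("h\u00e9llo"));   // 5 (U+00E9 is a Latin-1 character)
        System.out.println(payloadBytes("\u4f60\u597d")); // 4 (two chars, two bytes each)
    }
}
```

With a char[] representation, the first two strings would each need 10 bytes of character data instead of 5.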
1.2 basic characteristics of string
- String represents an immutable character sequence. In short: immutability.
  - When a string variable is reassigned, a new memory area is allocated for the new value. The original value is not modified.
  - When an existing string is concatenated, a new memory area is allocated for the result. The original value is not modified.
  - When replace() is called to change a character or substring, a new memory area is allocated for the result. The original value is not modified.
- When a string is assigned by literal (as opposed to new), its value is placed in the string constant pool.
- The string constant pool never stores two strings with the same content.
  - The string pool is a fixed-size Hashtable. If too many strings are put into it, hash collisions become frequent and the bucket chains grow long, which directly slows down calls to String.intern().
  - Use -XX:StringTableSize to set the size of the StringTable.
  - In JDK 6 the StringTable size defaults to 1009, so efficiency drops quickly once the constant pool holds many strings. There is no restriction on the value passed to StringTableSize.
  - In JDK 7 the default StringTable size is 60013, and there is still no restriction on the value passed to StringTableSize.
  - Starting from JDK 8, 1009 is the minimum value that can be set for StringTableSize.
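The effect of the table size on chain length can be illustrated with a small simulation. This is only a sketch under stated assumptions: the class BucketChainSketch and its method are hypothetical, and it uses String.hashCode() with a simple modulo, whereas the JVM's StringTable uses its own hash function and data structure.

```java
public class BucketChainSketch {
    /** Returns the length of the longest chain when the keys are spread over the given number of buckets. */
    static int longestChain(String[] keys, int buckets) {
        int[] chain = new int[buckets];
        int longest = 0;
        for (String k : keys) {
            int idx = Math.floorMod(k.hashCode(), buckets); // bucket index, never negative
            chain[idx]++;
            longest = Math.max(longest, chain[idx]);
        }
        return longest;
    }

    public static void main(String[] args) {
        String[] keys = new String[100_000];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = "key-" + i;
        }
        // 100000 keys over 1009 buckets average about 99 entries per chain;
        // over 60013 buckets the average is below 2.
        System.out.println("1009 buckets:  longest chain = " + longestChain(keys, 1009));
        System.out.println("60013 buckets: longest chain = " + longestChain(keys, 60013));
    }
}
```

Each lookup has to walk one chain, so a table with fewer buckets makes every intern() call proportionally more expensive.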
Code example: demonstrating the immutability of String

```java
import org.junit.Test;

/**
 * Basic use of String: demonstrates the immutability of String
 */
public class StringTest1 {
    @Test
    public void test1() {
        String s1 = "abc"; // literal definition: "abc" is stored in the string constant pool
        String s2 = "abc";
        s1 = "hello";

        System.out.println(s1 == s2); // compares addresses: true before the reassignment, false after it
        System.out.println(s1); // hello
        System.out.println(s2); // abc
    }

    @Test
    public void test2() {
        String s1 = "abc";
        String s2 = "abc";
        s2 += "def";
        System.out.println(s2); // abcdef
        System.out.println(s1); // abc
    }

    @Test
    public void test3() {
        String s1 = "abc";
        String s2 = s1.replace('a', 'm');
        System.out.println(s1); // abc
        System.out.println(s2); // mbc
    }
}
```
Starting from JDK 8, 1009 is the minimum value that can be set for the StringTable size. This can be verified with a program that only sleeps, so that the JVM stays alive long enough to be inspected:

```java
public static void main(String[] args) {
    // Test the StringTableSize parameter
    System.out.println("I'll make soy sauce");
    try {
        Thread.sleep(1000000); // keep the process running so that jps and jinfo can find it
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}
```
- Run the program without setting any JVM parameters.
- On the command line, find the process ID with jps, then query the StringTable size with jinfo (replace <pid> with the ID that jps prints):

```
jps
jinfo -flag StringTableSize <pid>
```

- The default size turns out to be 60013.
- Now set the JVM parameter:

```
-XX:StringTableSize=1000
```

- The JVM refuses to start and reports an error:

```
StringTable size of 1000 is invalid; Must be between 1009 and 2305843009213693951
```
A written-test question on String: checking your understanding of String immutability

```java
public class StringExer {
    String str = new String("good");
    char[] ch = {'t', 'e', 's', 't'};

    public void change(String str, char ch[]) {
        str = "test ok"; // rebinds only the local parameter; ex.str still refers to "good"
        ch[0] = 'b';     // modifies the array that ex.ch also refers to
    }

    public static void main(String[] args) {
        StringExer ex = new StringExer();
        ex.change(ex.str, ex.ch);
        System.out.println(ex.str); // good
        System.out.println(ex.ch);  // best
    }
}
```
Test the impact of StringTable size on performance
- First generate 100000 strings

```java
import java.io.FileWriter;
import java.io.IOException;

/**
 * Generates 100000 strings of length 1 to 10, consisting of A-Z and a-z
 */
public class GenerateString {
    public static void main(String[] args) throws IOException {
        FileWriter fw = new FileWriter("words.txt");

        for (int i = 0; i < 100000; i++) {
            // length in the range 1 - 10
            int length = (int) (Math.random() * (10 - 1 + 1) + 1);
            fw.write(getString(length) + "\n");
        }

        fw.close();
    }

    public static String getString(int length) {
        String str = "";
        for (int i = 0; i < length; i++) {
            // 65 - 90 (A-Z), 97 - 122 (a-z)
            int num = (int) (Math.random() * (90 - 65 + 1) + 65) + (int) (Math.random() * 2) * 32;
            str += (char) num;
        }
        return str;
    }
}
```

- Then put the 100000 strings into the string constant pool to measure how different StringTable sizes affect performance

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

/**
 * -XX:StringTableSize=1009
 */
public class StringTest2 {
    public static void main(String[] args) {
        BufferedReader br = null;
        try {
            br = new BufferedReader(new FileReader("words.txt"));
            long start = System.currentTimeMillis();
            String data;
            while ((data = br.readLine()) != null) {
                data.intern(); // if the pool has no string equal to data, one is added to the pool
            }

            long end = System.currentTimeMillis();

            System.out.println("The time spent is:" + (end - start)); // 1009: 128ms  10000: 51ms
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (br != null) {
                try {
                    br.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
```
- Set the StringTable size to the minimum value of 1009

```
-XX:StringTableSize=1009
```

- Result: 128 ms
- Then set the StringTable size to 10000

```
-XX:StringTableSize=10000
```

- Result: 51 ms
2. Memory allocation of string
- The Java language has eight primitive data types and one special type, String. To make these types faster and more memory-efficient at run time, each of them comes with a constant pool.
- A constant pool works like a cache provided at the Java system level. The constant pools of the eight primitive types are managed by the system, while the String constant pool is special. There are two main ways to use it:
  - String objects declared directly in double quotes are stored directly in the constant pool.
    - For example: String info = "baidu.com";
  - String objects that are not declared in double quotes can be put into the pool with the intern() method provided by String. This is covered in detail later.
- In Java 6 and earlier, the string constant pool was stored in the permanent generation.
- In Java 7, Oracle engineers made a major change to the string pool logic: the string constant pool was moved to the Java heap.
  - All strings are stored in the heap like other ordinary objects, so only the heap size needs adjusting when tuning an application.
  - The string constant pool was already heavily used, and this change is reason enough to reconsider using String.intern() in Java 7.
- In Java 8 the permanent generation is replaced by metaspace, but string constants remain in the heap.
Why was the StringTable moved?
- The permanent generation (PermSize) is small by default
- The permanent generation is garbage-collected infrequently
Official website address: Java SE 7 Features and Enhancements (oracle.com)
Synopsis: In JDK 7, interned strings are no longer allocated in the permanent generation of the Java heap, but are instead allocated in the main part of the Java heap (known as the young and old generations), along with the other objects created by the application. This change will result in more data residing in the main Java heap, and less data in the permanent generation, and thus may require heap sizes to be adjusted. Most applications will see only relatively small differences in heap usage due to this change, but larger applications that load many classes or make heavy use of the String.intern() method will see more significant differences.
Code example
```java
import java.util.HashSet;
import java.util.Set;

/**
 * In JDK 6:
 * -XX:PermSize=6m -XX:MaxPermSize=6m -Xms6m -Xmx6m
 *
 * In JDK 8:
 * -XX:MetaspaceSize=6m -XX:MaxMetaspaceSize=6m -Xms6m -Xmx6m
 */
public class StringTest3 {
    public static void main(String[] args) {
        // Keep references to the pooled strings in a Set so that a full GC cannot reclaim them
        Set<String> set = new HashSet<String>();
        // The range of values a short can take is enough to exhaust a 6 MB PermSize or heap
        short i = 0;
        while (true) {
            set.add(String.valueOf(i++).intern());
        }
    }
}
```

- Set the JVM parameters

```
-XX:MetaspaceSize=6m -XX:MaxMetaspaceSize=6m -Xms6m -Xmx6m
```

- The OutOfMemoryError is reported for the heap space, which confirms that in JDK 8 the string constant pool lives in the heap
3. Basic operation of string
The Java Language Specification requires that identical string literals, that is, literals containing the same sequence of Unicode characters (the same sequence of code points), refer to the same instance of class String.
```java
public class StringTest4 {
    public static void main(String[] args) {
        System.out.println(); // 1230
        System.out.println("1"); // 1231
        System.out.println("2");
        System.out.println("3");
        System.out.println("4");
        System.out.println("5");
        System.out.println("6");
        System.out.println("7");
        System.out.println("8");
        System.out.println("9");
        System.out.println("10"); // 1240
        // The strings "1" to "10" below are not loaded again
        System.out.println("1"); // 1241
        System.out.println("2"); // 1241
        System.out.println("3");
        System.out.println("4");
        System.out.println("5");
        System.out.println("6");
        System.out.println("7");
        System.out.println("8");
        System.out.println("9");
        System.out.println("10"); // 1241
    }
}
```

- Set breakpoints on some of the lines and step through the code
- At the start, 1230 strings are loaded
- After the line with "1" executes, the number of strings becomes 1231
- After the line with "10" executes, the number of strings becomes 1240
- The strings in the second half are already in the string constant pool, so they are not loaded again
Code example 2
```java
class Memory {
    public static void main(String[] args) { // line 1
        int i = 1; // line 2
        Object obj = new Object(); // line 3
        Memory mem = new Memory(); // line 4
        mem.foo(obj); // line 5
    } // line 9

    private void foo(Object param) { // line 6
        String str = param.toString(); // line 7
        System.out.println(str);
    } // line 8
}
```

- The local variable tables in the original figure (not reproduced here) are missing several entries. Going by the code, the table of main holds args, i, obj and mem, and the table of foo holds this, param and str.
4. String splicing
- Concatenating constants with constants produces a result in the constant pool. This is a compile-time optimization.
- The constant pool never holds two constants with the same content.
- As soon as one operand is a variable, the result is in the heap. Concatenation involving variables is implemented with StringBuilder.
- If intern() is called on the concatenation result, a string object that is not yet in the constant pool is put into the pool, and the address of the pooled object is returned.
Code example 1
```java
@Test
public void test1() {
    String s1 = "a" + "b" + "c"; // compile-time optimization: equivalent to "abc"
    String s2 = "abc"; // "abc" is placed in the string constant pool and its address is assigned to s2
    /*
     * After the .java file is compiled into a .class file, the code that runs is equivalent to:
     * String s1 = "abc";
     * String s2 = "abc";
     */
    System.out.println(s1 == s2); // true
    System.out.println(s1.equals(s2)); // true
}
```
Code example 2
```java
@Test
public void test2() {
    String s1 = "javaEE";
    String s2 = "hadoop";

    String s3 = "javaEEhadoop";
    String s4 = "javaEE" + "hadoop"; // compile-time optimization
    // If a variable appears on either side of the concatenation operator, the result is
    // equivalent to a new String() in heap space whose content is the concatenation: javaEEhadoop
    String s5 = s1 + "hadoop";
    String s6 = "javaEE" + s2;
    String s7 = s1 + s2;

    System.out.println(s3 == s4); // true
    System.out.println(s3 == s5); // false
    System.out.println(s3 == s6); // false
    System.out.println(s3 == s7); // false
    System.out.println(s5 == s6); // false
    System.out.println(s5 == s7); // false
    System.out.println(s6 == s7); // false
    // intern(): checks whether the string constant pool contains javaEEhadoop.
    // If it does, the address of javaEEhadoop in the pool is returned.
    // If it does not, javaEEhadoop is added to the pool and the address of that object is returned.
    String s8 = s6.intern();
    System.out.println(s3 == s8); // true
}
```
Code example 3
```java
@Test
public void test3() {
    String s1 = "a";
    String s2 = "b";
    String s3 = "ab";
    /*
    s1 + s2 is executed as follows (the variable s is a name chosen here for illustration):
    ① StringBuilder s = new StringBuilder();
    ② s.append("a")
    ③ s.append("b")
    ④ s.toString()  --> roughly equivalent to new String("ab")

    Note: from JDK 5.0 onward StringBuilder is used; before JDK 5.0 StringBuffer was used
     */
    String s4 = s1 + s2;
    System.out.println(s3 == s4); // false
}
```

- Decompile the bytecode file
- The statement String s4 = s1 + s2; is compiled into creating a new StringBuilder, appending s1 and s2 with append(), and finally calling toString(), which is roughly equivalent to new String("ab"). The resulting String object lives in the heap and must be distinguished from the string constant pool. s3 refers to the object in the string constant pool and s4 to the object in the heap, so s3 == s4 is false
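The expansion described above can be written out by hand. The following is a sketch rather than compiler output: the class and method names are hypothetical, and it models what javac generates up to JDK 8. From JDK 9 onward javac emits an invokedynamic call instead (JEP 280), but the result of a non-constant concatenation is still a newly created String object.

```java
public class ConcatExpansionSketch {
    /** Hand-written equivalent of the expression a + b as compiled up to JDK 8. */
    static String concat(String a, String b) {
        StringBuilder sb = new StringBuilder();
        sb.append(a);
        sb.append(b);
        return sb.toString(); // allocates a new String in the heap
    }

    public static void main(String[] args) {
        String s1 = "a";
        String s2 = "b";
        String s3 = "ab";           // literal: refers to the pooled object
        String s4 = concat(s1, s2); // new heap object with the same content

        System.out.println(s3 == s4);          // false
        System.out.println(s3.equals(s4));     // true
        System.out.println(s3 == s4.intern()); // true
    }
}
```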
Supplementary note: from JDK 5 onward the compiler uses StringBuilder for concatenation; before JDK 5 it used StringBuffer.

| String | StringBuffer | StringBuilder |
|---|---|---|
| A String value is immutable, so every operation on a String produces a new String object. This is inefficient and wastes a lot of limited memory. | StringBuffer is a mutable, thread-safe string class. Operations on it modify the same object instead of producing a new one. Each StringBuffer has a buffer capacity: while the content fits within the capacity, no new storage is allocated; when the content exceeds the capacity, the capacity is increased automatically. | A mutable class, and faster. |
| Immutable | Mutable | Mutable |
| | Thread-safe | Not thread-safe |
| | For multithreaded string operations | For single-threaded string operations |
Code example 4
```java
/*
 1. String concatenation does not always use StringBuilder!
    If both sides of the concatenation operator are string constants or constant references,
    compile-time optimization is still used instead of StringBuilder.
 2. final can modify classes, methods, and variables of primitive or reference types.
    In development, use final wherever it is applicable.
 */
@Test
public void test4() {
    final String s1 = "a";
    final String s2 = "b";
    String s3 = "ab";
    String s4 = s1 + s2; // s4 is a compile-time constant
    System.out.println(s3 == s4); // true
}
```

- Note: when the operands on both sides are ordinary variables, they are concatenated through a new StringBuilder. When they are declared final and initialized with constants, they are constant references and the result comes from the constant pool. So if both sides of the concatenation operator are string constants or constant references, the compile-time optimization still applies. In other words, a final variable initialized with a constant is treated as a constant. (Applied to classes and methods, final instead prevents inheritance and overriding.)
- In development, use final wherever it can be used
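One caveat is worth a short sketch: final alone is not enough, the initializer must itself be a constant expression. In the example below, which uses hypothetical class and method names, a final variable initialized from a method call is not a compile-time constant, so the concatenation happens at run time and produces a new object.

```java
public class FinalConstantSketch {
    /** Returns its value at run time, so callers cannot treat it as a constant expression. */
    static String runtimeValue() {
        return "a";
    }

    static boolean constantFinal() {
        final String s1 = "a";  // final and initialized with a constant expression
        String s2 = s1 + "b";   // folded to "ab" by the compiler
        return s2 == "ab";
    }

    static boolean nonConstantFinal() {
        final String s1 = runtimeValue(); // final, but not a constant expression
        String s2 = s1 + "b";             // concatenated at run time: a new object
        return s2 == "ab";
    }

    public static void main(String[] args) {
        System.out.println(constantFinal());    // true
        System.out.println(nonConstantFinal()); // false
    }
}
```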
Code example 5
```java
/*
 Comparing execution efficiency: appending with StringBuilder's append() is far more
 efficient than concatenating with String!
 Details:
 ① StringBuilder.append(): only one StringBuilder object is created from start to finish.
    String concatenation: a new StringBuilder and a new String are created on every iteration.
 ② String concatenation: because so many StringBuilder and String objects are created,
    memory consumption is higher, and any GC that results costs additional time.

 Room for improvement: in real development, if the total length of the appended content is
 known not to exceed some limit highLevel, prefer the constructor that takes a capacity:
     StringBuilder s = new StringBuilder(highLevel); // new char[highLevel]
 */
@Test
public void test6() {
    long start = System.currentTimeMillis();

//    method1(100000); // 5046
    method2(100000); // 6

    long end = System.currentTimeMillis();

    System.out.println("The time spent is:" + (end - start));
}

public void method1(int highLevel) {
    String src = "";
    for (int i = 0; i < highLevel; i++) {
        src = src + "a"; // a StringBuilder and a String are created on every iteration
    }
//    System.out.println(src);
}

public void method2(int highLevel) {
    // only one StringBuilder is created
    StringBuilder src = new StringBuilder();
    for (int i = 0; i < highLevel; i++) {
        src.append("a");
    }
//    System.out.println(src);
}
```
5. Use of intern()
Interpretation in official API documentation
public String intern( )
Returns a canonical representation for the string object.
A pool of strings, initially empty, is maintained privately by the class String.
When the intern method is invoked, if the pool already contains a string equal to this String object as determined by the [equals(Object)](https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#equals-java.lang.Object-) method, then the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned.
It follows that for any two strings s and t, s.intern() == t.intern() is true if and only if s.equals(t) is true.
All literal strings and string-valued constant expressions are interned. String literals are defined in section 3.10.5 of the The Java™ Language Specification.
- **Returns:** a string that has the same contents as this string, but is guaranteed to be from a pool of unique strings.
- intern() is a native method; its implementation is in the underlying C code.

```java
public native String intern();
```

- A String object that is not declared in double quotes can be interned with the intern() method provided by String. The method checks whether the string constant pool already contains an equal string, and if not, puts the current string into the pool.

```java
String myInfo = new String("I love alibaba").intern();
```

- In other words, if String.intern() is called on any string, the instance that the result refers to must be exactly the same instance as the equal string written directly as a constant. The following expression is therefore always true

```java
("a"+"b"+"c").intern() == "abc"
```

- Put simply, an interned string guarantees that only one copy of the string exists in memory, which saves memory and speeds up string operations. Note that this value is stored in the string intern pool

```java
/**
 * How can the variable s be made to refer to the data in the string constant pool?
 * There are two ways:
 * Method 1: String s = "shkstart"; // definition by literal
 * Method 2: call intern()
 *           String s = new String("shkstart").intern();
 *           String s = new StringBuilder("shkstart").toString().intern();
 */
```
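The guarantee quoted from the API documentation, s.intern() == t.intern() exactly when s.equals(t), can be checked with a few lines. The class and helper method names below are hypothetical and serve only this illustration.

```java
public class InternSketch {
    /** True when both strings resolve to the same pooled instance. */
    static boolean sameCanonical(String a, String b) {
        return a.intern() == b.intern();
    }

    public static void main(String[] args) {
        String s = new String("shkstart");                              // heap object
        String t = new StringBuilder("shk").append("start").toString(); // another heap object

        System.out.println(s == t);                   // false: two distinct objects
        System.out.println(sameCanonical(s, t));      // true: same content, same pooled instance
        System.out.println(s.intern() == "shkstart"); // true: the literal is the pooled instance
    }
}
```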
5.1 interview questions
How many objects does new String("ab") create?

```java
/**
 * How many objects does new String("ab") create?
 * The bytecode shows that there are two objects
 */
public class StringNewTest {
    public static void main(String[] args) {
        String str = new String("ab");
    }
}
```

- Look at the bytecode
- There are two objects
  - One object is created in heap space by the new keyword
  - The other object is **"ab" in the string constant pool**
How many objects does new String("a") + new String("b") create?

```java
/**
 * How many objects does new String("a") + new String("b") create?
 */
public class StringNewTest {
    public static void main(String[] args) {
        String str = new String("a") + new String("b");
    }
}
```

- Look at the bytecode

```
 0 new #2 <java/lang/StringBuilder>    // new StringBuilder()
 3 dup
 4 invokespecial #3 <java/lang/StringBuilder.<init> : ()V>
 7 new #4 <java/lang/String>    // new String()
10 dup
11 ldc #5 <a>    // "a" in the constant pool
13 invokespecial #6 <java/lang/String.<init> : (Ljava/lang/String;)V>    // new String("a")
16 invokevirtual #7 <java/lang/StringBuilder.append : (Ljava/lang/String;)Ljava/lang/StringBuilder;>    // append()
19 new #4 <java/lang/String>    // new String()
22 dup
23 ldc #8 <b>    // "b" in the constant pool
25 invokespecial #6 <java/lang/String.<init> : (Ljava/lang/String;)V>    // new String("b")
28 invokevirtual #7 <java/lang/StringBuilder.append : (Ljava/lang/String;)Ljava/lang/StringBuilder;>    // append()
31 invokevirtual #9 <java/lang/StringBuilder.toString : ()Ljava/lang/String;>    // toString() creates a new String object
34 astore_1
35 return
```

- Six objects are created
  - Object 1: new StringBuilder()
  - Object 2: new String("a")
  - Object 3: "a" in the constant pool
  - Object 4: new String("b")
  - Object 5: "b" in the constant pool
  - Object 6: toString() creates a new String("ab")
    - The call to toString() does not put "ab" into the string constant pool
- Source code of toString() in StringBuilder

```java
@Override
public String toString() {
    // Create a copy, don't share the array
    return new String(value, 0, count);
}
```

- Look at the bytecode
- toString() only creates a new String object; it does not put anything into the string constant pool
5.2 use of Intern: JDK6 vs JDK7/8
```java
public class StringIntern {
    public static void main(String[] args) {
        /*
         * ① String s = new String("1")
         *    creates two objects:
         *      a new object in heap space
         *      the string constant "1" in the string constant pool
         *      (so "1" is already in the pool at this point)
         * ② s.intern() changes nothing, because "1" already exists in the pool
         *
         * s  refers to the object in heap space
         * s2 refers to "1" in the constant pool
         * so they are not equal
         */
        String s = new String("1");
        s.intern(); // "1" already exists in the string constant pool before this call
        String s2 = "1";
        System.out.println(s == s2); // jdk6: false   jdk7/8: false

        /*
         * ① String s3 = new String("1") + new String("1")
         *    is equivalent to new String("11"), but no "11" is created in the constant pool
         *
         * ② s3.intern()
         *    since the pool has no "11" at this point, the address held by s3 is recorded
         *    in the pool (JDK 7/8), so s3 and s4 point to the same address
         */
        String s3 = new String("1") + new String("1"); // s3 holds the address of new String("11")
        // After the previous line, does "11" exist in the string constant pool? No!
        s3.intern(); // puts "11" into the string constant pool:
                     //   jdk6: a new object "11" is created in the pool, with a new address
                     //   jdk7: no new object is created; the pool records a reference to the new String("11") in the heap
        String s4 = "11"; // s4 holds the address of the "11" that the previous line put into the pool
        System.out.println(s3 == s4); // jdk6: false   jdk7/8: true
    }
}
```
In JDK 6
In JDK 7
Extension: JDK 8 environment

```java
public class StringIntern1 {
    public static void main(String[] args) {
        // Extension of the exercise in StringIntern.java:
        String s3 = new String("1") + new String("1"); // new String("11")
        // After the previous line, does "11" exist in the string constant pool? No!
        String s4 = "11"; // creates the object "11" in the string constant pool
        String s5 = s3.intern();
        System.out.println(s3 == s4); // false
        System.out.println(s5 == s4); // true
    }
}
```
Summary of String.intern():
- In JDK 1.6, intern() tries to put the string object into the string constant pool.
  - If the pool already contains an equal string, nothing is added, and the address of the existing object in the pool is returned.
  - If not, the object is copied, the copy is put into the pool, and the address of the copy in the pool is returned.
- From JDK 1.7 onward, intern() tries to put the string object into the string constant pool.
  - If the pool already contains an equal string, nothing is added, and the address of the existing object in the pool is returned.
  - If not, a reference to the object is put into the pool, and that reference is returned.
5.2.1 exercise (further understanding of different versions of JDK intern)
Exercise 1
```java
public class StringExer1 {
    public static void main(String[] args) {
        String s = new String("a") + new String("b"); // new String("ab")
        // After the previous line, there is no "ab" in the string constant pool

        String s2 = s.intern(); // jdk6: creates the string "ab" in the pool and returns its pooled address to s2
                                // jdk8: does not create "ab" in the pool; stores a reference to the new String("ab") and returns it to s2

        System.out.println(s2 == "ab"); // jdk6: true   jdk8: true
        System.out.println(s == "ab");  // jdk6: false  jdk8: true
    }
}
```
Exercise 2
Exercise 3: jdk8 environment
```java
public class StringExer2 {
    public static void main(String[] args) {
        String s1 = new String("a") + new String("b"); // after this line, "ab" is not in the string constant pool
        s1.intern(); // the pool now stores a reference to the object in heap space
        String s2 = "ab"; // refers to the address recorded in the string constant pool
        System.out.println(s1 == s2); // true
    }
}
```

Variant in which "ab" is already in the pool before intern() is called:

```java
public class StringExer2 {
    public static void main(String[] args) {
        String s1 = new String("ab"); // after this line, "ab" is in the string constant pool
        s1.intern(); // the pool already holds the "ab" created by the previous line, so nothing changes
        String s2 = "ab"; // refers to the object in the string constant pool
        System.out.println(s1 == s2); // false
    }
}
```
5.3 intern efficiency test: space angle
```java
/**
 * Testing the effect of intern() on efficiency: space usage
 */
public class StringIntern2 {
    static final int MAX_COUNT = 1000 * 10000;
    static final String[] arr = new String[MAX_COUNT];

    public static void main(String[] args) {
        Integer[] data = new Integer[]{1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

        long start = System.currentTimeMillis();
        for (int i = 0; i < MAX_COUNT; i++) {
//            arr[i] = new String(String.valueOf(data[i % data.length]));
            arr[i] = new String(String.valueOf(data[i % data.length])).intern();
        }
        long end = System.currentTimeMillis();
        System.out.println("The time spent is:" + (end - start));

        try {
            Thread.sleep(1000000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        System.gc();
    }
}
```
- Results

```
Without intern: 7215 ms
With intern:    1542 ms
```

- Without intern, more than 10 million String instances are created
- With intern, only a little over 2 million String instances are created

Conclusion
- When a program uses a large number of strings, especially many repeated ones, calling intern() saves memory.
- Large website platforms keep huge numbers of strings in memory. On a social networking site, for example, many users store the same values such as Beijing or Haidian District. If all of these strings are interned, memory usage drops significantly.
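The saving can be made visible by counting distinct String objects by identity rather than by content. This is a small-scale sketch with hypothetical class and method names; it counts references held in an array and does not measure actual heap usage.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class InternDedupSketch {
    /** Counts distinct String objects in the array by identity (==), not by equals(). */
    static int distinctObjects(String[] arr) {
        Set<String> ids = Collections.newSetFromMap(new IdentityHashMap<>());
        Collections.addAll(ids, arr);
        return ids.size();
    }

    /** Builds an array of repeated city names, optionally interning each one. */
    static String[] build(int count, boolean intern) {
        String[] cities = {"Beijing", "Haidian District"};
        String[] arr = new String[count];
        for (int i = 0; i < count; i++) {
            String s = new String(cities[i % cities.length]); // a fresh object every time
            arr[i] = intern ? s.intern() : s;
        }
        return arr;
    }

    public static void main(String[] args) {
        System.out.println(distinctObjects(build(1000, false))); // 1000
        System.out.println(distinctObjects(build(1000, true)));  // 2
    }
}
```

With intern(), the 998 temporary objects become unreachable immediately and can be reclaimed, while the array keeps only the two pooled instances alive.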
6. Garbage collection of stringtable
```java
/**
 * Garbage collection of the StringTable:
 * -Xms15m -Xmx15m -XX:+PrintStringTableStatistics -XX:+PrintGCDetails
 */
public class StringGCTest {
    public static void main(String[] args) {
        for (int j = 0; j < 100000; j++) {
            String.valueOf(j).intern();
        }
    }
}
```
7. String de duplication in G1
Official website address: JEP 192: String Deduplication in G1 (java.net)
Motivation
Many large-scale Java applications are currently bottlenecked on memory. Measurements have shown that roughly 25% of the Java heap live data set in these types of applications is consumed by String objects. Further, roughly half of those String objects are duplicates, where duplicates means string1.equals(string2) is true. Having duplicate String objects on the heap is, essentially, just a waste of memory. This project will implement automatic and continuous String deduplication in the G1 garbage collector to avoid wasting memory and reduce the memory footprint.
Note that "duplicates" here refers to data in the heap, not in the constant pool, because the constant pool never contains duplicates.
Background: measurements on many Java applications, large and small, produced the following results:
- String objects account for 25% of the heap's live data set
- Duplicate String objects account for 13.5% of the heap's live data set
- The average length of a String object is 45 characters
Implementation
- While the garbage collector works, it visits the live objects on the heap. For each visited object it checks whether the object is a String that is a candidate for deduplication.
- If so, a reference to the object is inserted into a queue for later processing. A deduplication thread runs in the background and processes the queue. Processing a queue entry means removing it from the queue and then attempting to deduplicate the String object it references.
- A hashtable records all the unique char arrays used by String objects. During deduplication, the hashtable is checked to see whether an identical char array already exists on the heap.
- If one exists, the String object is adjusted to refer to that array. The reference to the original array is released, and the original array is eventually reclaimed by the garbage collector.
- If the lookup fails, the char array is inserted into the hashtable so that it can be shared later.
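The lookup-or-insert step can be sketched as a toy model. This is not how G1 is implemented: the class DedupSketch, the stand-in class Str, and the HashMap keyed by content are all hypothetical, and the real collector works on internal object representations with its own queue and table.

```java
import java.util.HashMap;
import java.util.Map;

public class DedupSketch {
    /** Stand-in for java.lang.String: a wrapper around a replaceable char array. */
    static final class Str {
        char[] value;

        Str(String s) {
            this.value = s.toCharArray();
        }
    }

    // Maps array contents to the single canonical array kept for those contents.
    private final Map<String, char[]> table = new HashMap<>();

    /** Processes one candidate: share an existing identical array, or register this one. */
    void deduplicate(Str candidate) {
        String key = new String(candidate.value);
        char[] existing = table.get(key);
        if (existing != null) {
            candidate.value = existing; // the old array is now unreferenced and can be collected
        } else {
            table.put(key, candidate.value); // first occurrence: later duplicates will share it
        }
    }

    public static void main(String[] args) {
        DedupSketch dedup = new DedupSketch();
        Str a = new Str("hello");
        Str b = new Str("hello");
        System.out.println(a.value == b.value); // false: two separate arrays

        dedup.deduplicate(a);
        dedup.deduplicate(b);
        System.out.println(a.value == b.value); // true: both now share one array
    }
}
```

Note that the two Str objects themselves remain distinct. Only the backing arrays are shared, which is why deduplication is invisible to application code.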
Command line options
```
# Enable string deduplication. It is disabled by default and must be enabled manually.
UseStringDeduplication (bool)
# Print detailed deduplication statistics
PrintStringDeduplicationStatistics (bool)
# String objects that reach this age are considered candidates for deduplication
StringDeduplicationAgeThreshold (uintx)
```