Chapter 13 - StringTable

Posted by bubblegum.anarchy on Mon, 24 Jan 2022 15:28:29 +0100

1. Basic characteristics of string

  • String: a string, written between a pair of double quotes ""

    String s1 = "baidu"; //Defined as a literal
    String s2 = new String("hello");
    
  • String is declared as final and cannot be inherited

  • String implements the Serializable interface: it means that the string supports serialization

  • String implements the Comparable interface, which means strings can be compared and ordered

  • In JDK 8 and earlier, String stores its data in a final char[] value. In JDK 9 this was changed to byte[]

1.1 String storage structure change in jdk9

Official website address: JEP 254: Compact Strings (java.net)

Motivation

The current implementation of the String class stores characters in a char array, using two bytes (sixteen bits) for each character. Data gathered from many different applications indicates that strings are a major component of heap usage and, moreover, that most String objects contain only Latin-1 characters. Such characters require only one byte of storage, hence half of the space in the internal char arrays of such String objects is going unused.

Description

We propose to change the internal representation of the String class from a UTF-16 char array to a byte array plus an encoding-flag field. The new String class will store characters encoded either as ISO-8859-1/Latin-1 (one byte per character), or as UTF-16 (two bytes per character), based upon the contents of the string. The encoding flag will indicate which encoding is used.

String-related classes such as AbstractStringBuilder, StringBuilder, and StringBuffer will be updated to use the same representation, as will the HotSpot VM's intrinsic string operations.

This is purely an implementation change, with no changes to existing public interfaces. There are no plans to add any new public APIs or other interfaces.

The prototyping work done to date confirms the expected reduction in memory footprint, substantial reductions of GC activity, and minor performance regressions in some corner cases.

Conclusion: String data is no longer stored in a char[] but in a byte[] plus an encoding flag, which saves space

public final class String implements java.io.Serializable, Comparable<String>, CharSequence {
    @Stable
    private final byte[] value;
}
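
To make the encoding-flag idea concrete, here is a minimal sketch of the decision that has to be made for each string. The names CompactStringSketch, fitsLatin1 and payloadBytes are hypothetical, and this is not the JDK's actual code; it only illustrates why a string whose characters all fit in one byte needs half the storage.

```java
// Hypothetical illustration only; the real logic lives inside java.lang.String.
class CompactStringSketch {

    // True if every char fits in a single byte (the Latin-1 range 0x00-0xFF).
    static boolean fitsLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    // Bytes needed for the character data alone under the byte[] + flag scheme
    // (object headers and array headers are ignored).
    static int payloadBytes(String s) {
        return fitsLatin1(s) ? s.length() : 2 * s.length();
    }

    public static void main(String[] args) {
        System.out.println(payloadBytes("hello"));        // 5: one byte per character
        System.out.println(payloadBytes("\u4f60\u597d")); // 4: two characters, two bytes each
    }
}
```

With a char[] representation both strings would cost two bytes per character, so the saving applies only to the Latin-1 case.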

1.2 Basic characteristics of String

  • String represents an immutable character sequence. In short: immutability
    • When a string variable is reassigned, a new memory area has to be allocated for the new value; the original value cannot be modified in place
    • When something is concatenated onto an existing string, a new memory area has to be allocated as well; the original value cannot be modified in place
    • When String's replace() method is called to change a character or substring, a new memory area has to be allocated as well; the original value cannot be modified in place
  • When a string is assigned through a literal (as opposed to new), the string value is placed in the string constant pool
  • The string constant pool never stores two strings with the same content
    • The String Pool is a fixed-size Hashtable with a default length of 1009 (the JDK 6 value). If too many strings are put into the String Pool, hash collisions become severe and the linked lists in the buckets grow long. The direct effect of long linked lists is that String.intern() becomes much slower
    • Use -XX:StringTableSize to set the length of the StringTable
    • In JDK 6, the StringTable has a fixed default length of 1009, so performance degrades quickly once the constant pool holds too many strings. There is no restriction on the value of StringTableSize
    • In JDK 7, the default StringTable length is 60013, and there is no restriction on the value of StringTableSize
    • Starting from JDK 8, if the StringTable length is set explicitly, 1009 is the minimum allowed value

Code example: demonstrating the immutability of String

import org.junit.Test;

/**
 * Basic use of String: demonstrates the immutability of String
 */
public class StringTest1 {

    @Test
    public void test1() {
        String s1 = "abc"; //Defined as a literal, so "abc" is stored in the string constant pool
        String s2 = "abc";
        s1 = "hello";

        System.out.println(s1 == s2); //Compares addresses: true before the reassignment, false after it

        System.out.println(s1); //hello
        System.out.println(s2); //abc
    }

    @Test
    public void test2() {
        String s1 = "abc";
        String s2 = "abc";
        s2 += "def";
        System.out.println(s2); //abcdef
        System.out.println(s1); //abc
    }

    @Test
    public void test3() {
        String s1 = "abc";
        String s2 = s1.replace('a', 'm');
        System.out.println(s1); //abc
        System.out.println(s2); //mbc
    }

}

Starting from JDK8, if the length of StringTable is set, 1009 is the minimum value that can be set

public static void main(String[] args) {
        //Test the StringTableSize parameter: keep the JVM alive so that jps and jinfo can inspect it
        System.out.println("I'm just passing through");
        try {
            Thread.sleep(1000000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
}
  • Run it first without setting any JVM parameters
  • Use the command line to find the process ID and view the length of the StringTable
jps
jinfo -flag StringTableSize <pid>

  • You can see that the default length is 60013
  • Then set the JVM parameter
-XX:StringTableSize=1000
  • The JVM reports an error:

StringTable size of 1000 is invalid; Must be between 1009 and 2305843009213693951
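
The effective value can also be read from inside the running JVM instead of through jinfo. The following is a minimal sketch, assuming a HotSpot JVM where the com.sun.management.HotSpotDiagnosticMXBean interface is available; the class name StringTableSizeProbe is made up for this example.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: reads the StringTableSize flag of the current HotSpot JVM.
class StringTableSizeProbe {

    static long stringTableSize() {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // getVMOption returns the flag's current value as a String
        return Long.parseLong(bean.getVMOption("StringTableSize").getValue());
    }

    public static void main(String[] args) {
        // The printed number depends on the JDK version and on any -XX:StringTableSize setting
        System.out.println("StringTableSize = " + stringTableSize());
    }
}
```

Running it with and without -XX:StringTableSize=10000 should show the configured value and the default respectively.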

A written-test question on String: checking your understanding of String immutability

public class StringExer {

    String str = new String("good");
    char[] ch = {'t', 'e', 's', 't'};

    public void change(String str, char ch[]) {
        str = "test ok";
        ch[0] = 'b';
    }

    public static void main(String[] args) {
        StringExer ex = new StringExer();
        ex.change(ex.str, ex.ch);
        System.out.println(ex.str); //good
        System.out.println(ex.ch); //best
    }

}

Test the impact of StringTable size on performance

  • Generate 100000 strings first
import java.io.FileWriter;
import java.io.IOException;

/**
 * Generate 100000 strings with length no more than 10, consisting of A-Z and a-z
 */
public class GenerateString {

    public static void main(String[] args) throws IOException {
        FileWriter fw =  new FileWriter("words.txt");

        for (int i = 0; i < 100000; i++) {
            //1 - 10
           int length = (int)(Math.random() * (10 - 1 + 1) + 1);
            fw.write(getString(length) + "\n");
        }

        fw.close();
    }

    public static String getString(int length){
        String str = "";
        for (int i = 0; i < length; i++) {
            //65 - 90, 97-122
            int num = (int)(Math.random() * (90 - 65 + 1) + 65) + (int)(Math.random() * 2) * 32;
            str += (char)num;
        }
        return str;
    }

}
  • Then put the 100000 strings into the string constant pool to test the impact of different StringTable sizes on performance
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

/**
 *  -XX:StringTableSize=1009
 */
public class StringTest2 {

    public static void main(String[] args) {
        BufferedReader br = null;
        try {
            br = new BufferedReader(new FileReader("words.txt"));
            long start = System.currentTimeMillis();
            String data;
            while((data = br.readLine()) != null){
                data.intern(); //If the string constant pool does not yet contain a string equal to data, it is added to the pool
            }

            long end = System.currentTimeMillis();

            System.out.println("The time spent is:" + (end - start)); //1009:128ms  10000:51ms
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if(br != null){
                try {
                    br.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }

            }
        }
    }

}
  • Set the StringTable size to the minimum value of 1009
-XX:StringTableSize=1009
  • Result: 128ms
The time spent is:128
  • Then set the StringTable size to 10000
-XX:StringTableSize=10000
  • Result: 51ms
The time spent is:51

2. Memory allocation of String

  • Java has eight primitive data types plus one special type, String. To make these types faster at run time and more economical with memory, each of them comes with the concept of a constant pool.
  • A constant pool is similar to a cache provided at the Java system level. The constant pools of the eight primitive types are managed by the system, while the constant pool of String is special. There are two main ways to use it.
    • String objects declared directly in double quotes are stored directly in the constant pool.
      • For example: String info = "baidu.com";
    • For String objects that are not declared in double quotes, you can use the intern() method provided by String. This is covered in detail later
  • In Java 6 and earlier, the string constant pool was stored in the permanent generation
  • In Java 7, Oracle engineers made a major change to the string pool logic: the string constant pool was moved to the Java heap
    • All strings are now kept in the heap like other ordinary objects, so you only need to adjust the heap size when tuning an application.
    • The string constant pool was already a widely used concept, but this change gives us good reason to reconsider using String.intern() in Java 7
  • In Java 8, the permanent generation is replaced by metaspace, and string constants remain in the heap

Why was the StringTable moved?

  • The permanent generation (PermSize) is small by default
  • The permanent generation is garbage-collected infrequently

Official website address: Java SE 7 Features and Enhancements (oracle.com)

Synopsis: In JDK 7, interned strings are no longer allocated in the permanent generation of the Java heap, but are instead allocated in the main part of the Java heap (known as the young and old generations), along with the other objects created by the application. This change will result in more data residing in the main Java heap, and less data in the permanent generation, and thus may require heap sizes to be adjusted. Most applications will see only relatively small differences in heap usage due to this change, but larger applications that load many classes or make heavy use of the String.intern() method will see more significant differences.

Code example

import java.util.HashSet;
import java.util.Set;

/**
 * In JDK 6:
 * -XX:PermSize=6m -XX:MaxPermSize=6m -Xms6m -Xmx6m
 *
 * In JDK 8:
 * -XX:MetaspaceSize=6m -XX:MaxMetaspaceSize=6m -Xms6m -Xmx6m
 */
public class StringTest3 {

    public static void main(String[] args) {
        //Hold references in a Set so that a full GC cannot reclaim the pooled strings
        Set<String> set = new HashSet<String>();
        //The range of short is enough to exhaust a 6MB PermSize or heap and trigger an OOM
        short i = 0;
        while(true){
            set.add(String.valueOf(i++).intern());
        }
    }

}
  • Set the JVM parameters
-XX:MetaspaceSize=6m -XX:MaxMetaspaceSize=6m -Xms6m -Xmx6m
  • The OOM occurs in heap space, which shows that in JDK 8 the string constant pool does live in the heap

3. Basic operations on String

The Java Language Specification requires that identical string literals, that is, literals containing the same sequence of Unicode characters (the same sequence of code points), must refer to the same instance of class String.

public class StringTest4 {

    public static void main(String[] args) {
        System.out.println();//1230
        System.out.println("1");//1231
        System.out.println("2");
        System.out.println("3");
        System.out.println("4");
        System.out.println("5");
        System.out.println("6");
        System.out.println("7");
        System.out.println("8");
        System.out.println("9");
        System.out.println("10");//1240
        //The following strings "1" to "10" will not be loaded again
        System.out.println("1");//1241
        System.out.println("2");//1241
        System.out.println("3");
        System.out.println("4");
        System.out.println("5");
        System.out.println("6");
        System.out.println("7");
        System.out.println("8");
        System.out.println("9");
        System.out.println("10");//1241
    }

}
  • Set breakpoints and step through the code in a debugger
  • Initially there are 1230 strings

  • After the string "1" is executed, the number of strings becomes 1231

  • After the string "10" is executed, the number of strings becomes 1240

  • The same strings that follow have already been loaded into the string constant pool once, so they are not loaded again
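
The rule from the Java Language Specification quoted at the start of this section can also be checked without a debugger. A minimal sketch follows; the class and method names are made up for illustration.

```java
// Hypothetical demo class: identical literals refer to one pooled instance.
class LiteralIdentityDemo {

    static boolean literalsShareInstance() {
        String a = "stringtable";
        String b = "stringtable";
        return a == b; // both variables refer to the same pooled object
    }

    static boolean heapCopySharesInstance() {
        String a = "stringtable";
        String c = new String("stringtable"); // explicitly creates a separate heap object
        return a == c;
    }

    public static void main(String[] args) {
        System.out.println(literalsShareInstance());  // true
        System.out.println(heapCopySharesInstance()); // false
    }
}
```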

Code example 2

class Memory {

    public static void main(String[] args) {//line 1
        int i = 1;//line 2
        Object obj = new Object();//line 3
        Memory mem = new Memory();//line 4
        mem.foo(obj);//line 5
    }//line 9

    private void foo(Object param) {//line 6
        String str = param.toString();//line 7
        System.out.println(str);
    }//line 8

}

  • The local variable tables in the original figure are missing several entries. The complete tables are: main() holds args, i, obj and mem; foo() holds this, param and str

4. String splicing (concatenation)

  • Concatenating constants with constants gives a result in the constant pool. This works through compile-time optimization
  • The constant pool never contains two constants with the same content
  • As soon as one of the operands is a variable, the result is in the heap. Variable concatenation is implemented with StringBuilder
  • If intern() is called on the concatenation result, a string that is not yet in the constant pool is put into the pool and the object address is returned

Code example 1

    @Test
    public void test1(){
        String s1 = "a" + "b" + "c"; //Compile-time optimization: equivalent to "abc"
        String s2 = "abc"; //"abc" is placed in the string constant pool and its address is assigned to s2
        /*
         * After the .java file is compiled to .class, the code that runs is equivalent to
         * String s1 = "abc";
         * String s2 = "abc";
         */
        System.out.println(s1 == s2); //true
        System.out.println(s1.equals(s2)); //true
    }

Code example 2

    @Test
    public void test2(){
        String s1 = "javaEE";
        String s2 = "hadoop";

        String s3 = "javaEEhadoop";
        String s4 = "javaEE" + "hadoop";//Compile-time optimization
        //If a variable appears on either side of the + operator, the result is like a new String() in heap space whose content is the concatenation result: javaEEhadoop
        String s5 = s1 + "hadoop";
        String s6 = "javaEE" + s2;
        String s7 = s1 + s2;

        System.out.println(s3 == s4);//true
        System.out.println(s3 == s5);//false
        System.out.println(s3 == s6);//false
        System.out.println(s3 == s7);//false
        System.out.println(s5 == s6);//false
        System.out.println(s5 == s7);//false
        System.out.println(s6 == s7);//false
        //intern(): checks whether the string constant pool already contains the value javaEEhadoop. If so, it returns the address of javaEEhadoop in the constant pool;
        //if not, it puts javaEEhadoop into the constant pool and returns the address of that object.
        String s8 = s6.intern();
        System.out.println(s3 == s8);//true
    }

Code example 3

    @Test
    public void test3(){
        String s1 = "a";
        String s2 = "b";
        String s3 = "ab";
        /*
        The execution details of s1 + s2 are as follows (the variable name s is only for illustration):
        ① StringBuilder s = new StringBuilder();
        ② s.append("a")
        ③ s.append("b")
        ④ s.toString()  --> approximately equal to new String("ab")

        Note: since JDK 5.0, StringBuilder is used; before JDK 5.0, StringBuffer was used
         */
        String s4 = s1 + s2;
        System.out.println(s3 == s4);//false
    }
  • Decompile the bytecode file

  • You can see that String s4 = s1 + s2; is equivalent to creating a new StringBuilder, appending s1 and s2 with append(), and finally calling toString(), which is approximately equal to new String("ab"). The resulting String object is stored in the heap, which must be distinguished from the string constant pool. s3 refers to the string in the constant pool and s4 refers to an object in the heap, so s3 is not equal to s4

Knowledge supplement: since JDK 5, StringBuilder is used; before JDK 5, StringBuffer was used

Comparison of String, StringBuffer and StringBuilder

  • String
    • The value of a String is immutable, so every operation on a String produces a new String object. This is inefficient and wastes a lot of limited memory space
    • Immutable
  • StringBuffer
    • StringBuffer is a mutable, thread-safe string manipulation class. Operations on the string it holds do not produce a new object. Each StringBuffer object has a certain buffer capacity. As long as the string size does not exceed the capacity, no new capacity is allocated; when it does, the capacity is increased automatically
    • Mutable
    • Thread-safe
    • Suited to multithreaded string operations
  • StringBuilder
    • Mutable class, faster
    • Mutable
    • Not thread-safe
    • Suited to single-threaded string operations

Code example 4

    /*
    1. String concatenation does not necessarily use StringBuilder!
       If both sides of the + operator are string constants or constant references, compile-time optimization is used instead of StringBuilder.
    2. final can modify classes, methods, primitive-type variables and reference-type variables; use it wherever it can be used.
     */
    @Test
    public void test4(){
        final String s1 = "a";
        final String s2 = "b";
        String s3 = "ab";
        String s4 = s1 + s2; //s4: constant
        System.out.println(s3 == s4);//true
    }
  • Note that when the operands on both sides are variables, a new StringBuilder is needed to concatenate them. If they are declared final, however, they are constants and the result comes from the constant pool. So if both sides of the + operator are string constants or constant references, compile-time optimization is still used. In other words, a variable declared final becomes a constant, while a final class cannot be inherited and a final method cannot be overridden.
    • In development, use final wherever it can be used
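
One detail is worth spelling out: final on its own is not enough. The compile-time optimization applies only when the final variable is also initialized with a constant expression. The sketch below contrasts the three cases; the class and method names are made up for illustration.

```java
// Hypothetical demo class: when does a + b end up as the pooled "ab"?
class FinalConcatDemo {

    static boolean constantOperands() {
        final String a = "a";
        final String b = "b";
        return (a + b) == "ab"; // folded to "ab" at compile time
    }

    static boolean variableOperands() {
        String a = "a";
        String b = "b";
        return (a + b) == "ab"; // concatenated at run time, new object in the heap
    }

    static boolean finalButNotConstant() {
        final String a = String.valueOf('a'); // final, but initialized by a method call
        final String b = "b";
        return (a + b) == "ab"; // not a constant expression, so concatenated at run time
    }

    public static void main(String[] args) {
        System.out.println(constantOperands());    // true
        System.out.println(variableOperands());    // false
        System.out.println(finalButNotConstant()); // false
    }
}
```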

Code example 5

    /*
    Comparing execution efficiency: appending with StringBuilder's append() is far more efficient than String concatenation!
    Details: ① StringBuilder's append(): only one StringBuilder object is created from beginning to end
          String concatenation: a new StringBuilder and a new String object are created on every iteration
         ② String concatenation: because more StringBuilder and String objects are created, memory consumption is higher; if GC runs, it costs additional time.

     Room for improvement: in real development, if you know that the total length of the appended strings will not exceed some limit highLevel, use the constructor that takes a capacity:
               StringBuilder s = new StringBuilder(highLevel);//new char[highLevel]
     */
    @Test
    public void test6(){

        long start = System.currentTimeMillis();

//        method1(100000);//5046
        method2(100000);//6

        long end = System.currentTimeMillis();

        System.out.println("The time spent is:" + (end - start));
    }

    public void method1(int highLevel){
        String src = "";
        for(int i = 0;i < highLevel;i++){
            src = src + "a";//A StringBuilder and a String are created on every iteration
        }
//        System.out.println(src);
    }

    public void method2(int highLevel){
        //Only one StringBuilder is created
        StringBuilder src = new StringBuilder();
        for (int i = 0; i < highLevel; i++) {
            src.append("a");
        }
//        System.out.println(src);
    }
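
The "room for improvement" noted in the comment of this example can be sketched as follows. The class and method names are hypothetical; the only point is that the capacity passed to the constructor lets the builder allocate its internal array once instead of growing it repeatedly.

```java
// Hypothetical demo class: pre-sizing a StringBuilder when the final length is known.
class PresizedBuilderDemo {

    static String repeat(char c, int count) {
        // The capacity hint avoids repeated array growth while appending
        StringBuilder sb = new StringBuilder(count);
        for (int i = 0; i < count; i++) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(repeat('a', 5)); // aaaaa
    }
}
```

The capacity is only a hint: appending more than count characters still works, it just triggers the usual automatic growth.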

5. Use of intern()

Description in the official API documentation

public String intern()

Returns a canonical representation for the string object.

A pool of strings, initially empty, is maintained privately by the class String.

When the intern method is invoked, if the pool already contains a string equal to this String object as determined by the [equals(Object)](https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#equals-java.lang.Object-) method, then the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned.

It follows that for any two strings s and t, s.intern() == t.intern() is true if and only if s.equals(t) is true.

All literal strings and string-valued constant expressions are interned. String literals are defined in section 3.10.5 of The Java™ Language Specification.

  • Returns: a string that has the same contents as this string, but is guaranteed to be from a pool of unique strings.

  • intern() is a native method; it calls into the underlying C/C++ implementation.
public native String intern();
  • For a String object that is not declared in double quotes, you can use the intern method provided by String. It checks whether the string constant pool already contains the current string and, if not, puts the current string into the pool.
String myInfo = new String("I love alibaba").intern();
  • In other words, if String.intern() is called on any string, the instance the result points to must be exactly the same instance as the equal string written directly as a constant. Therefore, the following expression must evaluate to true
("a"+"b"+"c").intern() == "abc"
  • Put simply, an interned string guarantees that only one copy of the string exists in memory, which saves memory and speeds up string operations. Note that this value is stored in the string intern pool
/**
 * How to ensure that the variable s points to the data in the string constant pool?
 * There are two ways:
 * Method 1: String s = "shkstart"; //Defined as a literal
 * Method 2: call intern()
 *         String s = new String("shkstart").intern();
 *         String s = new StringBuilder("shkstart").toString().intern();
 */
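
The two ways listed in the comment above can be verified with reference comparisons. This is a minimal sketch; the class and method names are made up for illustration.

```java
// Hypothetical demo class: both the literal and intern() yield the pooled instance.
class InternDemo {

    static boolean internMatchesLiteral() {
        String literal = "shkstart";
        String viaNew = new String("shkstart").intern();
        String viaBuilder = new StringBuilder("shkstart").toString().intern();
        return literal == viaNew && literal == viaBuilder;
    }

    static boolean heapObjectMatchesLiteral() {
        // Without intern(), new String(...) refers to a separate heap object
        return new String("shkstart") == "shkstart";
    }

    public static void main(String[] args) {
        System.out.println(internMatchesLiteral());     // true
        System.out.println(heapObjectMatchesLiteral()); // false
    }
}
```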

5.1 Interview questions

How many objects does new String("ab") create?

/**
 * How many objects will new String("ab") create?
 * The bytecode shows that there are two objects
 */
public class StringNewTest {
    public static void main(String[] args) {
        String str = new String("ab");
    }
}
  • Let's look at the bytecode

  • There are two objects
    • One object is created in heap space by the new keyword
    • The other object is "ab" in the string constant pool

How many objects does new String("a") + new String("b") create?

/**
 * How many objects will new String("a") + new String("b") create?
 */
public class StringNewTest {
    public static void main(String[] args) {
        String str = new String("a") + new String("b");
    }
}
  • Let's look at the bytecode
 0 new #2 <java/lang/StringBuilder> //new StringBuilder()
 3 dup
 4 invokespecial #3 <java/lang/StringBuilder.<init> : ()V>
 7 new #4 <java/lang/String> //new String()
10 dup
11 ldc #5 <a> //"a" in the constant pool
13 invokespecial #6 <java/lang/String.<init> : (Ljava/lang/String;)V> //new String("a")
16 invokevirtual #7 <java/lang/StringBuilder.append : (Ljava/lang/String;)Ljava/lang/StringBuilder;> //append()
19 new #4 <java/lang/String> //new String()
22 dup
23 ldc #8 <b> //"b" in the constant pool
25 invokespecial #6 <java/lang/String.<init> : (Ljava/lang/String;)V> //new String("b")
28 invokevirtual #7 <java/lang/StringBuilder.append : (Ljava/lang/String;)Ljava/lang/StringBuilder;> //append()
31 invokevirtual #9 <java/lang/StringBuilder.toString : ()Ljava/lang/String;> //toString() creates a new String object
34 astore_1
35 return
  • Six objects are created
    • Object 1: new StringBuilder()
    • Object 2: new String("a")
    • Object 3: "a" in the constant pool
    • Object 4: new String("b")
    • Object 5: "b" in the constant pool
    • Object 6: toString() creates a new String("ab")
      • The call to toString() does not generate "ab" in the string constant pool

The call to toString() does not generate "ab" in the string constant pool

  • The toString() source code in StringBuilder
    @Override
    public String toString() {
        // Create a copy, don't share the array
        return new String(value, 0, count);
    }
  • Let's look at its bytecode

  • You can see that toString() only creates a new String object; nothing is placed in the string constant pool
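
The same conclusion can be observed at run time. The sketch below assumes a HotSpot JVM of version 7 or later, where intern() stores a reference to the existing heap object if no equal string is pooled yet. The class and method names are made up, and a random UUID is used only to guarantee that the content cannot already be in the pool.

```java
import java.util.UUID;

// Hypothetical demo class: a string produced by toString() is not pooled automatically.
class ToStringNotPooledDemo {

    static boolean freshStringInternsToItself() {
        // Unique content, so no equal string can be in the pool yet
        String built = new StringBuilder("st-").append(UUID.randomUUID()).toString();
        // If toString() had pooled a separate copy, intern() would return that copy instead
        return built.intern() == built;
    }

    static boolean alreadyPooledStringInternsToItself() {
        String literal = "st-demo"; // puts "st-demo" into the pool first
        String built = new StringBuilder("st-").append("demo").toString();
        return built.intern() == built; // intern() returns the pooled literal, not built
    }

    public static void main(String[] args) {
        System.out.println(freshStringInternsToItself());         // true on JDK 7+
        System.out.println(alreadyPooledStringInternsToItself()); // false
    }
}
```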

5.2 Use of intern(): JDK 6 vs JDK 7/8

public class StringIntern {

    public static void main(String[] args) {

        /**
         * ① String s = new String("1")
         * Two objects are created:
         * 		a new object in heap space
         * 		the string constant "1" in the string constant pool (note: "1" is already in the pool at this point)
         * ② s.intern() changes nothing, because "1" already exists in the string constant pool
         *
         * s  points to the object created by new in heap space
         * s2 points to "1" in the string constant pool
         * so they are not equal
         */
        String s = new String("1");
        s.intern();//'1' already exists in the string constant pool before calling this method
        String s2 = "1";
        System.out.println(s == s2);//jdk6: false   jdk7/8: false

        /**
         * ① String s3 = new String("1") + new String("1")
         * is equivalent to new String("11"), but the string "11" is not generated in the constant pool
         *
         * ② s3.intern()
         * Since there is no "11" in the constant pool at this point, JDK 7/8 stores the address of the object referenced by s3 in the constant pool
         * So s3 and s4 point to the same address
         */
        String s3 = new String("1") + new String("1");//s3 records the address of new String("11")
        //After executing the previous line, does "11" exist in the string constant pool? Answer: no!
        s3.intern();//Puts "11" into the string constant pool. jdk6: a new object "11" is created in the constant pool, with a new address
                    //jdk7: no new "11" object is created in the constant pool; the pool stores a reference to the new String("11") in heap space
        String s4 = "11";//s4 records the address of the "11" placed in the constant pool by the previous line
        System.out.println(s3 == s4);//jdk6: false  jdk7/8: true
    }

}

In JDK 6

In JDK 7

Extension: JDK 8 environment

public class StringIntern1 {

    public static void main(String[] args) {
        //Extension of the exercise in StringIntern.java:
        String s3 = new String("1") + new String("1");//new String("11")
        //After executing the previous line, does "11" exist in the string constant pool? Answer: no!
        String s4 = "11";//Creates the object "11" in the string constant pool
        String s5 = s3.intern();
        System.out.println(s3 == s4);//false
        System.out.println(s5 == s4);//true
    }

}

Summary of String.intern():

  • JDK 1.6: intern() tries to put the string object into the string constant pool.
    • If the pool already contains an equal string, nothing is added. The address of the existing object in the string constant pool is returned
    • If not, a copy of the object is placed in the string constant pool, and the address of that pooled copy is returned
  • JDK 1.7 and later: intern() tries to put the string object into the string constant pool.
    • If the pool already contains an equal string, nothing is added. The address of the existing object in the string constant pool is returned
    • If not, a reference to the object is placed in the string constant pool, and that reference is returned

5.2.1 Exercises (understanding intern() across JDK versions)

Exercise 1

public class StringExer1 {
    
    public static void main(String[] args) {
        String s = new String("a") + new String("b");//new String("ab")
        //After the execution of the previous line of code, there is no "ab" in the string constant pool

        String s2 = s.intern();//jdk6: create a string "ab" in the string constant pool and return the "ab" address in the string constant pool to s2
                               //jdk8: instead of creating the string "ab" in the string constant pool, create a reference to the new String("ab") and return this reference to s2

        System.out.println(s2 == "ab");//jdk6:true  jdk8:true
        System.out.println(s == "ab");//jdk6:false  jdk8:true
    }
    
}

Exercise 2

Exercise 3: jdk8 environment

public class StringExer2 {

    public static void main(String[] args) {
        String s1 = new String("a") + new String("b"); //After execution, "ab" will not be generated in the string constant pool
        s1.intern(); //At this time, the string constant pool stores references to objects in heap space
        String s2 = "ab"; //Points to the reference address in the string constant pool
        System.out.println(s1 == s2); //true
    }

}

public class StringExer2 {

    public static void main(String[] args) {
        String s1 = new String("ab");//After execution, "ab" will be generated in the string constant pool
        s1.intern(); //At this time, the object address of the string constant generated by the previous line of code is stored in the string constant pool
        String s2 = "ab"; //Points to the address of the object in the string constant pool
        System.out.println(s1 == s2); //false
    }

}

5.3 intern efficiency test: space angle

/**
 * Test execution efficiency with intern(): space usage
 */
public class StringIntern2 {
    static final int MAX_COUNT = 1000 * 10000;
    static final String[] arr = new String[MAX_COUNT];

    public static void main(String[] args) {
        Integer[] data = new Integer[]{1,2,3,4,5,6,7,8,9,10};

        long start = System.currentTimeMillis();
        for (int i = 0; i < MAX_COUNT; i++) {
//            arr[i] = new String(String.valueOf(data[i % data.length]));
            arr[i] = new String(String.valueOf(data[i % data.length])).intern();

        }
        long end = System.currentTimeMillis();
        System.out.println("The time spent is:" + (end - start));

        try {
            Thread.sleep(1000000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        System.gc();
    }
}

  • Operation results

Without intern: 7215ms
With intern: 1542ms

  • Without intern, more than 10 million String instance objects are generated

  • With intern, only a little over 2 million String instance objects are generated

Conclusion

  • When a program uses a large number of strings, and especially when many of them are repeated, calling intern() can save memory space.
  • Large website platforms need to keep a large number of strings in memory. On a social networking site, for example, many users store the same values, such as Beijing or Haidian District. If all of these strings go through intern(), memory usage drops significantly.
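
The saving described in the conclusion can be made visible with a minimal sketch that counts String objects by identity rather than by equals(). The class name InternSavingsSketch and its two helper methods are hypothetical, and the element count is arbitrary; it is an illustration, not a benchmark.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class InternSavingsSketch {

    // Counts how many distinct String objects (by identity, not equals) the array holds.
    public static int countDistinctObjects(String[] arr) {
        Set<String> identities =
                Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        Collections.addAll(identities, arr);
        return identities.size();
    }

    // Fills an array with strings built from the numbers 1..10, optionally interned.
    public static String[] fill(int count, boolean useIntern) {
        String[] arr = new String[count];
        for (int i = 0; i < count; i++) {
            String s = new String(String.valueOf(i % 10 + 1)); // always a fresh heap object
            arr[i] = useIntern ? s.intern() : s;
        }
        return arr;
    }

    public static void main(String[] args) {
        int count = 1000;
        System.out.println(countDistinctObjects(fill(count, false))); // 1000: one object per element
        System.out.println(countDistinctObjects(fill(count, true)));  // 10: one object per distinct value
    }
}
```

With intern(), the array keeps only references to the pooled objects, so the temporary strings created in the loop become unreachable and can be collected.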

6. Garbage collection of StringTable

/**
 * Garbage collection of Strings:
 * -Xms15m -Xmx15m -XX:+PrintStringTableStatistics -XX:+PrintGCDetails
 */
public class StringGCTest {

    public static void main(String[] args) {
        for (int j = 0; j < 100000; j++) {
            String.valueOf(j).intern();
        }
    }

}

7. String deduplication in G1

Official website address: JEP 192: String Deduplication in G1 (java.net)

Motivation

Many large-scale Java applications are currently bottlenecked on memory. Measurements have shown that roughly 25% of the Java heap live data set in these types of applications is consumed by String objects. Further, roughly half of those String objects are duplicates, where duplicates means string1.equals(string2) is true. Having duplicate String objects on the heap is, essentially, just a waste of memory. This project will implement automatic and continuous String deduplication in the G1 garbage collector to avoid wasting memory and reduce the memory footprint.

Note that the duplicates here are data in the heap, not in the string constant pool, because the constant pool itself never contains duplicates.

Background: tests on many Java applications (large and small) yielded the following results:

  • String objects account for 25% of the heap data set
  • Duplicate String objects account for 13.5% of the heap data set
  • The average length of a String object is 45

Implementation

  • When the garbage collector runs, it visits the live objects on the heap. For each visited object, it checks whether the object is a candidate String to be deduplicated.
  • If so, a reference to the object is inserted into a queue for later processing. A deduplication thread runs in the background and processes the queue. Processing a queue element means removing it from the queue and then trying to deduplicate the String object it references.
  • A hashtable records all the unique char arrays used by String objects. When deduplicating, this hashtable is checked to see whether an identical char array already exists on the heap.
  • If it exists, the String object is adjusted to refer to that array and releases its reference to the original array, which is eventually reclaimed by the garbage collector.
  • If the lookup fails, the char array is inserted into the hashtable so that it can be shared later.
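
The queue-and-hashtable steps above can be sketched in plain Java. This is only a model of the idea, not the HotSpot code: the class DedupSketch, the mutable Str type standing in for String, and the methods enqueue and processQueue are all hypothetical, and the queue is drained synchronously instead of by a background thread.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

public class DedupSketch {

    /** Hypothetical stand-in for String: the backing array can be swapped. */
    public static final class Str {
        public char[] value;
        public Str(String s) { this.value = s.toCharArray(); }
    }

    // Candidates found while visiting live objects, waiting to be processed.
    private final Queue<Str> queue = new ArrayDeque<Str>();
    // One canonical char array per distinct content. Keying by String is a shortcut
    // for this sketch; a real implementation would hash the array contents instead.
    private final Map<String, char[]> table = new HashMap<String, char[]>();

    public void enqueue(Str candidate) { queue.add(candidate); }

    /** Drains the queue and returns how many candidates were redirected to a shared array. */
    public int processQueue() {
        int deduplicated = 0;
        Str s;
        while ((s = queue.poll()) != null) {
            String key = new String(s.value);
            char[] existing = table.get(key);
            if (existing == null) {
                table.put(key, s.value);   // lookup failed: remember this array for later sharing
            } else if (existing != s.value) {
                s.value = existing;        // duplicate: point at the existing array
                deduplicated++;            // the old array is now unreferenced and collectable
            }
        }
        return deduplicated;
    }

    public static void main(String[] args) {
        DedupSketch dedup = new DedupSketch();
        Str a = new Str("Beijing");
        Str b = new Str("Beijing");
        Str c = new Str("Haidian");
        dedup.enqueue(a);
        dedup.enqueue(b);
        dedup.enqueue(c);
        System.out.println(dedup.processQueue()); // 1
        System.out.println(a.value == b.value);   // true
    }
}
```

Note that only the backing arrays are shared; a and b remain two separate objects, which is the difference between deduplication and intern().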

Command line options

# Enable String deduplication. It is disabled by default and must be enabled manually.
UseStringDeduplication(bool)
# Print detailed deduplication statistics
PrintStringDeduplicationStatistics(bool)
# String objects that reach this age are considered candidates for deduplication
StringDeduplicationAgeThreshold(uintx)
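
On the command line, a boolean option such as the first one is switched on with -XX:+UseStringDeduplication. Whether it actually took effect can be checked from inside the program. The sketch below assumes a HotSpot-based JVM that exposes com.sun.management.HotSpotDiagnosticMXBean; the class name DedupFlagCheck and the helper flagValue are hypothetical, and the printed value depends on the JVM and the options it was started with.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class DedupFlagCheck {

    /** Returns the flag's current value as text, or "unavailable" if this JVM does not expose it. */
    public static String flagValue(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean.getVMOption(name).getValue();
        } catch (RuntimeException e) {
            // Covers unknown flag names (IllegalArgumentException) and JVMs without this bean.
            return "unavailable";
        }
    }

    public static void main(String[] args) {
        System.out.println("UseStringDeduplication = " + flagValue("UseStringDeduplication"));
        System.out.println("StringDeduplicationAgeThreshold = "
                + flagValue("StringDeduplicationAgeThreshold"));
    }
}
```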

Topics: Java jvm Back-end