Java Fundamentals 1: String in Detail

Posted by gabe on Tue, 23 Jul 2019 11:08:19 +0200

Overview

1. Class declarations

String is declared final, so it cannot be subclassed.

In Java 8 and earlier, a char array is used internally to store the data.

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {
    /** The value is used for character storage. */
    private final char value[];
}

From Java 9 onward, the String class stores its data in a byte array and uses the coder field to record which encoding (LATIN1 or UTF16) those bytes use.

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {
    /** The value is used for character storage. */
    private final byte[] value;

    /** The identifier of the encoding used to encode the bytes in {@code value}. */
    private final byte coder;
}
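
To make the role of coder more concrete, here is a minimal sketch of the decision it records: whether every char fits into one byte (Latin-1) or the string needs two bytes per char (UTF-16). The class CoderSketch and its methods are hypothetical names used only for illustration; the real logic lives in JDK-internal classes and differs in detail.

```java
class CoderSketch {
    // Hypothetical constants that mirror the idea behind the coder field.
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    // Returns LATIN1 if every char fits into a single byte, otherwise UTF16.
    static byte chooseCoder(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return UTF16;
            }
        }
        return LATIN1;
    }

    // Number of bytes the character data would occupy under the chosen coder.
    static int storageBytes(String s) {
        return chooseCoder(s) == LATIN1 ? s.length() : s.length() * 2;
    }

    public static void main(String[] args) {
        System.out.println(storageBytes("hello"));        // 5
        System.out.println(storageBytes("\u4f60\u597d")); // 4 (two CJK chars, two bytes each)
    }
}
```

Under this model a pure Latin-1 string needs half the space it needed with the char array, which is the motivation for the change.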

2. Constructors

  • No-argument constructor
    /**
     * The value array is declared final, so its reference cannot be reassigned
     * after construction; it must therefore be initialized in the constructor.
     */
    public String() {
        this.value = "".value;
    }
  • Construct from another String
    /**
     * Unless an explicit copy of the original is needed, this constructor is
     * unnecessary, because Strings are immutable.
     */
    public String(String original) {
        this.value = original.value;
        this.hash = original.hash;
    }
  • Construct from a char array
    public String(char value[], int offset, int count) {
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count <= 0) {
            if (count < 0) {
                throw new StringIndexOutOfBoundsException(count);
            }
            if (offset <= value.length) {
                this.value = "".value;
                return;
            }
        }
        // Note: offset or count might be near -1>>>1.
        if (offset > value.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }
        this.value = Arrays.copyOfRange(value, offset, offset+count);
    }
  • Construct from a byte[]
    /**
     * Constructs a String by decoding the given bytes with the platform's default charset.
     * The length of the new String is not necessarily equal to the length of the byte array.
     * The behavior of this constructor is unspecified when the given bytes are not valid in the default charset.
     */
    public String(byte bytes[], int offset, int length) {
        checkBounds(bytes, offset, length);
        this.value = StringCoding.decode(bytes, offset, length);
    }
  • Construct from an int[] of Unicode code points
    /**
     * Initializes a String from an array of Unicode code points.
     * Later modification of the int array does not affect the newly created String.
     * @since  1.5
     */
    public String(int[] codePoints, int offset, int count) {
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count <= 0) {
            if (count < 0) {
                throw new StringIndexOutOfBoundsException(count);
            }
            //count = 0
            if (offset <= codePoints.length) {
                this.value = "".value;
                return;
            }
        }
        // Note: offset or count might be near -1>>>1.
        if (offset > codePoints.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }

        final int end = offset + count;

        // Pass 1: Compute precise size of char[]
        int n = count;
        for (int i = offset; i < end; i++) {
            int c = codePoints[i];
            // Code points from U+0000 to U+FFFF form the Basic Multilingual Plane (BMP).
            // Such a code point can be represented by a single char.
            if (Character.isBmpCodePoint(c))
                continue;
            // Otherwise verify that c is a valid code point; it is then supplementary and needs two chars.
            else if (Character.isValidCodePoint(c))
                n++;
            else throw new IllegalArgumentException(Integer.toString(c));
        }

        // Pass 2: Allocate and fill in char[]
        // n is now the exact number of chars needed to hold all the code points
        final char[] v = new char[n];

        for (int i = offset, j = 0; i < end; i++, j++) {
            int c = codePoints[i];
            if (Character.isBmpCodePoint(c))
                v[j] = (char)c;
            else
                Character.toSurrogates(c, v, j++);
        }

        this.value = v;
    }
  • Construct from the mutable string types StringBuffer and StringBuilder
public String(StringBuffer buffer) {
    synchronized(buffer) {
        this.value = Arrays.copyOf(buffer.getValue(), buffer.length());
    }
}

public String(StringBuilder builder) {
    this.value = Arrays.copyOf(builder.getValue(), builder.length());
}
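
The constructors above can be tried out with a short sketch. ConstructorTour and its helper methods are hypothetical names introduced for this example only; the byte[] case passes an explicit charset (StandardCharsets.UTF_8) rather than relying on the platform default, so that the result is the same on every machine.

```java
import java.nio.charset.StandardCharsets;

class ConstructorTour {
    // Copy constructor: a distinct object with equal content.
    static String copyOf(String original) {
        return new String(original);
    }

    // char[] constructor: count chars starting at offset are copied.
    static String fromChars(char[] source, int offset, int count) {
        return new String(source, offset, count);
    }

    // byte[] constructor with an explicit charset instead of the platform default.
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, 0, bytes.length, StandardCharsets.UTF_8);
    }

    // int[] constructor: each element is one Unicode code point.
    static String fromCodePoints(int... codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        String original = "hello";
        String copy = copyOf(original);
        System.out.println(original == copy);      // false: two distinct objects
        System.out.println(original.equals(copy)); // true: same characters

        char[] chars = {'h', 'e', 'l', 'l', 'o'};
        String ell = fromChars(chars, 1, 3);
        chars[1] = 'X';                            // change the source array afterwards
        System.out.println(ell);                   // ell: the String kept its own copy

        byte[] bytes = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        System.out.println(bytes.length);             // 6: the accented letter takes two bytes
        System.out.println(fromUtf8(bytes).length()); // 5: the String has five chars

        String smile = fromCodePoints('A', 0x1F600);
        System.out.println(smile.length());                          // 3: the emoji needs a surrogate pair
        System.out.println(smile.codePointCount(0, smile.length())); // 2

        StringBuilder builder = new StringBuilder("abc");
        String snapshot = new String(builder);
        builder.append("def");
        System.out.println(snapshot);              // abc: later appends do not affect it
    }
}
```

In every case the new String owns a private copy of the data, which is what allows it to stay immutable even when the source array or builder changes later.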

3. Common API

Method List:

boolean isEmpty() //Returns true if and only if length() is 0
int length() //Returns the length of this string
boolean contains(CharSequence s) //Returns true if and only if the string contains the specified sequence of char values
char charAt(int index) //Returns the char value at the specified index
String concat(String str) //Concatenates the specified string to the end of this string

int indexOf(int ch) //Returns the index of the first occurrence of the specified character in this string
int lastIndexOf(int ch) //Returns the index of the last occurrence of the specified character in this string    

String substring(int beginIndex, int endIndex) //Returns a new string, which is a substring of the string
CharSequence subSequence(int beginIndex, int endIndex) //Returns a new character sequence, which is a subsequence of the sequence.
   
int compareTo(String anotherString) //Compares two strings lexicographically

int compareToIgnoreCase(String str) //Compares two strings lexicographically, ignoring case
boolean equalsIgnoreCase(String anotherString) //Compares this String to another String, ignoring case

static String valueOf(double d)  
static String valueOf(boolean b) 

byte[] getBytes(Charset charset) //Encodes this String into a sequence of bytes using the given charset and stores the result in a new byte array
byte[] getBytes(String charsetName) //Encodes this String into a sequence of bytes using the named charset and stores the result in a new byte array

String toLowerCase(Locale locale) //Converts all characters in this String to lowercase using the rules of the given Locale
String toUpperCase(Locale locale) //Converts all characters in this String to uppercase using the rules of the given Locale

boolean matches(String regex) //Tells whether this string matches the given regular expression
String[] split(String regex, int limit) //Splits this string around matches of the given regular expression

boolean startsWith(String prefix, int toffset) //Tests whether the substring of this string beginning at the specified index starts with the specified prefix
boolean endsWith(String suffix)
    
static String copyValueOf(char[] data) //Returns a String that contains the characters of the specified array
char[] toCharArray() //Converts this string to a new character array

String replace(char oldChar, char newChar) //Returns a new string obtained by replacing all occurrences of oldChar in this string with newChar
String replaceAll(String regex, String replacement) //Replaces every substring of this string that matches the given regular expression with the given replacement

String intern() //Returns the canonical representation of the string: if the pool already contains an equal string, that string is returned; otherwise this string is added to the pool and returned
String trim() //Returns a copy of the string with leading and trailing whitespace removed
    

static String format(Locale l, String format, Object... args) //Returns a formatted string using the specified locale, format string, and arguments
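
A few of the methods listed above in action. This is only a small usage sketch; the class name StringApiDemo and the sample values are made up for the example.

```java
import java.util.Arrays;
import java.util.Locale;

class StringApiDemo {
    public static void main(String[] args) {
        String raw = "  Hello, World  ";
        String text = raw.trim();

        System.out.println(text.length());                 // 12
        System.out.println(text.indexOf('o'));             // 4
        System.out.println(text.lastIndexOf('o'));         // 8
        System.out.println(text.substring(7, 12));         // World
        System.out.println(text.toUpperCase(Locale.ROOT)); // HELLO, WORLD
        System.out.println(text.replace('l', 'L'));        // HeLLo, WorLd
        System.out.println(text.contains("World"));        // true

        // limit = 2 means at most two parts; the rest stays unsplit.
        System.out.println(Arrays.toString("a,b,c".split(",", 2))); // [a, b,c]

        System.out.println("apple".compareTo("banana") < 0);           // true
        System.out.println("2019-07-23".matches("\\d{4}-\\d{2}-\\d{2}")); // true
        System.out.println(String.format(Locale.ROOT, "%.2f", 3.14159)); // 3.14
    }
}
```

Note that none of these calls changes raw or text; each one returns a new value.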

4. Immutability

Why String cannot be modified

The following two points ensure that a String cannot be modified:

  1. value is declared final, so once it is assigned the reference cannot be pointed at a different array.
  2. The String class exposes no method that modifies the contents of the array that value refers to.
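
A consequence of these two points is that every method that appears to change a String actually returns a new one. A minimal sketch (the class name ImmutabilityDemo is made up for this example):

```java
import java.util.Locale;

class ImmutabilityDemo {
    public static void main(String[] args) {
        String s = "java";
        String upper = s.toUpperCase(Locale.ROOT); // new String, s is untouched
        String joined = s.concat("!");             // new String, s is untouched

        System.out.println(s);      // java
        System.out.println(upper);  // JAVA
        System.out.println(joined); // java!
    }
}
```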
Advantages of immutability

From the perspective of memory, synchronization and data structure:

  1. The string pool requires it: the string intern pool is a special storage area (located in the method area before JDK 7). When a string is created and an equal string already exists in the pool, a reference to the existing string is returned instead of creating a new object. If strings were mutable, this sharing would make no sense, because a change made through one reference would be visible through all the others.
  2. Caching the hash code: hash codes are used frequently in Java, and String has a field that caches its own. Because the contents never change, the cached value never becomes stale.
    private int hash;//this is used to cache hash code.
  3. Safe use by other objects: third-party code can rely on a String's value staying the same. For example:

    //Hypothetical: assume String exposed a writable field named value (the real class does not, so this does not compile).
    //A set guarantees that its elements are unique; if String were mutable, that guarantee could be broken.
    HashSet<String> set = new HashSet<String>();
    set.add(new String("a"));
    set.add(new String("b"));
    set.add(new String("c"));

    for(String a: set)
        a.value = "a";
  4. Security: String is widely used as a parameter by many Java classes, for example to open network connections or files. If strings were not immutable, the connection target or file name could be changed after it had been validated, which could lead to serious security threats: a method would believe it was connecting to one machine when in fact it was connecting to another. Mutable strings could also cause security problems in reflection, because the parameters there are strings too. Example:

    boolean connect(String s){
        if (!isSecure(s)) {
            throw new SecurityException();
        }
        //If s could be changed through another reference after the check above, this call would cause a problem.
        causeProblem(s);
        return true;
    }
  5. Immutable objects are naturally thread-safe: Since immutable objects cannot be changed, they can be freely shared among multiple threads. This eliminates the need for synchronization.
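
The HashSet argument in point 3 can be reproduced with a runnable sketch. Since String itself cannot be mutated, the example uses MutableText, a hypothetical mutable stand-in written only for this illustration.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class MutableKeyDemo {
    // Hypothetical mutable stand-in for String; equality depends on a writable field.
    static final class MutableText {
        String value;

        MutableText(String value) {
            this.value = value;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof MutableText && Objects.equals(value, ((MutableText) o).value);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(value);
        }
    }

    // Adds three distinct elements, then mutates all of them to the same value.
    static Set<MutableText> buildCorruptedSet() {
        Set<MutableText> set = new HashSet<>();
        set.add(new MutableText("a"));
        set.add(new MutableText("b"));
        set.add(new MutableText("c"));
        for (MutableText text : set) {
            text.value = "a";
        }
        return set;
    }

    public static void main(String[] args) {
        Set<MutableText> set = buildCorruptedSet();
        System.out.println(set.size());                // 3: the set now holds three equal elements
        System.out.println(new HashSet<>(set).size()); // 1: a fresh set sees them as duplicates
    }
}
```

The set still reports three elements even though all of them are now equal, so its uniqueness guarantee has been silently broken. With an immutable element type this cannot happen.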

In short, String is designed to be immutable for reasons of efficiency and security. This is also why immutable classes are generally preferred in many situations.

5. String pool

What is a pool?

The Java language has eight primitive types plus one special type, String. To make these types faster and more memory-efficient, Java provides the concept of a constant pool, which works like a cache provided at the system level. The constant pools of the eight primitive types are managed by the system, while the constant pool of the String type is special. There are two main ways to use it:

  • String objects declared directly as double-quoted literals are stored directly in the constant pool.
  • If a String object is not created from a literal, you can use the intern method provided by String. The intern method looks up the current string in the string constant pool; if it is not there, the current string is put into the pool.
  • In JDK 6 and earlier, the string constant pool lives in the Perm area (the Perm area is a static area for classes that mainly stores loaded class information, constant pools, method fragments and so on; its default size is only 4 MB). Heavy use of intern therefore quickly produces java.lang.OutOfMemoryError: PermGen space.
  • In JDK 7, the string constant pool was moved from the Perm area to the normal Java heap.
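
The two ways of getting a pooled string described above can be checked with reference comparisons. A minimal sketch (the class name StringPoolDemo is made up for this example):

```java
class StringPoolDemo {
    public static void main(String[] args) {
        String literal1 = "hello";
        String literal2 = "hello";
        String constructed = new String("hello");

        System.out.println(literal1 == literal2);             // true: both refer to the pooled literal
        System.out.println(literal1 == constructed);          // false: new String creates a separate heap object
        System.out.println(literal1 == constructed.intern()); // true: intern returns the pooled instance
        System.out.println(literal1.equals(constructed));     // true: the contents are equal
    }
}
```

This is also why string contents should be compared with equals rather than ==.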
String intern method

The general structure of its implementation is: Java uses JNI to call the intern method of StringTable, which is implemented in C++. StringTable's intern method is similar to the HashMap implementation in Java, except that the table cannot grow automatically. The default size is 1009.

Note:

  1. The String Pool is a fixed-size Hashtable with a default size of 1009.
  2. If too many Strings are put into the String Pool, hash collisions become severe and the bucket lists grow long, which significantly degrades the performance of String.intern (because each list has to be searched entry by entry).
  3. In JDK 6 the length of the StringTable is fixed at 1009, so if there are too many strings in the constant pool, efficiency drops quickly. In JDK 7 the length of the StringTable can be specified with a parameter: -XX:StringTableSize=99991
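
To see why a fixed table of 1009 buckets degrades, the following sketch models the table in plain Java and measures how long the bucket chains become. It is only a model under stated assumptions: FixedTableSketch is a hypothetical class, it uses String.hashCode to pick a bucket, and the real StringTable is native code with its own hash function.

```java
class FixedTableSketch {
    // Counts how many of n generated keys land in each of the given number of buckets.
    static int[] chainLengths(int n, int buckets) {
        int[] lengths = new int[buckets];
        for (int i = 0; i < n; i++) {
            String key = "key-" + i;
            lengths[Math.floorMod(key.hashCode(), buckets)]++;
        }
        return lengths;
    }

    // Length of the longest chain, i.e. the worst-case number of entries to scan.
    static int longest(int[] lengths) {
        int max = 0;
        for (int length : lengths) {
            max = Math.max(max, length);
        }
        return max;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        for (int buckets : new int[] {1009, 99991}) {
            int[] lengths = chainLengths(n, buckets);
            System.out.printf("buckets=%d average chain=%.1f longest chain=%d%n",
                    buckets, (double) n / buckets, longest(lengths));
        }
    }
}
```

With one million entries, the average chain in a 1009-bucket table is roughly a thousand entries long, while a table of 99991 buckets keeps it around ten. That difference is what the -XX:StringTableSize parameter is meant to address.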
Thinking through the examples
// Execution in JDK6: false false
// Execution in JDK7: false true
public static void main(String[] args) {
    // new String(...) creates the object on the heap
    String s = new String("1");
    s.intern();
    // A literal refers to the constant in the pool (Perm area in JDK 6)
    String s2 = "1";
    System.out.println(s == s2);

    String s3 = new String("1") + new String("1");
    s3.intern();
    String s4 = "11";
    System.out.println(s3 == s4);
}

// Execution in JDK6: false false
// Execution in JDK7: false false
public static void main(String[] args) {
    String s = new String("1");
    String s2 = "1";
    s.intern();
    System.out.println(s == s2);

    String s3 = new String("1") + new String("1");
    String s4 = "11";
    s3.intern();
    System.out.println(s3 == s4);
}
  • JDK 6 memory analysis (note: in the original figure, green lines show which content a string object points to and black lines show address references)

    • String s = new String("1"); creates "1" in the constant pool in Perm, and a heap object with the content "1" that the variable s refers to
    • s2 -> "1" in the constant pool
    • String s3 = new String("1") + new String("1"); creates "1" in the constant pool in Perm, two anonymous heap strings with the content "1", and the heap object that the variable s3 refers to
    • s3.intern(); writes "11" to the constant pool
    • Because the pool in Perm and the heap are separate areas, a pooled string and a heap string are always different objects, so both comparisons print false
  • JDK 7 memory analysis 1

    • In the first piece of code, look at s3 and s4 first. String s3 = new String("1") + new String("1"); ends up producing two objects of interest: "1" in the string constant pool and the object in the Java heap that s3 refers to. There are also two anonymous new String("1") objects in between, which we will not discuss. At this point the object s3 refers to has the content "11", but there is no "11" object in the constant pool.
    • Next, s3.intern(); puts the "11" string of s3 into the string constant pool. Because the pool contains no "11" at this time, the conventional approach, as in the JDK 6 analysis, would be to create an "11" object in the pool. The key point is that in JDK 7 the constant pool is no longer in the Perm area. The pool does not need to store a separate copy of the object; it can store a reference to the object in the heap directly. That reference points to the object referenced by s3, so the two reference addresses are the same.
    • Finally, String s4 = "11"; declares the literal "11", which is looked up in the constant pool. The lookup finds that the entry already exists and that it is a reference to the object s3 refers to. So s4 points to the same object as s3, and the final comparison s3 == s4 is true.
    • Now look at s and s2. String s = new String("1"); produces two objects: "1" in the constant pool and a String object in the Java heap. s.intern(); looks for the content of s in the constant pool and finds that "1" is already there.
    • Next, String s2 = "1"; makes s2 refer to the "1" object in the constant pool. The reference addresses of s and s2 are therefore clearly different, so s == s2 is false.
  • JDK 7 memory analysis 2

    • Now look at the second piece of code. The only change from the first is the order: s3.intern(); now comes after String s4 = "11";. String s4 = "11"; is therefore executed first. There is no "11" object in the constant pool when s4 is declared, so after execution the "11" object in the pool is a new object created by the declaration of s4. When s3.intern(); then runs, "11" already exists in the constant pool, so s3 and s4 refer to different objects.
    • For s and s2 in the second piece of code, moving s.intern(); later makes no difference, because executing String s = new String("1"); already creates the "1" object in the pool. The declaration of s2 below refers directly to that pooled object, so the reference addresses of s and s2 are not equal.
  • Summary: from the example code above we can see that JDK 7 changed the intern operation and the constant pool in two main ways:

    • The String constant pool was moved from the Perm area to the Java heap
    • When the String intern method is called and the string exists as an object in the heap but not yet in the pool, the pool stores a reference to that object instead of creating a new one
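
The second point of the summary can be observed with a string that is built at run time and is therefore very unlikely to be in the pool already. This is a sketch only: InternDemo and build are hypothetical names, and the first printed result depends on the JVM version and on whether an equal string was pooled earlier, so no output is claimed for it.

```java
class InternDemo {
    // Builds the string at run time so that it is not a compile-time constant.
    static String build(String left, String right) {
        return new StringBuilder(left).append(right).toString();
    }

    public static void main(String[] args) {
        String suffix = "demo-" + System.nanoTime();

        String built = build("intern-", suffix);
        String interned = built.intern();
        // On JDK 7 and later this is expected to print true when no equal string
        // was in the pool before, because the pool stores a reference to built.
        System.out.println(interned == built);

        String other = build("intern-", suffix);
        System.out.println(other == interned);          // false: other is a separate heap object
        System.out.println(other.intern() == interned); // true: equal strings share one pooled instance
    }
}
```

The last line holds on every version, since intern always returns the same instance for equal contents; only the first comparison reflects the JDK 7 change.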

References:

https://tech.meituan.com/2014...
