HashMap source code analysis and thread safety

Posted by JesperBisgaard on Wed, 10 Nov 2021 07:47:40 +0100

Storage structure - fields

// The bucket array; each slot holds a linked list (or a red-black tree) of nodes.
// Allocated lazily on first use rather than in the constructor. Its length is always a power of 2.
transient Node<K,V>[] table;

// Cached view returned by entrySet()
transient Set<Map.Entry<K,V>> entrySet;

// Number of key-value mappings in this map
transient int size;

// Number of structural modifications (e.g. rehashing); iterators use it to fail fast with ConcurrentModificationException
transient int modCount;

// Resize threshold (capacity * loadFactor); before the table is allocated, it holds the initial capacity
int threshold;

// Load factor
final float loadFactor;

1. Constructors

// Constructors
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    // The maximum initial capacity shall not exceed 2 ^ 30
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    // Store the smallest power of 2 >= initialCapacity in threshold; resize() later uses it as the table length
    this.threshold = tableSizeFor(initialCapacity);
}
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}
public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    putMapEntries(m, false);
}
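The validation and clamping in the first constructor can be observed from outside. The sketch below uses only the public java.util.HashMap API (the class ConstructorDemo and its rejects() helper are hypothetical). Passing Integer.MAX_VALUE succeeds because the capacity is only clamped to MAXIMUM_CAPACITY and stored in threshold; no array is allocated until the first insertion.

```java
import java.util.HashMap;

public class ConstructorDemo {
    // Returns true if HashMap(initialCapacity, loadFactor) throws IllegalArgumentException.
    static boolean rejects(int initialCapacity, float loadFactor) {
        try {
            new HashMap<String, Integer>(initialCapacity, loadFactor);
            return false;
        } catch (IllegalArgumentException ex) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejects(-1, 0.75f));            // true: negative capacity
        System.out.println(rejects(16, 0f));               // true: load factor <= 0
        System.out.println(rejects(16, Float.NaN));        // true: NaN load factor
        // Clamped to MAXIMUM_CAPACITY; nothing is allocated yet, so this succeeds
        System.out.println(rejects(Integer.MAX_VALUE, 0.75f)); // false
    }
}
```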

HashMap has four constructor overloads, which together initialize three parameters:

initialCapacity: initial capacity (default 16)

HashMap is backed by an array of buckets, each holding a linked list that is converted to a red-black tree once it grows beyond 8 nodes (provided the table has at least 64 buckets; otherwise the table is resized instead). The initial capacity is the initial length of this array, rounded up to a power of 2.

loadFactor: load factor (default 0.75)

The maximum ratio of stored key-value pairs to capacity that the map allows before it resizes; it is used to compute threshold.

threshold: resize threshold

The maximum number of key-value pairs the HashMap can hold before it must be expanded, computed as capacity * loadFactor (16 * 0.75 = 12 with the defaults).

An interesting helper here is tableSizeFor(), which uses unsigned right shifts to round the initial capacity up to the next power of 2 (for example, 10 becomes 2^4 = 16).
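For illustration, here is a standalone copy of the JDK 8 version of tableSizeFor() (newer JDK releases implement it differently but return the same values), plus a hypothetical threshold() helper that multiplies the resulting capacity by the load factor, as resize() does. The shifts copy the highest set bit of cap - 1 into every lower position, so adding 1 yields the next power of 2.

```java
public class TableSizeForDemo {
    // Same value as HashMap.MAXIMUM_CAPACITY
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // JDK 8 tableSizeFor(): smallest power of two >= cap
    static int tableSizeFor(int cap) {
        int n = cap - 1;   // subtract 1 so an exact power of two maps to itself
        n |= n >>> 1;      // after these ORs, every bit below the highest set bit is 1
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    // Hypothetical helper: threshold for a table of the given capacity (capacity * loadFactor)
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        for (int cap : new int[] {0, 1, 10, 16, 17, 1000}) {
            int size = tableSizeFor(cap);
            System.out.println(cap + " -> capacity " + size + ", threshold " + threshold(size, 0.75f));
        }
    }
}
```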

2. Insertion methods

HashMap has two insertion entry points:

put(K key, V value)

The most commonly used one; it computes hash(key) and delegates directly to putVal() below.

putMapEntries(Map<? extends K, ? extends V> m, boolean evict)

This method also calls putVal() internally.

It is called in three main places:
1. The HashMap(Map) constructor, which initializes a map from another map
2. putAll()
3. clone()
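put() passes hash(key), not the raw hashCode(), to putVal(). The sketch below copies the JDK 8 hash() spreading function and the (n - 1) & hash index calculation into a hypothetical HashSpreadDemo class. It shows why the high 16 bits are XORed into the low 16: two hash codes that differ only in their upper bits would otherwise always share a bucket in a 16-slot table.

```java
public class HashSpreadDemo {
    // Mirrors JDK 8 HashMap.hash(): fold the high 16 bits into the low 16 bits
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Bucket index as computed in putVal()/getNode(); n must be a power of two
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        int a = 0x10000, b = 0x20000; // hash codes that differ only above bit 15
        System.out.println("raw:    " + indexFor(a, n) + " " + indexFor(b, n));             // 0 0
        System.out.println("spread: " + indexFor(hash(a), n) + " " + indexFor(hash(b), n)); // 1 2
        System.out.println("null key -> bucket " + indexFor(hash(null), n));                // 0
    }
}
```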

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // 1. If the hash table is not initialized, initialize it first
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // 2.1 Locate the bucket with (n - 1) & hash; if it is empty, put the new node there directly
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    // 2.2 The bucket is already occupied: a hash collision
    else {
        Node<K,V> e; K k;
        // 3.1 The first node has the same key; handle it in step 4
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // 3.2 The bin has already been converted to a red-black tree; insert through the tree
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        // 3.3 Linked-list bin: walk to the tail and append the new node
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // Once the list holds more than TREEIFY_THRESHOLD (8) nodes, convert the bin to a red-black tree (treeifyBin() resizes instead while the table is smaller than MIN_TREEIFY_CAPACITY)
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // If the same key already exists, handle it uniformly in step 4
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // 4. e != null means the key already exists: replace its value (unless onlyIfAbsent is set and the old value is non-null) and return the old value
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // If size > threshold after insertion, expand the capacity
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
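The onlyIfAbsent flag and the return value of step 4 are visible through the public API: in JDK 8, put() calls putVal() with onlyIfAbsent = false and putIfAbsent() calls it with onlyIfAbsent = true. The hypothetical PutValDemo below records what each call returns, including the case where an existing null value is overwritten even though onlyIfAbsent is set.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class PutValDemo {
    // Return values of each call, followed by the final mappings for "a" and "b"
    static Object[] run() {
        Map<String, Integer> map = new HashMap<>();
        Integer first = map.put("a", 1);          // new key: putVal() returns null
        Integer second = map.put("a", 2);         // existing key: value replaced, old value 1 returned
        Integer kept = map.putIfAbsent("a", 3);   // onlyIfAbsent: non-null value 2 kept and returned
        map.put("b", null);
        Integer filled = map.putIfAbsent("b", 4); // old value is null, so it is replaced; null returned
        return new Object[] {first, second, kept, filled, map.get("a"), map.get("b")};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run())); // [null, 1, 2, null, 2, 4]
    }
}
```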

3. Implementation of the resize (capacity expansion) mechanism

During insertion, putVal() calls a key function, resize(), which serves two purposes:

Allocate the hash bucket array when the HashMap has not been initialized yet
Expand (double) the hash bucket array

Resizing recalculates the capacity as elements keep being added to the HashMap. When the number of key-value pairs (size) exceeds threshold, which is capacity * loadFactor (DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY = 0.75 * 16 = 12 for a default map), HashMap has to enlarge its array so that it can hold more elements. It does this by allocating a new array of twice the length to replace the old, smaller one, and then moving all the nodes from the old array into the new one.
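The arithmetic of the `if (++size > threshold) resize();` check can be traced without touching HashMap internals. This hypothetical sketch assumes a default-constructed map whose table is already allocated (capacity 16, threshold 12), distinct keys, and no extra resizes triggered by treeifyBin(); it lists the sizes at which resize() would double the table.

```java
import java.util.ArrayList;
import java.util.List;

public class ResizeScheduleDemo {
    static final int DEFAULT_INITIAL_CAPACITY = 16; // same values as the HashMap constants
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    // Sizes at which the table doubles while inserting `inserts` distinct keys
    static List<Integer> resizePoints(int inserts) {
        List<Integer> points = new ArrayList<>();
        int capacity = DEFAULT_INITIAL_CAPACITY;
        int threshold = (int) (DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY); // 12
        for (int size = 1; size <= inserts; size++) {
            if (size > threshold) {   // mirrors: if (++size > threshold) resize();
                capacity <<= 1;       // newCap = oldCap << 1
                threshold <<= 1;      // newThr = oldThr << 1
                points.add(size);
            }
        }
        return points;
    }

    public static void main(String[] args) {
        System.out.println(resizePoints(100)); // [13, 25, 49, 97]
    }
}
```

Because both capacity and threshold double together, the ratio between them stays at the load factor after every resize.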

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    // Calculate the capacity after expansion
    if (oldCap > 0) {
        // Already at MAXIMUM_CAPACITY: stop growing, raise threshold to Integer.MAX_VALUE and return the current table unchanged
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    // Table not yet allocated (oldCap == 0) but threshold > 0: the constructor stored the initial capacity in threshold
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    // Table not yet allocated and threshold == 0: the no-arg constructor was used, so apply the defaults
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // newThr is still 0 (the capacity came from the constructor, or the doubling branch above did not set it): compute newCap * loadFactor
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    // ===========Capacity expansion: transfer all nodes in the current hash bucket to the new hash bucket===========
    if (oldTab != null) {
        // Move each bucket to a new bucket
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                // Release the object reference in the old array
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                // Linked-list bin: split it into a lo list and a hi list, preserving node order
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // The hash bit selected by oldCap is 0: the node keeps its original index j
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // That bit is 1: the node moves to index j + oldCap
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // Place the lo list at the original index j
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // Place the hi list at index j + oldCap
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

JDK 1.8 optimizations

In JDK 1.7, resize moves nodes using head insertion. This can be seen in the line e.next = newTable[i];: each moved node is linked in front of whatever the target bucket already holds. As a result, nodes that land in the same bucket have their relative order reversed on every resize.
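The reversal can be reproduced in a few lines. The sketch below is a simplified, hypothetical model of the JDK 1.7 transfer loop: its Entry class keeps only a hash and a next link, and the hash is fed directly into the mask. Three entries that share a bucket both before and after the resize come out in reverse order.

```java
import java.util.ArrayList;
import java.util.List;

public class HeadInsertionDemo {
    // Minimal stand-in for a JDK 1.7 entry: only the hash and the next link matter here
    static final class Entry {
        final int hash;
        Entry next;
        Entry(int hash, Entry next) { this.hash = hash; this.next = next; }
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Head-insertion transfer: each moved entry is linked in front of the bucket's current head
    static Entry[] transfer(Entry[] oldTable, int newCapacity) {
        Entry[] newTable = new Entry[newCapacity];
        for (int j = 0; j < oldTable.length; j++) {
            Entry e = oldTable[j];
            while (e != null) {
                Entry next = e.next;
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i]; // the JDK 1.7 line quoted in the text above
                newTable[i] = e;
                e = next;
            }
        }
        return newTable;
    }

    static List<Integer> hashes(Entry head) {
        List<Integer> out = new ArrayList<>();
        for (Entry e = head; e != null; e = e.next) out.add(e.hash);
        return out;
    }

    // Hashes 3, 7 and 11 share bucket 1 at capacity 2 and bucket 3 at capacity 4
    static List<Integer> demo() {
        Entry[] old = new Entry[2];
        old[1] = new Entry(3, new Entry(7, new Entry(11, null)));
        return hashes(transfer(old, 4)[3]);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [11, 7, 3]
    }
}
```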

See the following figure for specific operation:

JDK 1.8 makes several optimizations to resize(). Because the capacity is a power of 2 and always doubles, each node either stays at its original index j or moves to j + oldCap, where oldCap is the length of the old array.

This means only one extra hash bit needs to be examined: the bit that the larger mask (newCap - 1) brings into play, which is exactly hash & oldCap. For example, with oldCap = 2, the low 1 bit of the hash selects the bucket before the resize and the low 2 bits select it afterwards. If the new bit is 0, the index is unchanged; if it is 1, the index becomes original index + oldCap. JDK 1.7, by contrast, recomputes every node's index against the new capacity with indexFor() (and, when rehashing is enabled, recomputes the hash as well).
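The index rule can be checked exhaustively over a range of hashes. In this hypothetical SplitIndexDemo, recomputed() is the mask-based index against the new capacity, shortcut() is the JDK 1.8 rule based on hash & oldCap, and agree() confirms they match for every hash in a range, negative values included.

```java
public class SplitIndexDemo {
    // Index recomputed from scratch against the new capacity (mask-based, as indexFor() does)
    static int recomputed(int hash, int newCap) {
        return hash & (newCap - 1);
    }

    // JDK 1.8 rule: only the bit selected by oldCap decides between j and j + oldCap
    static int shortcut(int hash, int oldIndex, int oldCap) {
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    // True if both rules give the same new index for every hash in [-limit, limit)
    static boolean agree(int oldCap, int limit) {
        int newCap = oldCap << 1;
        for (int hash = -limit; hash < limit; hash++) {
            int oldIndex = hash & (oldCap - 1);
            if (shortcut(hash, oldIndex, oldCap) != recomputed(hash, newCap)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // oldCap = 16: hash 5 (0b00101) stays at 5; hash 21 (0b10101) moves to 5 + 16 = 21
        System.out.println(shortcut(5, 5, 16) + " " + shortcut(21, 21 & 15, 16)); // 5 21
        System.out.println(agree(16, 1 << 20));                                   // true
    }
}
```

The rule holds because hash & (2 * oldCap - 1) is just hash & (oldCap - 1) with the oldCap bit added on top.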

In addition, because the lo and hi lists are built by appending at the tail, the JDK 1.8 resize preserves the relative order of nodes within each bucket.
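To see the order preservation, the sketch below lifts the linked-list branch of the resize() code above into a standalone split() method on a minimal, hypothetical Node class. Nodes are appended at the tail of either the lo or the hi list, so each list keeps the original relative order.

```java
import java.util.ArrayList;
import java.util.List;

public class LoHiSplitDemo {
    // Minimal stand-in for HashMap.Node: only the hash and the next link matter here
    static final class Node {
        final int hash;
        Node next;
        Node(int hash, Node next) { this.hash = hash; this.next = next; }
    }

    // Splits one old bin into {loHead, hiHead} by tail insertion, as in JDK 1.8 resize()
    static Node[] split(Node e, int oldCap) {
        Node loHead = null, loTail = null, hiHead = null, hiTail = null;
        while (e != null) {
            Node next = e.next;
            if ((e.hash & oldCap) == 0) {
                if (loTail == null) loHead = e; else loTail.next = e;
                loTail = e;
            } else {
                if (hiTail == null) hiHead = e; else hiTail.next = e;
                hiTail = e;
            }
            e = next;
        }
        if (loTail != null) loTail.next = null; // cut links into the other list
        if (hiTail != null) hiTail.next = null;
        return new Node[] {loHead, hiHead};
    }

    static List<Integer> hashes(Node head) {
        List<Integer> out = new ArrayList<>();
        for (Node n = head; n != null; n = n.next) out.add(n.hash);
        return out;
    }

    // Old capacity 4, bucket 1 holds hashes 1, 5, 9, 13, 17 in this order
    static List<List<Integer>> demo() {
        Node bin = new Node(1, new Node(5, new Node(9, new Node(13, new Node(17, null)))));
        Node[] parts = split(bin, 4);
        List<List<Integer>> out = new ArrayList<>();
        out.add(hashes(parts[0])); // stays at index 1
        out.add(hashes(parts[1])); // moves to index 1 + 4 = 5
        return out;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [[1, 9, 17], [5, 13]]
    }
}
```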

Thread safety

In multithreaded code, avoid sharing a plain HashMap, which is not thread-safe. Use ConcurrentHashMap instead, or wrap the map with Collections.synchronizedMap(new HashMap(...)).
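As a hedged illustration of the two alternatives, the sketch below has several threads increment a shared counter with Map.merge(), once on a ConcurrentHashMap and once on a Collections.synchronizedMap wrapper. Both perform merge() atomically, so the final count is exact; a plain HashMap offers no such guarantee. The class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ThreadSafeMapDemo {
    // Each of `threads` workers adds 1 to the "hits" entry `perThread` times via merge()
    static int count(Map<String, Integer> map, int threads, int perThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < perThread; i++) {
                    map.merge("hits", 1, Integer::sum);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
        return map.getOrDefault("hits", 0);
    }

    public static void main(String[] args) {
        System.out.println(count(new ConcurrentHashMap<>(), 4, 10_000));                      // 40000
        System.out.println(count(Collections.synchronizedMap(new HashMap<>()), 4, 10_000)); // 40000
    }
}
```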

The main thread-safety problem in JDK 1.7 is that resizing uses head insertion, which reverses bucket order. Under concurrent resizes this can create a circular linked list, so a later traversal loops forever and drives CPU usage to 100%. JDK 1.8 instead builds two separate lists (lo and hi) by tail insertion, which keeps node order intact during resize and avoids this cycle. HashMap is still not thread-safe in JDK 1.8, though: concurrent writes can still lose updates.

The specific cause in JDK 1.7 is analyzed below (reproduced from Left ear mouse's blog).

1. Two threads resize concurrently

The core loop of the JDK 1.7 resize, which moves entries from the old hash bucket array to the new one:

do {
    Entry<K,V> next = e.next; // <-- suppose thread 1 is descheduled right after executing this line
    int i = indexFor(e.hash, newCapacity);
    e.next = newTable[i];
    newTable[i] = e;
    e = next;
} while (e != null);

Thread 1 is suspended at the marked line while thread 2 runs to completion, so thread 2's resize has finished and its hash table looks as follows:

Note that thread 1's e points to key(3) and its next points to key(7). After thread 2's rehash, these two nodes belong to the list that thread 2 rebuilt, and that list is in reverse order (key(7) now precedes key(3)).

2. Thread 1 is scheduled again

First, it executes newTable[i] = e;, so bucket index 3 of thread 1's new table points to key(3).
Then e = next makes e point to key(7),
and next = e.next in the following iteration makes next point to key(3), because thread 2 already set key(7).next to key(3).

3. Everything is fine

Thread 1 keeps going: it takes key(7), puts it at the head of newTable[i], and then advances e and next (e is now key(3) again, and next becomes null).

4. A circular list appears, causing an infinite loop

e.next = newTable[i] makes key(3).next point to key(7).

Note: key(7).next already points to key(3), so the list is now circular.

Therefore, any later lookup that lands in this bucket and misses, such as HashMap.get(11) in this example, walks the cycle forever.
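The interleaving above can be replayed deterministically in a single thread, which avoids depending on real scheduling. In this hypothetical simulation, each key is also its own hash (an assumption matching the example: old capacity 2, new capacity 4, so key(3) and key(7) share bucket 1 before the resize and bucket 3 after it). Thread 1's stale e/next pair is captured before thread 2 runs, and a Floyd cycle check on bucket 3 confirms the ring.

```java
public class Jdk7RaceSimulation {
    // Minimal entry; in this simulation the key is also used as its hash (assumption)
    static final class Entry {
        final int key;
        Entry next;
        Entry(int key, Entry next) { this.key = key; this.next = next; }
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Continues the JDK 1.7-style transfer loop from an already-read e/next pair.
    // next is re-read at the end of each iteration, like `next = e.next` at the top of the do/while.
    static void finishTransfer(Entry e, Entry next, Entry[] newTable) {
        while (true) {
            int i = indexFor(e.key, newTable.length);
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
            if (e == null) {
                return;
            }
            next = e.next;
        }
    }

    // Floyd's cycle detection on a bucket's list
    static boolean hasCycle(Entry head) {
        Entry slow = head, fast = head;
        while (fast != null && fast.next != null) {
            slow = slow.next;
            fast = fast.next.next;
            if (slow == fast) {
                return true;
            }
        }
        return false;
    }

    static boolean simulate() {
        // Old table of capacity 2: bucket 1 holds key(3) -> key(7)
        Entry key7 = new Entry(7, null);
        Entry key3 = new Entry(3, key7);

        // Thread 1 reads e = key(3) and next = key(7), then is suspended
        Entry t1e = key3;
        Entry t1next = key3.next;

        // Thread 2 completes its resize into its own table: bucket 3 becomes key(7) -> key(3)
        Entry[] t2Table = new Entry[4];
        finishTransfer(key3, key3.next, t2Table);

        // Thread 1 resumes with its stale e/next and its own new table
        Entry[] t1Table = new Entry[4];
        finishTransfer(t1e, t1next, t1Table);

        // key(3).next == key(7) and key(7).next == key(3): bucket 3 is now a ring
        return hasCycle(t1Table[3]);
    }

    public static void main(String[] args) {
        System.out.println(simulate()); // true
    }
}
```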

4. Lookup

Lookup and removal follow similar, fairly simple logic, so only lookup is shown below, with the source code and comments.

Common lookup APIs such as get() and containsKey() are implemented on top of the following function:

// hash: the hash of the key being looked up; key: the target key
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        // 1. Locate the bucket with (n - 1) & hash and take its first node
        (first = tab[(n - 1) & hash]) != null) {
        // 2. Always check the first node; if its key matches, return it
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        // 3. The head node does not match; search the nodes after it
        if ((e = first.next) != null) {
            // 3.1 If the bin has been converted to a red-black tree, search the tree
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // 3.2 Otherwise, walk the linked list
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
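Since get() returns the found node's value while containsKey() only checks whether getNode() returned a node, the two can disagree when a key maps to null. The hypothetical LookupDemo below shows this, along with a lookup of the null key, which hash() sends to bucket 0.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class LookupDemo {
    static Object[] demo() {
        Map<String, String> map = new HashMap<>();
        map.put("nullValue", null); // HashMap permits null values...
        map.put(null, "nullKey");   // ...and a single null key, which hash() maps to 0
        return new Object[] {
            map.get("nullValue"),         // null: the key exists, its value is null
            map.get("missing"),           // null: getNode() found nothing
            map.containsKey("nullValue"), // true: getNode() returned a node
            map.containsKey("missing"),   // false
            map.get(null)                 // "nullKey"
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo())); // [null, null, true, false, nullKey]
    }
}
```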
