HashMap in JDK 8: source code analysis and implementation principles

Posted by randomfool on Thu, 20 Jan 2022 20:03:01 +0100

Analysis of HashMap source code

HashMap is an implementation of the Map interface based on a hash table.

Analysis

Key point 1

The no-argument constructor creates an empty HashMap with the default initial capacity (16) and the default load factor (0.75). The maximum capacity is 1 << 30 (1073741824): it is used when either of the constructors that take arguments implicitly specifies a higher value. The capacity must always be a power of two.

Constructor

/**
 * The maximum capacity, used if a higher value is implicitly specified by either of the constructors with arguments. Must be a power of two <= 1 << 30.
 */
static final int MAXIMUM_CAPACITY = 1 << 30;
/**
 * Construct an empty HashMap with a specified initial capacity and load factor.
 * Parameters:
 *
 * initialCapacity – Initial capacity
 * loadFactor – Load factor
 * Throws:
 * IllegalArgumentException – if the initial capacity is negative or the load factor is nonpositive
 */
public HashMap(int initialCapacity, float loadFactor) {
    // A negative initial capacity is rejected
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    // Clamp the initial capacity to MAXIMUM_CAPACITY
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    // The load factor must be a positive number (NaN is rejected too)
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    // The table is not allocated yet; threshold temporarily holds the rounded-up capacity until the first resize()
    this.threshold = tableSizeFor(initialCapacity);
}

/**
 * Returns the smallest power of two that is greater than or equal to the given capacity.
 * For an argument A greater than 0 and no larger than the maximum capacity:
 * if A is already a power of two, A is returned; otherwise A is rounded up to the next power of two.
 * For example, 7 returns 8, 8 returns 8, and 9 returns 16.
 * The inline comments below trace the logic for cap = 7.
 * n |= n >>> k ORs n with a copy of itself shifted right by k bits: a result bit is 0 only when both input bits are 0, otherwise 1.
 */
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1; //n|(n>>>1)  6|6>>>1   00000110|00000110>>>1  00000110|00000011 =00000111  n=7
    n |= n >>> 2; //n|(n>>>2)  7|7>>>2   00000111|00000111>>>2  00000111|00000001 =00000111  n=7
    n |= n >>> 4; //n|(n>>>4)  7|7>>>4   00000111|00000111>>>4  00000111|00000000 =00000111  n=7
    n |= n >>> 8; //n|(n>>>8)  7|7>>>8   00000111|00000111>>>8  00000111|00000000 =00000111  n=7
    n |= n >>> 16;//n|(n>>>16) 7|7>>>16  00000111|00000111>>>16 00000111|00000000 =00000111  n=7
    // n < 0 is false
    // n >= MAXIMUM_CAPACITY (1 << 30 = 1073741824) is false
    // so the result is n + 1 = 8
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
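
To try the rounding behaviour outside of HashMap, the method can be copied into a small class of its own. The following is a minimal sketch: the class name TableSizeDemo and the list of sample inputs are made up for illustration, while the method body is the same logic as the listing above.

```java
// Standalone copy of the bit-smearing logic shown above, for experimentation.
// TableSizeDemo is a hypothetical class name, not part of the JDK.
class TableSizeDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    static int tableSizeFor(int cap) {
        int n = cap - 1;   // subtract 1 so that an exact power of two maps to itself
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;     // every bit below the highest set bit is now 1
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        int[] inputs = {0, 1, 7, 8, 9, 13, (1 << 30) + 1};
        for (int cap : inputs) {
            System.out.println(cap + " -> " + tableSizeFor(cap));
        }
    }
}
```

With these inputs the results are 1, 1, 8, 8, 16, 16 and 1073741824: values are rounded up, never down, and anything above 1 << 30 is clamped to the maximum capacity.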

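Core function: put

put function
/**
 * Associates the specified value with the specified key in this map. If the map already contained a mapping for the key, the old value is replaced.
 */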
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

/**
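 * Computes the hash for a key by XORing the high 16 bits of hashCode() into the low 16 bits. A null key hashes to 0.
 */
static final int hash(Object key) {
    int h;
    // ^ is XOR: a result bit is 0 when the two input bits are equal and 1 when they differ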
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
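Why shift 16 bits to the right?
  • An int is 4 bytes (32 bits), so 16 is exactly half: shifting right by 16 lines the high 16 bits up with the low 16 bits, and the XOR lets the high half take part in the result.
  • The bucket index is (n - 1) & hash, so for typical table lengths only the low bits of the hash decide the bucket. Mixing the high 16 bits into the low 16 bits lets them influence the index as well, which reduces hash conflicts.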
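
The effect can be seen with two hash codes that differ only in their high bits. The sketch below is illustrative: the class HashSpreadDemo, the helper names spread and indexFor, and the two sample values are assumptions chosen for the demonstration; only the expressions h ^ (h >>> 16) and (n - 1) & hash come from the source code.

```java
// HashSpreadDemo is a hypothetical class used only to illustrate the spreading step.
class HashSpreadDemo {
    // Same spreading step as HashMap.hash, applied to a raw hash code.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a table of length n (n must be a power of two).
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;              // default table length
        int h1 = 0x00010000;     // the two values differ only above bit 15
        int h2 = 0x00020000;
        // Without spreading, both land in bucket 0.
        System.out.println(indexFor(h1, n) + " " + indexFor(h2, n));
        // With spreading, they land in buckets 1 and 2.
        System.out.println(indexFor(spread(h1), n) + " " + indexFor(spread(h2), n));
    }
}
```

Running it prints "0 0" followed by "1 2": without the shift the two keys collide, with it they do not.
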
putVal function
/**
 * The table, initialized on first use and resized as necessary. When allocated, its length is always a power of two. (Length zero is also tolerated in some operations, to allow bootstrapping mechanics that are currently not needed.)
 */
transient Node<K,V>[] table;

/**
 * The bin count threshold for using a tree rather than a list for a bin. A bin is converted to a tree when an element is added to a bin that already has at least this many nodes. The value must be greater than 2 and should be at least 8, to match the assumptions made in tree removal about converting back to a plain bin when shrinking.
 */
static final int TREEIFY_THRESHOLD = 8;

/**
 * Create a new node
 */
Node<K,V> newNode(int hash, K key, V value, Node<K,V> next) {
    return new Node<>(hash, key, value, next);
}

/**
 * Parameters:
 *   hash – hash of the key
 *   key – the key
 *   value – the value to put
 *   onlyIfAbsent – if true, an existing value is not changed
 *   evict – if false, the table is in creation mode
 * Returns:
 *   the previous value, or null if there was none
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // 1. If the table is null or has length 0, call resize() to allocate it and read its length.
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // 2. Compute the bucket index as (n - 1) & hash.
    if ((p = tab[i = (n - 1) & hash]) == null) // the bucket is empty: store a new node at index i
        tab[i] = newNode(hash, key, value, null);
    else { // 3. The bucket is occupied
        Node<K,V> e; K k;
        if (p.hash == hash && // 4. The first node p already holds the same key
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p; // remember the existing node; its value is updated below
        else if (p instanceof TreeNode) // 5. The bin is already a red-black tree
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else { // 6. The bin is a linked list and its first node holds a different key
            for (int binCount = 0; ; ++binCount) { // loops until one of the breaks below
                if ((e = p.next) == null) { // reached the tail without finding the key
                    p.next = newNode(hash, key, value, null); // append the new node at the tail
                    if (binCount >= TREEIFY_THRESHOLD - 1) // the bin already held 8 nodes before this append
                        treeifyBin(tab, hash); // convert the list to a red-black tree (treeifyBin resizes instead while the table has fewer than 64 buckets)
                    break;
                }
                // A node in the list already holds the same key: stop, with e pointing at it
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e; // advance to the next node
            }
        }
        if (e != null) { // the key already had a mapping: update its value and return the old one
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold) // the size now exceeds the threshold: expand the table
        resize();
    afterNodeInsertion(evict);
    return null;
}
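
The return value and the onlyIfAbsent flag are easiest to see from the public API. The sketch below is an illustration only: PutDemo and observe are made-up names, and it relies on the JDK 8 behaviour that put passes onlyIfAbsent = false while putIfAbsent passes onlyIfAbsent = true to putVal.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// PutDemo is a hypothetical class; it only calls public HashMap methods.
class PutDemo {
    // Returns what each call handed back, followed by the value finally stored under "a".
    static List<Integer> observe() {
        Map<String, Integer> map = new HashMap<>();
        Integer first = map.put("a", 1);          // no previous mapping
        Integer second = map.put("a", 2);         // replaces 1 and returns it
        Integer third = map.putIfAbsent("a", 3);  // key present: value is left unchanged
        return Arrays.asList(first, second, third, map.get("a"));
    }

    public static void main(String[] args) {
        System.out.println(observe());
    }
}
```

The program prints [null, 1, 2, 2]: the first put finds no mapping, the second returns the value it replaced, and putIfAbsent returns the existing value without overwriting it.
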
Why use a red-black tree?

Searching a linked list takes time proportional to its length. If the list in a bucket becomes too long, it is converted to a red-black tree so that a lookup takes O(log n) instead of O(n).
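
A bucket only becomes long when many keys share the same bucket index. The sketch below forces that situation with a key type whose hashCode is constant. It is an illustration under stated assumptions: CollisionDemo, BadKey and build are made-up names, and the code does not observe the internal tree conversion, it only shows that the map stays correct when every key collides.

```java
import java.util.HashMap;
import java.util.Map;

// CollisionDemo is a hypothetical class used only for illustration.
class CollisionDemo {
    // A key whose hashCode is deliberately constant, so every instance
    // lands in the same bucket. Implementing Comparable gives tree bins
    // an ordering to work with.
    static final class BadKey implements Comparable<BadKey> {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
        @Override public int compareTo(BadKey other) {
            return Integer.compare(id, other.id);
        }
    }

    // Inserts count keys that all collide and returns the map.
    static Map<BadKey, Integer> build(int count) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < count; i++) {
            map.put(new BadKey(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = build(10_000);
        System.out.println(map.size());
        System.out.println(map.get(new BadKey(9_999)));
    }
}
```

With a plain linked list, each lookup in such a map would have to walk up to 10,000 nodes; the tree bin is what keeps this worst case manageable.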

Why is the threshold for converting a linked list to a red-black tree 8?

The implementation notes in the source code explain it:

     * Because TreeNodes are about twice the size of regular nodes, we
     * use them only when bins contain enough nodes to warrant use
     * (see TREEIFY_THRESHOLD). And when they become too small (due to
     * removal or resizing) they are converted back to plain bins.  In
     * usages with well-distributed user hashCodes, tree bins are
     * rarely used.  Ideally, under random hashCodes, the frequency of
     * nodes in bins follows a Poisson distribution
     * (http://en.wikipedia.org/wiki/Poisson_distribution) with a
     * parameter of about 0.5 on average for the default resizing
     * threshold of 0.75, although with a large variance because of
     * resizing granularity. Ignoring variance, the expected
     * occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
     * factorial(k)). The first values are:
     *
     * 0:    0.60653066
     * 1:    0.30326533
     * 2:    0.07581633
     * 3:    0.01263606
     * 4:    0.00157952
     * 5:    0.00015795
     * 6:    0.00001316
     * 7:    0.00000094
     * 8:    0.00000006
     * more: less than 1 in ten million

In other words, when hashCode values are well distributed, tree bins are rarely used, because entries spread evenly across the buckets and a linked list seldom grows long. Under random hash codes the number of nodes per bin follows a Poisson distribution, so the probability drops sharply with each extra node: a bin holding 8 nodes occurs with a probability of only about 0.00000006, which is less than one in ten million. With a reasonable hashCode, the conversion from linked list to red-black tree therefore almost never happens, and the larger TreeNodes are only paid for in the rare pathological case.
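
The table in the source comment can be reproduced from the formula it quotes, exp(-0.5) * pow(0.5, k) / factorial(k). The following is a small sketch for checking the numbers; the class name PoissonDemo and the method name poisson are made up for illustration.

```java
// PoissonDemo is a hypothetical class that recomputes the table from the source comment.
class PoissonDemo {
    // P(k) = exp(-lambda) * lambda^k / k!
    static double poisson(double lambda, int k) {
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) {
            factorial *= i;
        }
        return Math.exp(-lambda) * Math.pow(lambda, k) / factorial;
    }

    public static void main(String[] args) {
        // lambda = 0.5 is the average bin occupancy quoted in the source comment.
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, poisson(0.5, k));
        }
    }
}
```

For k = 8 the value is roughly 5.9e-8, which rounds to the 0.00000006 shown in the comment.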

resize function: capacity expansion (annotated source code)
 
 // Default initial capacity (16); must be a power of two
 static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;

 //The next size value to resize (capacity * load factor).
 int threshold;
 
 final Node<K,V>[] resize() {
        // Keep a reference to the current table
        Node<K,V>[] oldTab = table;
        // Capacity of the old table: 0 if it has not been allocated yet, otherwise its length
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        // Current threshold: capacity * load factor once the table exists (12 for the defaults); before that, the requested initial capacity or 0
        int oldThr = threshold;
        // New capacity and new threshold; newThr == 0 means "not computed yet"
        int newCap, newThr = 0;
        if (oldCap > 0) {
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE; // already at the maximum capacity: no further resizing
                return oldTab;
            }
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY && oldCap >= DEFAULT_INITIAL_CAPACITY) // double the capacity; if it is still below 1 << 30 and the old capacity was at least 16
                newThr = oldThr << 1; // double the threshold as well
        }
        else if (oldThr > 0) // table not allocated yet: the constructor stored the initial capacity in threshold
            newCap = oldThr;
        else {               // zero initial threshold means the defaults are used (capacity 16, threshold 12)
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        if (newThr == 0) { // threshold not set above: compute it from the new capacity and the load factor
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr;//Assign a new threshold value to threshold
        @SuppressWarnings({"rawtypes","unchecked"})
            Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;//Assign a new array to table
        // Move every node from the old table into the new one
        if (oldTab != null) {
            // Walk each bucket of the old table
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                    else if (e instanceof TreeNode)
                        // The bin is a red-black tree: split it into a lo part and a hi part; a part with UNTREEIFY_THRESHOLD (6) or fewer nodes is converted back to a linked list
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    else { // preserve order
                        // Split the list into two lists instead of rehashing every node:
                        // loHead / loTail: head and tail of the list whose index stays at j
                        // hiHead / hiTail: head and tail of the list whose index becomes j + oldCap
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
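                            if ((e.hash & oldCap) == 0) { // the oldCap bit of the hash is 0: the index is unchanged
                                if (loTail == null)
                                    loHead = e; // first node of the lo list
                                else
                                    loTail.next = e;
                                loTail = e; // append at the tail, which preserves the original order
                            }
                            else { // the oldCap bit of the hash is 1: the index becomes j + oldCap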
                                if (hiTail == null)
                                    hiHead = e;//Set chain head
                                else
                                    hiTail.next = e;
                                hiTail = e;//Set chain tail
                            }
                        } while ((e = next) != null);
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        if (hiTail != null) {
                            hiTail.next = null;
                            // The hi list moves to index j + oldCap (old index + old capacity)
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }
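
The split rule used for linked lists, (e.hash & oldCap) == 0, gives the same bucket as recomputing hash & (newCap - 1) from scratch. The sketch below checks this for a few sample hashes. It is illustrative only: ResizeSplitDemo, newIndex and recomputedIndex are made-up names and the sample values are arbitrary.

```java
// ResizeSplitDemo is a hypothetical class illustrating the lo/hi split rule.
class ResizeSplitDemo {
    // Index after doubling, derived the way resize() does it:
    // look only at the single bit that oldCap represents.
    static int newIndex(int hash, int oldCap) {
        int j = hash & (oldCap - 1);                   // index in the old table
        return (hash & oldCap) == 0 ? j : j + oldCap;  // stay at j, or move up by oldCap
    }

    // Index after doubling, recomputed from scratch against the new table length.
    static int recomputedIndex(int hash, int oldCap) {
        int newCap = oldCap << 1;
        return hash & (newCap - 1);
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int[] hashes = {5, 21, 37, 53};   // all four share old index 5
        for (int h : hashes) {
            System.out.println(h + ": " + newIndex(h, oldCap)
                    + " / " + recomputedIndex(h, oldCap));
        }
    }
}
```

All four sample hashes sit in bucket 5 of the 16-slot table. After doubling, 5 and 37 stay at index 5 while 21 and 53 move to index 21 = 5 + 16, and both methods agree on every value.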
When will the capacity be expanded?

When the number of elements in the HashMap exceeds [array length (capacity) * loadFactor], the array is expanded.

  • loadFactor defaults to 0.75
  • The capacity is 16 by default, so the threshold is 0.75 * 16 = 12. When the 13th element is inserted, the array is expanded to 2 * 16 = 32, doubling its size, and the position of each element in the new array is recalculated.
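
The arithmetic above can be replayed with a short simulation. This is a simplified model, not the real implementation: ThresholdDemo and capacityAfter are made-up names, and the model assumes the default constructor, distinct keys, no removals, and ignores the maximum capacity and any resize triggered by treeifyBin.

```java
// ThresholdDemo is a hypothetical class that models only the size > threshold rule.
class ThresholdDemo {
    // Table length after inserting the given number of distinct keys
    // into a HashMap created with the default capacity and load factor.
    static int capacityAfter(int entries) {
        int capacity = 16;
        int threshold = (int) (capacity * 0.75f);   // 12
        for (int size = 1; size <= entries; size++) {
            if (size > threshold) {                 // same test as ++size > threshold
                capacity <<= 1;                     // double the table
                threshold = (int) (capacity * 0.75f);
            }
        }
        return capacity;
    }

    public static void main(String[] args) {
        int[] counts = {12, 13, 24, 25};
        for (int count : counts) {
            System.out.println(count + " entries -> capacity " + capacityAfter(count));
        }
    }
}
```

In this model the capacity is still 16 after 12 entries, becomes 32 with the 13th, and becomes 64 with the 25th.
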
Limits on capacity expansion?
  • The capacity never exceeds 1 << 30 (1073741824); the default initial capacity is 16.
  • The capacity is always a power of two. A requested initial capacity is rounded up to the next power of two, so new HashMap(13) still gets a table of length 16.
Steps

1. Take oldCap, the length of the original table, and compute newCap, the length of the new table, which is twice oldCap.
2. Loop over the original table, take the nodes in each bucket and store them in the new table.
3. Each node's new index is either the same as its old index, or the old index + oldCap (the length of the old table).

Does HashMap insert first or expand first?

1. On the first put the table has not been allocated yet, so resize() creates it before the element is inserted. On every later put the element is inserted first, and resize() is called afterwards if the size then exceeds the threshold.

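Combined with the source code, why can HashMap fall into an endless loop in high concurrency scenarios?
  • In JDK 1.7, when many threads insert elements concurrently, the HashMap eventually reaches its threshold.
  • It is then resized and expanded.
  • During expansion, rehash traverses the old array and moves every node into the new array. JDK 1.7 inserted each node at the head of its new bucket, which reverses the order of the list; if two threads resize at the same time, their pointer updates can interleave and leave a cycle in the list, after which a lookup in that bucket never terminates.
JDK 1.8 optimization?
  • JDK 8 uses an array of buckets + linked list / red-black tree. When the linked list of a bucket grows beyond 8 nodes, the list is converted into a red-black tree.
  • resize() in JDK 8 appends nodes at the tail of the lo and hi lists and so preserves their order, which avoids the cycle described above. HashMap is still not thread-safe, however.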
What is the difference between HashMap and Hashtable?
  • HashMap is not synchronized, so it is suited to single-threaded use; it allows one null key and null values.
  • Hashtable is synchronized, so it can be shared between threads; neither keys nor values may be null.
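
The difference in null handling is easy to verify. The sketch below is illustrative; NullKeyDemo and acceptsNullKey are made-up names, and only the public put method of each class is used.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

// NullKeyDemo is a hypothetical class comparing null-key handling.
class NullKeyDemo {
    // Returns true if the map accepts a null key, false if put throws NullPointerException.
    static boolean acceptsNullKey(Map<String, String> map) {
        try {
            map.put(null, "value");
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("HashMap:   " + acceptsNullKey(new HashMap<>()));
        System.out.println("Hashtable: " + acceptsNullKey(new Hashtable<>()));
    }
}
```

HashMap accepts the null key because hash(null) returns 0, as shown in the hash function earlier; Hashtable rejects it with a NullPointerException.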
