HashMap Implementation Details (Java 8)

Posted by TGLMan on Sun, 07 Nov 2021 18:08:40 +0100

1 Introduction

Differences between HashMap before and after Java 8:

Item	Before Java 8	Java 8 and later
Node type	Entry	Node / TreeNode
Storage structure	Array + singly linked list	Array + singly linked list / red-black tree
Insertion method	Head insertion	Tail insertion
Expansion timing	Expand before inserting	Insert, then expand
Hash algorithm	4 shift operations + 5 XORs	1 shift operation + 1 XOR

The following analyses are based on Java 8.

1.1 Main member variables of HashMap

HashMap maintains an array, Node[] table.
Each slot of this array in which elements are stored is called a bin.
You can think of a bin as a container or a bucket.

Main member variables in HashMap:

// Node array, the core of HashMap, used to store key-value.
// The length of the array is always an integer power of 2
transient Node<K,V>[] table;

// Number of key-value mappings contained in the current map
transient int size;

// Number of times the map has been structurally modified
// A structural modification changes the number of key-value mappings or otherwise alters the internal structure (e.g., rehash)
// Iterators use this field to fail fast if the map is modified while it is being traversed.
transient int modCount;

// Threshold for Node array (table) expansion, capacity * loadFactor
// Where capacity is the length of the table and loadFactor is the load factor
int threshold;

// Load factor (default 0.75)
final float loadFactor;
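
The fail-fast role of modCount can be observed through the public API. The following is a minimal sketch, not part of the HashMap source: the class and method names are made up for illustration. It relies on the fact that adding a new key is a structural modification, while overwriting the value of an existing key is not.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; only java.util.HashMap itself is real API.
class FailFastSketch {
    // Returns true if modifying the map inside the loop makes the iterator throw.
    static boolean throwsDuringIteration(boolean addNewKey) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        try {
            for (String key : map.keySet()) {
                if (addNewKey) {
                    map.put("c", 3);   // new mapping: structural change, modCount is incremented
                } else {
                    map.put(key, 42);  // existing key: only the value changes, modCount stays the same
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsDuringIteration(true));  // true
        System.out.println(throwsDuringIteration(false)); // false
    }
}
```

The map starts with two entries on purpose: the iterator only checks modCount when it advances, so with a single entry the loop would end before the check runs.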


// Default initial value of table's capacity (table.length), MUST be a power of two.
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

// Maximum capacity of table, MUST be a power of two <= 1<<30.
static final int MAXIMUM_CAPACITY = 1 << 30;

// Default value of load factor
static final float DEFAULT_LOAD_FACTOR = 0.75f;

// A linked-list bin may be converted to a red-black tree when an element is added to a bin that already holds at least 8 nodes
static final int TREEIFY_THRESHOLD = 8;

// A tree bin is converted back to a linked list when a resize leaves it with 6 or fewer nodes
static final int UNTREEIFY_THRESHOLD = 6;

// The minimum table capacity at which bins may be treeified.
// If the table is smaller than this, a bin with too many nodes triggers a resize instead of treeification.
// MIN_TREEIFY_CAPACITY should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts between the resizing and treeification thresholds.
// A linked list is converted to a red-black tree only when table.length >= MIN_TREEIFY_CAPACITY and the bin has reached the treeify threshold
static final int MIN_TREEIFY_CAPACITY = 64;
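
To make the relationship between capacity, load factor and threshold concrete, here is a small sketch. The class and helper names are hypothetical; the formula is the `capacity * loadFactor` described in the comment on `threshold`, truncated to an int.

```java
// Hypothetical helper, not part of HashMap.
class ThresholdSketch {
    // threshold = capacity * loadFactor, truncated to an int
    static int thresholdFor(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        int capacity = 16; // DEFAULT_INITIAL_CAPACITY
        for (int i = 0; i < 3; i++) {
            System.out.println("capacity " + capacity + " -> threshold " + thresholdFor(capacity, 0.75f));
            capacity <<= 1; // each resize doubles the capacity
        }
    }
}
```

With the defaults (capacity 16, load factor 0.75) the threshold is 12, so the table is expanded once the number of mappings exceeds 12.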

1.2 Internal classes of HashMap

HashMap's internal class Node:

    /**
     * Basic hash bin node, used for most entries.  (See below for
     * TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
     */
    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;
        // Omit constructor, getter, setter, equals, hashCode, toString
    }

It is easy to see that Node forms a singly linked list.

HashMap's internal class TreeNode:

    /**
     * Entry for Tree bins. Extends LinkedHashMap.Entry (which in turn
     * extends Node) so can be used as extension of either regular or
     * linked node.
     */
    static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
        TreeNode<K,V> parent;  // red-black tree links
        TreeNode<K,V> left;
        TreeNode<K,V> right;
        TreeNode<K,V> prev;    // needed to unlink next upon deletion
        boolean red;
        // Omit all methods
    }

Structure of LinkedHashMap.Entry:

    /**
     * HashMap.Node subclass for normal LinkedHashMap entries.
     */
    static class Entry<K,V> extends HashMap.Node<K,V> {
        Entry<K,V> before, after;
        Entry(int hash, K key, V value, Node<K,V> next) {
            super(hash, key, value, next);
        }
    }

You can see that LinkedHashMap.Entry extends HashMap.Node, so TreeNode is also a subclass of Node.
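
The practical consequence of this inheritance chain is that a single Node[] array can hold both list bins and tree bins, and the two can be told apart with instanceof. The sketch below uses simplified, hypothetical stand-in classes (not the real JDK ones) to show the idea.

```java
// Simplified stand-ins for HashMap.Node, LinkedHashMap.Entry and HashMap.TreeNode.
// All names here are illustrative only.
class BinNodeHierarchySketch {
    static class Node { Node next; }
    static class LinkedEntry extends Node { LinkedEntry before, after; }
    static class TreeNode extends LinkedEntry { TreeNode parent, left, right, prev; boolean red; }

    // Classifies a bin by looking at its head node.
    static String describe(Node head) {
        if (head == null) return "empty";
        return (head instanceof TreeNode) ? "tree bin" : "list bin";
    }

    public static void main(String[] args) {
        Node[] table = new Node[4];
        table[1] = new Node();     // head of a linked-list bin
        table[2] = new TreeNode(); // head (root) of a tree bin, still assignable to Node
        for (Node head : table) {
            System.out.println(describe(head));
        }
    }
}
```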

2 hash algorithm

    /**
     * Computes key.hashCode() and spreads (XORs) higher bits of hash
     * to lower.  Because the table uses power-of-two masking, sets of
     * hashes that vary only in bits above the current mask will
     * always collide. (Among known examples are sets of Float keys
     * holding consecutive whole numbers in small tables.)  So we
     * apply a transform that spreads the impact of higher bits
     * downward. There is a tradeoff between speed, utility, and
     * quality of bit-spreading. Because many common sets of hashes
     * are already reasonably distributed (so don't benefit from
     * spreading), and because we use trees to handle large sets of
     * collisions in bins, we just XOR some shifted bits in the
     * cheapest possible way to reduce systematic lossage, as well as
     * to incorporate impact of the highest bits that would otherwise
     * never be used in index calculations because of table bounds.
     * 
     * In short: compute key.hashCode() and XOR its high 16 bits into its low 16 bits
     * (the high 16 bits of the result are unchanged).
     * Because the table index is obtained by masking with a power of two, hashes that differ
     * only in bits above the mask would always collide.
     * The transform therefore spreads the influence of the high bits downward.
     * It is a trade-off between speed, utility, and quality of bit-spreading:
     * many common sets of hashes are already well distributed (and gain nothing from spreading),
     * and large sets of collisions in a bin are handled by trees,
     * so a single shift and XOR is used as the cheapest way to reduce systematic collisions
     * and to let the highest bits take part in index calculation, which the table bounds would otherwise prevent.
     */
    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

To find the index of an element in the array, where n is the length of the table:
i = (n - 1) & hash

Because the length of the array in HashMap is always a power of 2, n - 1 always has all zeros in its high bits and all ones in its low bits (it looks like 000...0111...111), so the & operation keeps only the low bits of the hash.
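
As a quick check of the masking trick, the sketch below compares `(n - 1) & hash` with a conventional modulo for a power-of-two n. The class and method names are hypothetical; Math.floorMod is the standard library method that returns a non-negative remainder.

```java
// Hypothetical demo class for the index computation.
class IndexMaskSketch {
    // Same expression that HashMap uses to pick a bin.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16; // must be a power of two for the equivalence to hold
        int[] hashes = {5, 21, 1_000_003, -1, Integer.MIN_VALUE};
        for (int h : hashes) {
            System.out.println(h + ": mask=" + indexFor(h, n) + ", floorMod=" + Math.floorMod(h, n));
        }
    }
}
```

The mask gives the same result as floorMod for every int, including negative hashes, but costs only a single AND instruction.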

Aside: if you are unfamiliar with bitwise operations, refer to this blog.

For example [1], suppose table.length = 16 and there are two elements A and B whose hashCode() values are H1 and H2, respectively. A collision occurs if Object.hashCode() is used directly:

H1: 00000000 00000000 00000000 00000101
H2: 00000000 11111111 00000000 00000101

// Hash collision example:
index1 =  H1 & (n - 1) = 00000000 00000000 00000000 00000101 & 1111 = 0101 = 5
index2 =  H2 & (n - 1) = 00000000 11111111 00000000 00000101 & 1111 = 0101 = 5

However, if you "perturb" the lower 16 bits with the higher 16 bits, the collision disappears:

00000000 00000000 00000000 00000101 // H1
00000000 00000000 00000000 00000000 // H1 >>> 16
00000000 00000000 00000000 00000101 // hash1 = H1 ^ (H1 >>> 16)

00000000 11111111 00000000 00000101 // H2
00000000 00000000 00000000 11111111 // H2 >>> 16
00000000 11111111 00000000 11111010 // hash2 = H2 ^ (H2 >>> 16)

// No Hash Collision 
index1 = hash1 & (n - 1) = 00000000 00000000 00000000 00000101 & 1111 = 0101 =  5
index2 = hash2 & (n - 1) = 00000000 11111111 00000000 11111010 & 1111 = 1010 = 10

Summary:
hash(key) computes the hash value of the key, perturbing the lower 16 bits with the higher 16 bits so that hash results are distributed more evenly across bins.
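
The worked example above can be reproduced in a few lines. This is a sketch with hypothetical class and method names; `spread` applies the same shift-and-XOR as hash(), but directly to an int rather than to a key object.

```java
// Hypothetical demo class reproducing the H1/H2 example.
class HashSpreadSketch {
    // Same transform as HashMap.hash(), applied to a raw hashCode value.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int h1 = 0x00000005; // 00000000 00000000 00000000 00000101
        int h2 = 0x00FF0005; // 00000000 11111111 00000000 00000101
        // Without spreading, both land in bin 5.
        System.out.println(indexFor(h1, 16) + " " + indexFor(h2, 16));                 // 5 5
        // With spreading, the high bits of h2 change its low bits.
        System.out.println(indexFor(spread(h1), 16) + " " + indexFor(spread(h2), 16)); // 5 10
    }
}
```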

3 put method: inserting an element

    /**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old
     * value is replaced.
     */
    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

    /**
     * Implements Map.put and related methods.
     *
     * @param hash hash for key
     * @param key the key
     * @param value the value to put
     * @param onlyIfAbsent if true, don't change existing value
     * @param evict if false, the table is in creation mode.
     * @return previous value, or null if none
     */
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
                   
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        
        // Assignment: tab = table, n = tab.length 
        // If the table is empty, it needs to be initialized
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length; // Note 1, Expansion
            
        // If there are no elements at the calculated location i
        if ((p = tab[i = (n - 1) & hash]) == null)
            // Encapsulate key-value as Node and place it on position i
            tab[i] = newNode(hash, key, value, null); 
        // Hash conflict occurred: Node p is already on the calculated location i
        else {
            Node<K,V> e; K k;
            // If p has the same hash and an equal key as the element being inserted, p is the node whose value will be overwritten
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p; // Assignment: Assign Node p that already exists at position i to e
            // If p is a red-black tree
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value); // Note 2
            else {
                // Traverse the singly linked list. binCount counts the nodes already visited, so when the tail is reached the list holds binCount + 1 nodes
                for (int binCount = 0; ; ++binCount) {
                    // If p is the last node in the linked list
                    if ((e = p.next) == null) {
                        // Encapsulate key-value as Node and append to the end of the list
                        p.next = newNode(hash, key, value, null);
                        // If the bin already held at least TREEIFY_THRESHOLD (8) nodes before this node was appended
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash); // Note 3, either resize or treeify
                        break;
                    }
                    // e is the node whose value will be overwritten
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            // If e is not null, a mapping for the key already exists and its value may be overwritten
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                // Empty callback function, overridden in LinkedHashMap
                afterNodeAccess(e);
                return oldValue;
            }
        }
        // Reaching this point means a new node was inserted
        ++modCount;
        // Update the size of the map and determine if expansion is required
        if (++size > threshold)
            resize();
        // Empty callback function, overridden in LinkedHashMap
        afterNodeInsertion(evict);
        return null;
    }

    // Create a regular (non-tree) node
    // Wrap the key-value pair in a regular Node
    Node<K,V> newNode(int hash, K key, V value, Node<K,V> next) {
        return new Node<>(hash, key, value, next);
    }

A simplified view of how HashMap.put(key, value) inserts a mapping X (a key-value pair whose key maps to index i):

If table is empty, resize() first to initialize it, then continue with the cases below.
If table[i] is empty, wrap X in a Node, store it in table[i], update modCount and size, resize() if size exceeds threshold, and return null.
If the node p at table[i] logically equals X (same hash, equal key), overwrite p.value with the new value and return the old value.
If the node p at table[i] is a TreeNode, insert X into the tree via putTreeVal. If the tree already contains a logically equal node, overwrite its value and return the old value; otherwise return null.
Otherwise, traverse the linked list at table[i]. If a logically equal node is found, overwrite its value and return the old value; otherwise wrap X in a Node, append it to the end of the list, treeify the bin or resize() if required after the insertion, and return null.
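
The return values described above are visible from the public API. The following sketch uses the real java.util.HashMap; the class and method names around it are hypothetical. It also calls putIfAbsent, which in Java 8 goes through putVal with onlyIfAbsent set to true.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; it only calls public HashMap methods.
class PutReturnValueSketch {
    static String trace() {
        Map<String, Integer> map = new HashMap<>();
        StringBuilder sb = new StringBuilder();
        sb.append(map.put("a", 1)).append(',');         // new key: returns null
        sb.append(map.put("a", 2)).append(',');         // existing key: returns the old value 1
        sb.append(map.putIfAbsent("a", 3)).append(','); // existing non-null value is kept: returns 2
        sb.append(map.get("a")).append(',');            // still 2
        sb.append(map.size());                          // one mapping in total
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(trace()); // null,1,2,2,1
    }
}
```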

4 resize method: expanding the table

    /**
     * Initializes or doubles table size.  If null, allocates in
     * accord with initial capacity target held in field threshold.
     * Otherwise, because we are using power-of-two expansion, the
     * elements from each bin must either stay at same index, or move
     * with a power of two offset in the new table.
     *
     * In other words: initialize the table or double its capacity.
     * If the table is null, allocate it using the initial capacity held in threshold (or the default).
     * Otherwise, because the capacity is doubled (power-of-two expansion),
     * the elements of each bin either stay at the same index or move to that index plus the old capacity.
     */
    final Node<K,V>[] resize() {
        // oldTab: the table before expansion
        Node<K,V>[] oldTab = table;
        // oldCap: table.length before expansion (0 if the table is null)
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        // oldThr: the threshold before expansion
        int oldThr = threshold;
        int newCap, newThr = 0;
        // oldCap > 0: the table has already been initialized
        if (oldCap > 0) {
            // If the current capacity has reached the maximum
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                // Return the current table without expanding it
                return oldTab;
            }
            // newCap = oldCap * 2, twice the current capacity
            // If newCap is below the maximum capacity and oldCap is at least the default initial capacity (16)
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                // newThr = oldThr * 2, twice the current threshold
                newThr = oldThr << 1; // double threshold
        }
        // The table is empty but threshold > 0: an initial capacity was specified at construction and stored in threshold
        else if (oldThr > 0) // initial capacity was placed in threshold
            // The new capacity is the old threshold
            newCap = oldThr;
        // The table is empty and no threshold was set
        else {               // zero initial threshold signifies using defaults
            // The new capacity is the default of 16
            newCap = DEFAULT_INITIAL_CAPACITY;
            // The new threshold is the default load factor 0.75f * the default capacity 16 = 12
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        // newThr is still 0 when the capacity came from the old threshold, or when the threshold was not doubled above
        if (newThr == 0) {
            // Compute the new threshold from the new table capacity and the load factor
            float ft = (float)newCap * loadFactor;
            // Guard against exceeding the maximum capacity
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        // Update threshold
        threshold = newThr;
        //Build a new Node array newTab based on the new capacity
        @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        // Update table references
        table = newTab;
        // If oldTab has elements, you need to move the elements from oldTab to newTab
        if (oldTab != null) {
            // Traverse every bin of oldTab
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                // e is the head node of the bin oldTab[j] currently being visited
                if ((e = oldTab[j]) != null) {
                    // Clear oldTab[j] to help GC
                    oldTab[j] = null;
                    // If the bin holds a single node (no hash collision)
                    if (e.next == null)
                        // Place it at its recomputed index in newTab
                        newTab[e.hash & (newCap - 1)] = e;
                    // If the bin has already been converted to a red-black tree
                    else if (e instanceof TreeNode)
                        // Split the tree bin into a low and a high part (details omitted here)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    // Otherwise the bin is a linked list with more than one node
                    else { // preserve order
                        // Because the capacity is doubled,
                        // every node of the original list ends up in one of two places:
                        // the original index j (the low position), or
                        // the index j + oldCap (the high position).
                        // high position = low position + oldTab.length
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            // (e.hash & oldCap) == 0 means the index is unchanged after the resize: the node stays at the low position
                            // Otherwise it moves to the high position
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        // Loop until end of list
                        } while ((e = next) != null);

                        // Store the low-order list in the original index
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        // Store the high-order list at the new index
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }

Note:
When resize() expands a table, it doubles table.length, which in binary is a shift of one bit to the left.
For example, table.length = 16 expands to 32.
The binary representation is:

Before expansion (16): 0001 0000
 After expansion (32): 0010 0000
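
Because doubling the capacity adds exactly one more bit to the index mask, a node's new index is decided by a single bit of its hash. The sketch below (hypothetical class and method names) applies the same `(hash & oldCap) == 0` test as resize() and compares the result with recomputing the index from scratch.

```java
// Hypothetical demo class for the low/high split performed by resize().
class ResizeSplitSketch {
    // Index after doubling, derived the way resize() does it.
    static int indexAfterResize(int hash, int oldCap) {
        int oldIndex = hash & (oldCap - 1);
        return ((hash & oldCap) == 0) ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int[] hashes = {5, 21, 37, 53}; // all of them share bin 5 while the capacity is 16
        for (int h : hashes) {
            int direct = h & (2 * oldCap - 1); // recomputed against the new mask
            System.out.println(h + " -> " + indexAfterResize(h, oldCap) + " (direct: " + direct + ")");
        }
    }
}
```

Hashes 5 and 37 stay in bin 5, while 21 and 53 move to bin 21 (5 + 16). No hash has to be recomputed, and the relative order of nodes within each half is preserved.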

5 treeifyBin method: singly linked list to red-black tree

    /**
     * Replaces all linked nodes in bin at index for given hash unless
     * table is too small, in which case resizes instead.
     * That is: replace the singly linked list in the bin at the index for the given hash with a red-black tree,
     * unless the table is too small, in which case the table is resized instead.
     */
    final void treeifyBin(Node<K,V>[] tab, int hash) {
        int n, index; Node<K,V> e;
        // The table is too small: resize instead of treeifying
        if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
            resize();
        // index is the bin index computed from the given hash, and e is the head node tab[index]
        // e != null means the bin for this hash is not empty
        else if ((e = tab[index = (n - 1) & hash]) != null) {
            // hd holds the head and tl the tail of the doubly linked list built by the do-while loop below
            TreeNode<K,V> hd = null, tl = null;
            // Walk the singly linked list, wrap each Node in a TreeNode, and link the TreeNodes into a doubly linked list in preparation for building the red-black tree
            do {
                // Encapsulate Node as TreeNode
                TreeNode<K,V> p = replacementTreeNode(e, null);
                if (tl == null)
                    hd = p;
                else {
                    p.prev = tl;
                    tl.next = p;
                }
                tl = p;
            } while ((e = e.next) != null);
            // tab[index] = hd
            // hd != null means there are nodes to convert; hd is the head of the doubly linked list of TreeNodes
            if ((tab[index] = hd) != null)
                // Convert the doubly linked list to a red-black tree
                hd.treeify(tab);
        }
    }

    // For treeifyBin
    TreeNode<K,V> replacementTreeNode(Node<K,V> p, Node<K,V> next) {
        return new TreeNode<>(p.hash, p.key, p.value, next);
    }
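
Putting putVal and treeifyBin together, what happens to a crowded list bin depends on two numbers: how many nodes it held before the append, and how long the table is. The sketch below is a simplified decision helper with hypothetical names. It restates the two checks from the code above and ignores the separate size > threshold resize at the end of putVal.

```java
// Hypothetical decision helper; the constants mirror the HashMap values quoted earlier.
class TreeifyDecisionSketch {
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    enum Action { NONE, RESIZE, TREEIFY }

    // nodesBeforeAppend: number of nodes in the list bin before the new node is appended.
    static Action afterAppend(int nodesBeforeAppend, int tableLength) {
        // putVal calls treeifyBin only when binCount >= TREEIFY_THRESHOLD - 1,
        // which means the list already held at least 8 nodes.
        if (nodesBeforeAppend < TREEIFY_THRESHOLD) {
            return Action.NONE;
        }
        // treeifyBin resizes instead of treeifying while the table is still small.
        return (tableLength < MIN_TREEIFY_CAPACITY) ? Action.RESIZE : Action.TREEIFY;
    }

    public static void main(String[] args) {
        System.out.println(afterAppend(7, 64)); // NONE
        System.out.println(afterAppend(8, 16)); // RESIZE
        System.out.println(afterAppend(8, 64)); // TREEIFY
    }
}
```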

The treeify method is an instance member method of TreeNode:

        /**
         * Forms tree of the nodes linked from this node.
         */
        final void treeify(Node<K,V>[] tab) {
            TreeNode<K,V> root = null;
            // this is the TreeNode instance on which the method is invoked
            // Traverse the doubly linked list starting at this to build the red-black tree
            for (TreeNode<K,V> x = this, next; x != null; x = next) {
                next = (TreeNode<K,V>)x.next;
                x.left = x.right = null;
                // If root is null (the root has not been set yet), x becomes the root
                if (root == null) {
                    x.parent = null;
                    // Root node is black
                    x.red = false; 
                    root = x;
                }
                // The root already exists: insert x as a descendant
                else {
                    K k = x.key;
                    int h = x.hash;
                    Class<?> kc = null;
                    // Loop until x has been inserted into the red-black tree
                    for (TreeNode<K,V> p = root;;) {
                        int dir, ph;
                        K pk = p.key;
                        // If the hash of x is less than the hash of p
                        if ((ph = p.hash) > h)
                            // dir = -1: descend to the left of p
                            dir = -1;
                        // If the hash of x is greater than the hash of p
                        else if (ph < h)
                            // dir = 1: descend to the right of p
                            dir = 1;
                        // If the hashes are equal, compare the keys when they are Comparable; if that does not decide the order, fall back to tieBreakOrder
                        else if ((kc == null &&
                                  (kc = comparableClassFor(k)) == null) ||
                                 (dir = compareComparables(kc, k, pk)) == 0)
                            dir = tieBreakOrder(k, pk);

                        TreeNode<K,V> xp = p;
                        // p becomes p.left or p.right, depending on dir
                        // If that child is null, the insertion position for x has been found
                        if ((p = (dir <= 0) ? p.left : p.right) == null) {
                            x.parent = xp;
                            if (dir <= 0)
                                xp.left = x;
                            else
                                xp.right = x;
                            // After x is linked into the tree, rebalance so that the red-black tree properties still hold
                            root = balanceInsertion(root, x);
                            break;
                        }
                    }
                }
            }
            // Ensures that the given root is the first node of its bin
            moveRootToFront(tab, root);
        }
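
The public API does not reveal whether a bin is a list or a tree, so treeification cannot be observed directly. What can be checked is that a map stays correct when every key collides. The sketch below uses a hypothetical key class with a constant hashCode. It implements Comparable, which is what gives the comparison in treeify an ordering to use when hashes are equal.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo: every key lands in the same bin.
class CollidingKeysSketch {
    static final class BadKey implements Comparable<BadKey> {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; } // deliberately constant
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
        @Override public int compareTo(BadKey other) {
            return Integer.compare(id, other.id);
        }
    }

    // Inserts `count` colliding keys and verifies that each one can be read back.
    static boolean allFound(int count) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < count; i++) {
            map.put(new BadKey(i), i);
        }
        if (map.size() != count) return false;
        for (int i = 0; i < count; i++) {
            Integer v = map.get(new BadKey(i));
            if (v == null || v != i) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allFound(1000)); // true
    }
}
```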

References

[1]. Detailed Hash algorithm in HashMap (perturbation function)
[2]. (9) Concurrent containers for in-depth concurrent programming: blocking queues, replicating containers while writing, and locking segmented containers
[3]. Interview Requirements: HashMap Source Parsing (JDK8)
[4]. Deep Understanding of HashMap Principle (1) - HashMap Source Parsing (JDK 1.8)
[5]. java.util.HashMap(Java 8)
