HashMap Implementation Details (Java 8)

Posted by TGLMan on Sun, 07 Nov 2021 18:08:40 +0100

1 Introduction

Differences between HashMap before and after Java 8:

Item | Before Java 8 | Java 8 and later
Node type | Entry | Node / TreeNode
Storage structure | Array + singly linked list | Array + singly linked list / red-black tree
Insertion method | Head insertion | Tail insertion
Expansion timing | Expand before inserting | Insert before expanding
Hash algorithm | 4 bit operations + 5 XORs | 1 bit operation + 1 XOR

The following analyses are based on Java 8.

1.1 Main member variables of HashMap

HashMap maintains an array, Node[] table.
Each slot in this array where elements are stored is called a bin.
You can think of a bin as a container or a bucket.

Main member variables in HashMap:

// Node array, the core of HashMap, used to store key-value pairs.
// The length of the array is always a power of 2
transient Node<K,V>[] table;

// Number of key-value pairs contained in the current map
transient int size;

// Number of times this map has been structurally modified
// A structural modification changes the number of key-value pairs or otherwise alters the internal structure (e.g., rehash)
// Iterators use this field to fail fast if the map is modified while it is being traversed.
transient int modCount;

// Threshold for Node array (table) expansion, capacity * loadFactor
// Where capacity is the length of the table and loadFactor is the load factor
int threshold;

// Load factor (default 0.75)
final float loadFactor;

// Default initial value of table's capacity (table.length), MUST be a power of two.
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

// Maximum capacity of table, MUST be a power of two <= 1 << 30.
static final int MAXIMUM_CAPACITY = 1 << 30;

// Default value of load factor
static final float DEFAULT_LOAD_FACTOR = 0.75f;

// When an element is added to a bin that already holds at least 8 nodes, the bin's linked list may be converted to a red-black tree
static final int TREEIFY_THRESHOLD = 8;

// Convert a tree bin back to a linked list when the number of nodes in the red-black tree drops to 6 or fewer
static final int UNTREEIFY_THRESHOLD = 6;

// Minimum table capacity at which a bin may be treeified.
// If the table is smaller than this, it is resized instead of treeifying the bin.
// MIN_TREEIFY_CAPACITY should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts between the resize and treeify thresholds.
// A linked list is converted to a red-black tree only when table.length >= MIN_TREEIFY_CAPACITY and the bin has reached the treeify threshold
static final int MIN_TREEIFY_CAPACITY = 64;
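
The modCount field listed above is what makes HashMap iterators fail-fast. The following is a minimal sketch of that behavior using only the public java.util.HashMap API; the class and method names (FailFastDemo and its helpers) are made up for illustration. Note that the Javadoc describes fail-fast behavior as best effort, so this shows the typical single-threaded case rather than a guarantee.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

final class FailFastDemo {
    /** Returns true if adding a key during iteration triggers the fail-fast check. */
    static boolean structuralChangeDuringIteration() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        try {
            for (String key : map.keySet()) {
                // Adding a new key is a structural modification, so modCount changes
                map.put(key + "-copy", 0);
            }
        } catch (ConcurrentModificationException ex) {
            return true;
        }
        return false;
    }

    /** Overwriting the value of an existing key is not a structural modification. */
    static boolean valueUpdateDuringIteration() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        try {
            for (String key : map.keySet()) {
                map.put(key, 99); // same key: only the value changes, modCount stays the same
            }
        } catch (ConcurrentModificationException ex) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(structuralChangeDuringIteration()); // true
        System.out.println(valueUpdateDuringIteration());      // false
    }
}
```

The second method shows why "structural" matters: replacing a value leaves modCount alone, so the iterator keeps working.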

1.2 Internal classes of HashMap

HashMap's internal class Node:

    /**
     * Basic hash bin node, used for most entries.  (See below for
     * TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
     */
    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;
        // Constructor, getters, setter, equals, hashCode, toString omitted
    }

It is easy to see that Node is a singly linked list node: each node holds a reference only to the next one.
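
To make the "singly linked list" idea concrete, here is a rough sketch of a bin built from such nodes, comparing the tail insertion used since Java 8 with the head insertion used before it (see the comparison table in the introduction). SimpleBin and its Node are simplified, hypothetical stand-ins, not the JDK classes.

```java
import java.util.ArrayList;
import java.util.List;

final class SimpleBin {
    // Hypothetical, simplified stand-in for HashMap.Node (key only, no hash or value)
    static final class Node {
        final String key;
        Node next;
        Node(String key, Node next) { this.key = key; this.next = next; }
    }

    /** Tail insertion (Java 8 style): walk to the last node and append. */
    static Node insertAtTail(Node head, String key) {
        Node added = new Node(key, null);
        if (head == null) return added;
        Node p = head;
        while (p.next != null) p = p.next;
        p.next = added;
        return head;
    }

    /** Head insertion (pre-Java 8 style): the new node becomes the head. */
    static Node insertAtHead(Node head, String key) {
        return new Node(key, head);
    }

    /** Collects the keys in list order, following the next references. */
    static List<String> keys(Node head) {
        List<String> out = new ArrayList<>();
        for (Node p = head; p != null; p = p.next) out.add(p.key);
        return out;
    }

    public static void main(String[] args) {
        Node tail = null, headFirst = null;
        for (String k : new String[] {"a", "b", "c"}) {
            tail = insertAtTail(tail, k);
            headFirst = insertAtHead(headFirst, k);
        }
        System.out.println(keys(tail));      // [a, b, c]
        System.out.println(keys(headFirst)); // [c, b, a]
    }
}
```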

HashMap's internal class TreeNode:

    /**
     * Entry for Tree bins. Extends LinkedHashMap.Entry (which in turn
     * extends Node) so can be used as extension of either regular or
     * linked node.
     */
    static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
        TreeNode<K,V> parent;  // red-black tree links
        TreeNode<K,V> left;
        TreeNode<K,V> right;
        TreeNode<K,V> prev;    // needed to unlink next upon deletion
        boolean red;
        // All methods omitted
    }

Structure of LinkedHashMap.Entry:

    /**
     * HashMap.Node subclass for normal LinkedHashMap entries.
     */
    static class Entry<K,V> extends HashMap.Node<K,V> {
        Entry<K,V> before, after;
        Entry(int hash, K key, V value, Node<K,V> next) {
            super(hash, key, value, next);
        }
    }

You can see that LinkedHashMap.Entry extends HashMap.Node, so TreeNode is also a subclass of Node.

2 hash algorithm

    /**
     * Computes key.hashCode() and spreads (XORs) higher bits of hash
     * to lower.  Because the table uses power-of-two masking, sets of
     * hashes that vary only in bits above the current mask will
     * always collide. (Among known examples are sets of Float keys
     * holding consecutive whole numbers in small tables.)  So we
     * apply a transform that spreads the impact of higher bits
     * downward. There is a tradeoff between speed, utility, and
     * quality of bit-spreading. Because many common sets of hashes
     * are already reasonably distributed (so don't benefit from
     * spreading), and because we use trees to handle large sets of
     * collisions in bins, we just XOR some shifted bits in the
     * cheapest possible way to reduce systematic lossage, as well as
     * to incorporate impact of the highest bits that would otherwise
     * never be used in index calculations because of table bounds.
     *
     * My paraphrase (if you already know the perturbation function, this will be familiar; if not, see the example below):
     * Compute key.hashCode() and XOR its high 16 bits into its low 16 bits (only the low 16 bits of the result differ from key.hashCode()).
     * Because the table index is computed with a power-of-two mask, hashes that differ only in bits above the mask always collide.
     * So a transform is applied that spreads the influence of the high bits downward.
     * This is a trade-off between the speed, utility, and quality of bit-spreading.
     * Because many common sets of hashes are already well distributed (and therefore gain nothing from spreading),
     * and because trees are used to handle large numbers of collisions in a bin,
     * the code simply XORs in some shifted bits, the cheapest way to reduce systematic loss.
     * This also brings in the highest 16 bits, which would otherwise never take part in index calculations because of the table bounds.
     */
    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

To find the index of an element in the array, where n is the length of the table:
i = (n - 1) & hash

Because the length of the array in HashMap is always a power of 2, n - 1 in binary always has all zeros in the high bits and all ones in the low bits (of the form 000...0111...111), so the AND keeps only the low bits of hash.
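
As a quick check of the masking formula, the sketch below compares (n - 1) & hash with Math.floorMod(hash, n) for a power-of-two n. IndexMaskDemo and indexFor are illustrative names, not part of the JDK; the point is only that the mask behaves like a non-negative modulo when n is a power of two.

```java
final class IndexMaskDemo {
    /** Same formula as in the text; only valid when n is a power of two. */
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        int[] hashes = {5, 21, -7, 0x00FF0005, Integer.MIN_VALUE};
        for (int h : hashes) {
            // Both columns agree for every int, including negative hashes
            System.out.println(h + " -> " + indexFor(h, n)
                    + " (floorMod: " + Math.floorMod(h, n) + ")");
        }
    }
}
```

The mask is a single AND instruction, which is one reason the table length is kept at a power of two.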

Aside: if you are not familiar with bitwise operations, you can refer to this blog.

For example [1], suppose table.length = 16 and there are two elements A and B whose hashCode() values are H1 and H2, respectively. A collision occurs if Object.hashCode() is used directly:

H1: 00000000 00000000 00000000 00000101
H2: 00000000 11111111 00000000 00000101

// Hash collision example:
index1 =  H1 & (n - 1) = 00000000 00000000 00000000 00000101 & 1111 = 0101 = 5
index2 =  H2 & (n - 1) = 00000000 11111111 00000000 00000101 & 1111 = 0101 = 5

However, if you "disturb" the lower 16 bits with the higher 16 bits, there will be no collision:

00000000 00000000 00000000 00000101 // H1
00000000 00000000 00000000 00000000 // H1 >>> 16
00000000 00000000 00000000 00000101 // hash1 = H1 ^ (H1 >>> 16)

00000000 11111111 00000000 00000101 // H2
00000000 00000000 00000000 11111111 // H2 >>> 16
00000000 11111111 00000000 11111010 // hash2 = H2 ^ (H2 >>> 16)

// No Hash Collision 
index1 = hash1 & (n - 1) = 00000000 00000000 00000000 00000101 & 1111 = 0101 =  5
index2 = hash2 & (n - 1) = 00000000 11111111 00000000 11111010 & 1111 = 1010 = 10

hash(key) returns the hash value of the key with its lower 16 bits "perturbed" by the upper 16 bits, which distributes keys more evenly across the bins.
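
The H1/H2 example above can be reproduced in a few lines. This is a small sketch: spread applies the same XOR-shift as HashMap.hash to a raw int, and HashSpreadDemo, spread, and indexFor are names chosen here for illustration.

```java
final class HashSpreadDemo {
    /** The same transform as HashMap.hash, applied directly to an int hash code. */
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    /** Index in a table of length n (n must be a power of two). */
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        int h1 = 0x00000005; // H1 from the example
        int h2 = 0x00FF0005; // H2 from the example
        // Without spreading, both land in bin 5
        System.out.println(indexFor(h1, n) + " " + indexFor(h2, n));                 // 5 5
        // With spreading, H2 moves to bin 10
        System.out.println(indexFor(spread(h1), n) + " " + indexFor(spread(h2), n)); // 5 10
    }
}
```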

3 put method: inserting an element

    /**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old
     * value is replaced.
     */
    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

    /**
     * Implements Map.put and related methods.
     *
     * @param hash hash for key
     * @param key the key
     * @param value the value to put
     * @param onlyIfAbsent if true, don't change existing value
     * @param evict if false, the table is in creation mode.
     * @return previous value, or null if none
     */
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        // Assignment: tab = table, n = tab.length
        // If the table is null or empty, it needs to be initialized
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length; // resize() also performs the initial allocation
        // If there is no element at the computed index i
        if ((p = tab[i = (n - 1) & hash]) == null)
            // Wrap the key-value pair in a Node and place it at index i
            tab[i] = newNode(hash, key, value, null);
        // Hash collision: a Node p already sits at the computed index i
        else {
            Node<K,V> e; K k;
            // If p has the same hash and an equal key, p is the node whose value will be overwritten
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p; // Assignment: e is the existing Node p at index i
            // If the bin is a red-black tree
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                // Traverse the singly linked list; when p is the last node,
                // the list holds binCount + 1 nodes
                for (int binCount = 0; ; ++binCount) {
                    // If p is the last Node in the list
                    if ((e = p.next) == null) {
                        // Wrap the key-value pair in a Node and append it to the end of the list
                        p.next = newNode(hash, key, value, null);
                        // If the bin already held at least 8 nodes before this insert
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash); // either resizes or treeifies
                        break;
                    }
                    // e is the node whose value will be overwritten
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            // If e is not null, a mapping for this key already exists
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                // Empty callback in HashMap, overridden in LinkedHashMap
                afterNodeAccess(e);
                return oldValue;
            }
        }
        // Reaching here means a new node was inserted
        ++modCount;
        // Update the size of the map and check whether expansion is required
        if (++size > threshold)
            resize();
        // Empty callback in HashMap, overridden in LinkedHashMap
        afterNodeInsertion(evict);
        return null;
    }

    // Create a regular (non-tree) node
    // Wrap the key-value pair in a plain Node
    Node<K,V> newNode(int hash, K key, V value, Node<K,V> next) {
        return new Node<>(hash, key, value, next);
    }

A simplified walk-through of HashMap.put(key, value) inserting a mapping X (a key-value pair; assume the index computed from the key is i):

  1. If table is null or empty, call resize() to initialize it, then continue with the steps below.
  2. If table[i] is empty, wrap X in a Node, store it at table[i], update modCount and size, call resize() if size now exceeds threshold, and return null.
  3. If the first Node p at table[i] is logically equal to X (same hash, equal key), overwrite p.value with the new value and return the old value. The node itself is not replaced.
  4. If table[i] holds a TreeNode, insert X into the tree with putTreeVal. If a logically equal node already exists, overwrite its value and return the old value; otherwise return null.
  5. Otherwise, traverse the linked list at table[i]. If a logically equal node is found, overwrite its value and return the old value. If not, wrap X in a Node and append it to the end of the list; call treeifyBin if the bin already held at least 8 nodes, update modCount and size, call resize() if size now exceeds threshold, and return null.
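
The return values described in the steps above can be observed through the public API. The sketch below is illustrative only (PutReturnDemo and run are names invented here); putIfAbsent is included because it reaches putVal with onlyIfAbsent set to true.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PutReturnDemo {
    /** Collects what put and putIfAbsent return, plus the final value and size. */
    static List<Object> run() {
        Map<String, Integer> map = new HashMap<>();
        Integer first = map.put("k", 1);          // new key: returns null
        Integer second = map.put("k", 2);         // existing key: returns old value 1
        Integer third = map.putIfAbsent("k", 3);  // onlyIfAbsent: returns 2, value stays 2
        return Arrays.<Object>asList(first, second, third, map.get("k"), map.size());
    }

    public static void main(String[] args) {
        System.out.println(run()); // [null, 1, 2, 2, 1]
    }
}
```

Overwriting an existing key changes neither size nor modCount, which matches the early return in putVal.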

4 resize method: expanding the table

    /**
     * Initializes or doubles table size.  If null, allocates in
     * accord with initial capacity target held in field threshold.
     * Otherwise, because we are using power-of-two expansion, the
     * elements from each bin must either stay at same index, or move
     * with a power of two offset in the new table.
     *
     * In short: initialize the table or double its capacity.
     * If the table is empty, allocate the initial capacity. Otherwise, because the capacity grows by a power of two,
     * the elements in each bin either stay at their original index or move to that index plus the old capacity.
     */
    final Node<K,V>[] resize() {
        // Assignment: oldTab is the table before expansion
        Node<K,V>[] oldTab = table;
        // Assignment: oldCap is table.length before expansion
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        // Assignment: oldThr is the threshold before expansion
        int oldThr = threshold;
        int newCap, newThr = 0;
        // oldCap > 0 means the table has already been initialized
        if (oldCap > 0) {
            // If the current capacity has reached the maximum
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                // Return the current table without expanding
                return oldTab;
            }
            // Assignment: newCap = oldCap * 2, twice the current capacity
            // If newCap < maximum capacity and oldCap >= default initial capacity 16
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                // Assignment: newThr = oldThr * 2, twice the current threshold
                newThr = oldThr << 1; // double threshold
        }
        // The table is empty but threshold is set: an initial capacity was given at construction and stored in threshold
        else if (oldThr > 0) // initial capacity was placed in threshold
            // Assignment: the new capacity is the old threshold
            newCap = oldThr;
        // The table is empty and no threshold is set
        else {               // zero initial threshold signifies using defaults
            // Assignment: the capacity is the default of 16
            newCap = DEFAULT_INITIAL_CAPACITY;
            // Assignment: the new threshold is default load factor 0.75f * default capacity 16 = 12
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        // newThr is still 0 if the capacity came from threshold, or if the doubling branch above did not set it
        if (newThr == 0) {
            // Compute the new threshold from the new capacity and the load factor
            float ft = (float)newCap * loadFactor;
            // Clamp if out of range
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        // Update threshold
        threshold = newThr;
        // Build a new Node array newTab with the new capacity
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        // Update the table reference
        table = newTab;
        // If oldTab exists, its elements must be moved to newTab
        if (oldTab != null) {
            // Traverse every bin of oldTab
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                // Assignment: e is the bin oldTab[j] currently being visited
                if ((e = oldTab[j]) != null) {
                    // Clear oldTab[j] to help GC
                    oldTab[j] = null;
                    // If the bin holds a single node (no hash collision)
                    if (e.next == null)
                        // Place the node in newTab
                        newTab[e.hash & (newCap - 1)] = e;
                    // If the bin has already been converted to a tree
                    else if (e instanceof TreeNode)
                        // Split the tree bin (details omitted here)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    // Otherwise the bin is a linked list with more than one node
                    else { // preserve order
                        // Because the capacity doubles,
                        // every node in the original list ends up either
                        // at the original index (the "low" list)
                        // or at the original index + oldCap (the "high" list).
                        // high index = low index + oldTab.length
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            // Equal to 0 means the index after rehash is unchanged (less than oldCap): low list
                            // Otherwise the node belongs in the high list
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        // Loop until the end of the list
                        } while ((e = next) != null);

                        // Store the low list at the original index
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        // Store the high list at the new index
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }

Note:
When resize() expands a table, it doubles table.length, which in binary is a left shift by one bit.
For example, table.length = 16 expands to 32.
The binary representation is:

Before expansion (16): 0001 0000
 After expansion (32): 0010 0000
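
Because the new mask has exactly one extra bit (the bit with value oldCap), testing hash & oldCap is enough to decide where a node goes after doubling. The sketch below checks that this shortcut matches recomputing the index with the new mask; ResizeSplitDemo and newIndex are illustrative names and do not appear in the JDK.

```java
final class ResizeSplitDemo {
    /** New index of a node from a table of length oldCap after the table doubles. */
    static int newIndex(int hash, int oldCap) {
        int j = hash & (oldCap - 1);                  // index in the old table
        return (hash & oldCap) == 0 ? j : j + oldCap; // stay in place, or move by oldCap
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int[] hashes = {5, 21, 37, 53}; // all land in bin 5 when the table length is 16
        for (int h : hashes) {
            int recomputed = h & (2 * oldCap - 1); // full recomputation with the new mask
            System.out.println(h + ": " + newIndex(h, oldCap) + " == " + recomputed);
        }
        // 5 and 37 stay in bin 5; 21 and 53 move to bin 21 (5 + 16)
    }
}
```

This is why resize() can split each list into a low and a high list without rehashing any key.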

5 treeifyBin method: singly linked list to red-black tree

    /**
     * Replaces all linked nodes in bin at index for given hash unless
     * table is too small, in which case resizes instead.
     *
     * In short: replace the singly linked list in the bin at the given index with a red-black tree.
     * If the table is too small, expand it instead of converting the bin.
     */
    final void treeifyBin(Node<K,V>[] tab, int hash) {
        int n, index; Node<K,V> e;
        // The table is too small: only expand
        if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
            resize();
        // Assignment: index is the index computed from the given hash value
        // Assignment: e = tab[index]
        // Check: e != null, that is, the bin for this hash value is not empty
        else if ((e = tab[index = (n - 1) & hash]) != null) {
            // hd is the head of the doubly linked list built by the do-while below, tl is its tail
            TreeNode<K,V> hd = null, tl = null;
            // Walk the singly linked list, wrap each Node in a TreeNode, and build a doubly linked list in preparation for the conversion to a red-black tree
            do {
                // Wrap the Node in a TreeNode
                TreeNode<K,V> p = replacementTreeNode(e, null);
                if (tl == null)
                    hd = p;
                else {
                    p.prev = tl;
                    tl.next = p;
                }
                tl = p;
            } while ((e = e.next) != null);
            // Assignment: tab[index] = hd
            // hd != null means the bin must be converted, where hd is the head of the doubly linked list of TreeNodes
            if ((tab[index] = hd) != null)
                // Convert the doubly linked list to a red-black tree
                hd.treeify(tab);
        }
    }

    // For treeifyBin
    TreeNode<K,V> replacementTreeNode(Node<K,V> p, Node<K,V> next) {
        return new TreeNode<>(p.hash, p.key, p.value, next);
    }
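
Putting putVal and treeifyBin together, the choice between doing nothing, resizing, and treeifying depends on two numbers: how many nodes the bin held before the append, and the table length. In putVal, binCount reaches TREEIFY_THRESHOLD - 1 when the bin already holds 8 nodes. The sketch below models only that decision; TreeifyDecisionDemo, Action, and onAppend are hypothetical names, and the constants are copied from the member variables listed earlier.

```java
final class TreeifyDecisionDemo {
    // Values copied from the HashMap constants listed earlier in this post
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    enum Action { NONE, RESIZE, TREEIFY }

    /**
     * Hypothetical helper: what happens when a node is appended to a list bin
     * that held nodesBeforeInsert nodes, in a table of the given length.
     */
    static Action onAppend(int nodesBeforeInsert, int tableLength) {
        if (nodesBeforeInsert < TREEIFY_THRESHOLD) {
            return Action.NONE; // putVal does not call treeifyBin at all
        }
        // treeifyBin: a small table is resized instead of converting the bin
        return tableLength < MIN_TREEIFY_CAPACITY ? Action.RESIZE : Action.TREEIFY;
    }

    public static void main(String[] args) {
        System.out.println(onAppend(7, 64)); // NONE
        System.out.println(onAppend(8, 16)); // RESIZE
        System.out.println(onAppend(8, 64)); // TREEIFY
    }
}
```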

The treeify method is an instance member method of TreeNode:

        /**
         * Forms tree of the nodes linked from this node.
         */
        final void treeify(Node<K,V>[] tab) {
            TreeNode<K,V> root = null;
            // this is the TreeNode instance on which the method is called
            // Traverse the doubly linked list that starts at this and build a red-black tree
            for (TreeNode<K,V> x = this, next; x != null; x = next) {
                next = (TreeNode<K,V>)x.next;
                x.left = x.right = null;
                // No root yet: x becomes the root
                if (root == null) {
                    x.parent = null;
                    // The root node is black
                    x.red = false;
                    root = x;
                }
                // The root exists; insert x as a descendant
                else {
                    K k = x.key;
                    int h = x.hash;
                    Class<?> kc = null;
                    // Loop until x has been inserted into the red-black tree
                    for (TreeNode<K,V> p = root;;) {
                        int dir, ph;
                        K pk = p.key;
                        // If the hash value of x is less than the hash value of p
                        if ((ph = p.hash) > h)
                            // dir = -1 means continue in the left subtree of p
                            dir = -1;
                        // If the hash value of x is greater than the hash value of p
                        else if (ph < h)
                            // dir = 1 means continue in the right subtree of p
                            dir = 1;
                        // If the hash values are equal, compare the keys (details omitted)
                        else if ((kc == null &&
                                  (kc = comparableClassFor(k)) == null) ||
                                 (dir = compareComparables(kc, k, pk)) == 0)
                            dir = tieBreakOrder(k, pk);

                        TreeNode<K,V> xp = p;
                        // Assignment: p = p.left or p.right.
                        // If that child is null, this is the place to put x
                        if ((p = (dir <= 0) ? p.left : p.right) == null) {
                            x.parent = xp;
                            if (dir <= 0)
                                xp.left = x;
                            else
                                xp.right = x;
                            // After inserting x, rebalance so the tree still satisfies the red-black tree properties
                            root = balanceInsertion(root, x);
                            break;
                        }
                    }
                }
            }
            // Ensures that the given root is the first node of its bin
            moveRootToFront(tab, root);
        }
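
The tree structure itself is internal and cannot be inspected through the public API, but its purpose can be exercised: a map stays correct even when every key lands in the same bin. The sketch below uses a hypothetical key type, BadKey, whose hashCode always collides and which implements Comparable so that keys with equal hashes can be ordered. It only checks lookups; it does not prove that a tree bin was created.

```java
import java.util.HashMap;
import java.util.Map;

final class CollidingKeysDemo {
    /** Hypothetical key type whose hashCode always collides. */
    static final class BadKey implements Comparable<BadKey> {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; } // every key lands in the same bin
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
        @Override public int compareTo(BadKey other) {
            return Integer.compare(id, other.id);
        }
    }

    /** Inserts count colliding keys and returns how many can be read back correctly. */
    static int roundTrip(int count) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < count; i++) {
            map.put(new BadKey(i), i);
        }
        int found = 0;
        for (int i = 0; i < count; i++) {
            Integer v = map.get(new BadKey(i));
            if (v != null && v == i) found++;
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(1000)); // 1000
    }
}
```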


[1]. Detailed Hash algorithm in HashMap (perturbation function)
[2]. (9) Concurrent containers for in-depth concurrent programming: blocking queues, replicating containers while writing, and locking segmented containers
[3]. Interview Requirements: HashMap Source Parsing (JDK8)
[4]. Deep Understanding of HashMap Principle (1) - HashMap Source Parsing (JDK 1.8)
[5]. java.util.HashMap(Java 8)
