1 Introduction
Differences between HashMap before and after Java 8:
Item | Before Java 8 | Java 8 and later |
---|---|---|
Node type | Entry | Node/TreeNode |
Storage structure | Array + singly linked list | Array + singly linked list / red-black tree |
Insertion method | Head insertion | Tail insertion |
Resize timing | Resize, then insert | Insert, then resize |
Hash algorithm | 4 bit shifts + 5 XORs | 1 bit shift + 1 XOR |
The following analyses are based on Java 8.
1.1 Main member variables of HashMap
HashMap maintains an array, Node[] table.
Each slot of this array in which elements are stored is called a bin.
You can think of a bin as a container or a bucket.
Main member variables in HashMap:
// Node array, the core of HashMap, used to store key-value mappings.
// The length of the array is always a power of two.
transient Node<K,V>[] table;

// Number of key-value mappings contained in the map
transient int size;

// Number of structural modifications made to the map.
// A structural modification changes the number of mappings or the internal structure (e.g., rehash).
// Iterators use this field to fail fast if the map is modified during traversal.
transient int modCount;

// Threshold at which the Node array (table) is resized: capacity * loadFactor,
// where capacity is the length of the table and loadFactor is the load factor.
int threshold;

// Load factor (default 0.75)
final float loadFactor;

// Default initial capacity of the table (table.length), MUST be a power of two.
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

// Maximum capacity of the table, MUST be a power of two <= 1 << 30.
static final int MAXIMUM_CAPACITY = 1 << 30;

// Default value of the load factor
static final float DEFAULT_LOAD_FACTOR = 0.75f;

// When an element is added to a bin that already holds at least 8 nodes,
// the linked list may be converted to a red-black tree.
static final int TREEIFY_THRESHOLD = 8;

// A red-black tree is converted back to a linked list when it holds 6 or fewer nodes.
static final int UNTREEIFY_THRESHOLD = 6;

// Minimum table capacity for which a bin may be treeified.
// Below this capacity, the table is resized instead of treeifying the bin.
// MIN_TREEIFY_CAPACITY should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts between the resize and treeify thresholds.
// A linked list is converted to a red-black tree only when table.length >= MIN_TREEIFY_CAPACITY and the bin has reached the treeify threshold.
static final int MIN_TREEIFY_CAPACITY = 64;
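The modCount field is easiest to understand by watching it take effect. The following is a minimal sketch, not taken from the original article; the class and method names are made up for illustration. It relies only on the public java.util.HashMap API: adding a new key during iteration is a structural modification and trips the fail-fast check, while overwriting the value of an existing key is not.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

final class FailFastDemo {
    /** Returns true if adding a new key while iterating triggers the fail-fast check. */
    static boolean structuralChangeDuringIteration() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        try {
            for (String key : map.keySet()) {
                // A new key is a structural modification: modCount is incremented.
                map.put("c", 3);
            }
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }

    /** Returns true if overwriting existing values while iterating triggers the check. */
    static boolean valueUpdateDuringIteration() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        try {
            for (String key : map.keySet()) {
                // Existing key: only the value changes, modCount stays the same.
                map.put(key, 0);
            }
            return false;
        } catch (ConcurrentModificationException unexpected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("new key during iteration fails fast: "
                + structuralChangeDuringIteration());
        System.out.println("value update during iteration fails fast: "
                + valueUpdateDuringIteration());
    }
}
```

Note that the fail-fast behaviour is a best-effort debugging aid, so code should not depend on the exception for correctness.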
1.2 Internal classes of HashMap
HashMap's internal class Node:
/**
 * Basic hash bin node, used for most entries. (See below for
 * TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
 */
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;
    // Constructor, getters, setter, equals, hashCode, and toString omitted
}
It is easy to see that Node forms a singly linked list: each node holds a reference only to its successor.
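To make the linked structure concrete, here is a small sketch of how a lookup walks one bin. SimpleNode and find are hypothetical stand-ins written for this example, not the JDK classes; the matching rule (same hash, then equal key) mirrors the condition used in putVal later in this article.

```java
import java.util.Objects;

final class ChainDemo {
    /** Simplified stand-in for HashMap.Node (hypothetical, not the JDK class). */
    static final class SimpleNode<K, V> {
        final int hash;
        final K key;
        V value;
        SimpleNode<K, V> next;

        SimpleNode(int hash, K key, V value, SimpleNode<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    /** Walks the chain from head and returns the value mapped to key, or null if absent. */
    static <K, V> V find(SimpleNode<K, V> head, int hash, Object key) {
        for (SimpleNode<K, V> e = head; e != null; e = e.next) {
            // Compare the cheap int hash first, then the key itself.
            if (e.hash == hash && Objects.equals(key, e.key)) {
                return e.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Build a two-node chain by hand: head("b") -> tail("a") -> null
        SimpleNode<String, Integer> tail = new SimpleNode<>(1, "a", 10, null);
        SimpleNode<String, Integer> head = new SimpleNode<>(2, "b", 20, tail);
        System.out.println(find(head, 1, "a"));
        System.out.println(find(head, 3, "c"));
    }
}
```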
HashMap's internal class TreeNode:
/**
 * Entry for Tree bins. Extends LinkedHashMap.Entry (which in turn
 * extends Node) so can be used as extension of either regular or
 * linked node.
 */
static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
    TreeNode<K,V> parent;  // red-black tree links
    TreeNode<K,V> left;
    TreeNode<K,V> right;
    TreeNode<K,V> prev;    // needed to unlink next upon deletion
    boolean red;
    // All methods omitted
}
Structure of LinkedHashMap.Entry:
/**
 * HashMap.Node subclass for normal LinkedHashMap entries.
 */
static class Entry<K,V> extends HashMap.Node<K,V> {
    Entry<K,V> before, after;
    Entry(int hash, K key, V value, Node<K,V> next) {
        super(hash, key, value, next);
    }
}
You can see that LinkedHashMap.Entry extends HashMap.Node, so TreeNode is also a subclass of Node.
2 hash algorithm
/**
 * Computes key.hashCode() and spreads (XORs) higher bits of hash
 * to lower. Because the table uses power-of-two masking, sets of
 * hashes that vary only in bits above the current mask will
 * always collide. (Among known examples are sets of Float keys
 * holding consecutive whole numbers in small tables.) So we
 * apply a transform that spreads the impact of higher bits
 * downward. There is a tradeoff between speed, utility, and
 * quality of bit-spreading. Because many common sets of hashes
 * are already reasonably distributed (so don't benefit from
 * spreading), and because we use trees to handle large sets of
 * collisions in bins, we just XOR some shifted bits in the
 * cheapest possible way to reduce systematic lossage, as well as
 * to incorporate impact of the highest bits that would otherwise
 * never be used in index calculations because of table bounds.
 *
 * == Below is my rough translation. If you already know the perturbation function, it will make sense; if not, see the explanation below. ==
 * Compute key.hashCode() and XOR its high 16 bits into its low 16 bits (only the low 16 bits of key.hashCode() are changed).
 * Because the table is masked with a power of two, hashes that differ only in bits above the mask always collide.
 * So we apply a transform that spreads the influence of the higher bits downward.
 * This is a trade-off between the speed, utility, and quality of bit-spreading.
 * Because many common sets of hashes are already well distributed (and therefore do not benefit from spreading),
 * and because trees are used to handle large numbers of collisions in a bin,
 * we simply XOR some shifted bits in the cheapest possible way to reduce systematic loss,
 * and to fold in the highest bits (the high 16 bits), which would otherwise never be used in index calculations because of the table bounds.
 */
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
The index of an element in the array is computed as follows, where n is the length of the table:
i = (n - 1) & hash
Because the length of the array in HashMap is always a power of two, n - 1 always has all zeros in its high bits and all ones in its low bits (of the form 000...0111...111), so the AND keeps only the low bits of the hash.
Aside: if you are not familiar with bitwise operations, you can refer to this blog.
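As a quick check of the masking formula, the sketch below compares (n - 1) & hash with Math.floorMod(hash, n) for a power-of-two n. The class and method names are invented for this example; the point is only that the mask behaves like a non-negative modulo, including for negative hash values.

```java
final class IndexMaskDemo {
    /** Bin index as computed in HashMap; only meaningful when n is a power of two. */
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    /** True if masking agrees with the non-negative remainder of hash divided by n. */
    static boolean matchesFloorMod(int hash, int n) {
        return indexFor(hash, n) == Math.floorMod(hash, n);
    }

    public static void main(String[] args) {
        int n = 16; // table length, a power of two
        int[] samples = {5, 21, 1024, -1, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int hash : samples) {
            System.out.println(hash + " -> index " + indexFor(hash, n)
                    + ", agrees with floorMod: " + matchesFloorMod(hash, n));
        }
    }
}
```

The AND is cheaper than a division, which is one reason the table length is kept at a power of two.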
For example [1], suppose table.length = 16 and there are two elements A and B (with hash codes H1 and H2, respectively). A collision occurs if the value of Object.hashCode() is used directly:
H1: 00000000 00000000 00000000 00000101
H2: 00000000 11111111 00000000 00000101
// Hash collision example:
index1 = H1 & (n - 1) = 00000000 00000000 00000000 00000101 & 1111 = 0101 = 5
index2 = H2 & (n - 1) = 00000000 11111111 00000000 00000101 & 1111 = 0101 = 5
However, if you "perturb" the lower 16 bits with the higher 16 bits, there is no collision:
00000000 00000000 00000000 00000101 // H1
00000000 00000000 00000000 00000000 // H1 >>> 16
00000000 00000000 00000000 00000101 // hash1 = H1 ^ (H1 >>> 16)

00000000 11111111 00000000 00000101 // H2
00000000 00000000 00000000 11111111 // H2 >>> 16
00000000 11111111 00000000 11111010 // hash2 = H2 ^ (H2 >>> 16)

// No hash collision
index1 = hash1 & (n - 1) = 00000000 00000000 00000000 00000101 & 1111 = 0101 = 5
index2 = hash2 & (n - 1) = 00000000 11111111 00000000 11111010 & 1111 = 1010 = 10
Summary:
hash(key) computes the hash value of the key; its lower 16 bits are "perturbed" by the upper 16 bits so that keys are distributed more evenly across the bins.
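The H1/H2 example can be reproduced in a few lines. This is a sketch with invented names (spread, index); spread applies the same XOR-shift as HashMap.hash, but to a raw int rather than to key.hashCode().

```java
final class SpreadDemo {
    /** Same transform as HashMap.hash, applied to a raw hash code. */
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    /** Bin index for a table of length n (n must be a power of two). */
    static int index(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        int h1 = 0x00000005; // 00000000 00000000 00000000 00000101
        int h2 = 0x00FF0005; // 00000000 11111111 00000000 00000101

        // Without spreading, both hash codes land in bin 5.
        System.out.println(index(h1, n) + " " + index(h2, n));                 // prints "5 5"
        // With spreading, h2 moves to bin 10 and the collision disappears.
        System.out.println(index(spread(h1), n) + " " + index(spread(h2), n)); // prints "5 10"
    }
}
```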
2 put method: inserting an element
/**
 * Associates the specified value with the specified key in this map.
 * If the map previously contained a mapping for the key, the old
 * value is replaced.
 */
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

/**
 * Implements Map.put and related methods.
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to put
 * @param onlyIfAbsent if true, don't change existing value
 * @param evict if false, the table is in creation mode.
 * @return previous value, or null if none
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // Assignment: tab = table, n = tab.length
    // If the table is empty, it needs to be initialized
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length; // Note 1: resize
    // If there is no element at the computed index i
    if ((p = tab[i = (n - 1) & hash]) == null)
        // Wrap the key-value pair in a Node and place it at index i
        tab[i] = newNode(hash, key, value, null);
    // Hash collision: a Node p already exists at the computed index i
    else {
        Node<K,V> e; K k;
        // If p matches the element being inserted (same hash, same key),
        // p is the node whose value will be overwritten
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p; // Assignment: e is the Node p that already exists at index i
        // If p is a red-black tree node
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value); // Note 2
        else {
            // Traverse the singly linked list. binCount starts at 0 and counts the nodes
            // visited after the head, so when the tail is reached the list holds binCount + 1 nodes
            for (int binCount = 0; ; ++binCount) {
                // If p is the last Node in the list
                if ((e = p.next) == null) {
                    // Wrap the key-value pair in a Node and append it to the end of the list
                    p.next = newNode(hash, key, value, null);
                    // If the bin already held at least 8 nodes before this append
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash); // Note 3: either resize or treeify
                    break;
                }
                // Found e, the node whose value will be overwritten
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // If e is not null, an existing node's value is to be overwritten
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            // Empty callback, overridden in LinkedHashMap
            afterNodeAccess(e);
            return oldValue;
        }
    }
    // Reaching this point means a new node was inserted
    ++modCount;
    // Update the size of the map and check whether a resize is required
    if (++size > threshold)
        resize();
    // Empty callback, overridden in LinkedHashMap
    afterNodeInsertion(evict);
    return null;
}

// Create a regular (non-tree) node
// Wraps the key-value pair in a normal Node
Node<K,V> newNode(int hash, K key, V value, Node<K,V> next) {
    return new Node<>(hash, key, value, next);
}
A simplified walk-through of how HashMap.put(key, value) inserts a mapping X (a key-value pair; assume the index computed from the key is i):
- If table is empty, call resize() to initialize it, wrap X in a Node, insert it into table[i], update modCount and size, and return null.
- If table[i] is empty, wrap X in a Node, insert it into table[i], update modCount and size (calling resize() if size now exceeds threshold), and return null.
- If table[i] already holds a Node p that is logically equal to X (same hash, same key), overwrite p.value with the new value and return the old value. The node itself is not replaced.
- If table[i] holds a TreeNode p, insert X into the tree with putTreeVal. If a logically equal node already exists, overwrite its value and return the old value; otherwise return null.
- Otherwise, traverse the linked list at table[i]. If a logically equal node is found, overwrite its value and return the old value. If not, wrap X in a Node, append it to the end of the list, call treeifyBin if the bin already held 8 or more nodes, call resize() if size now exceeds threshold, and return null.
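The return values described in the cases above can be observed through the public API. The following sketch uses only java.util.HashMap; the class name is made up for the example. putIfAbsent is included because it takes the same path with onlyIfAbsent set to true, so an existing non-null value is left untouched.

```java
import java.util.HashMap;
import java.util.Map;

final class PutReturnDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();

        // New key: a node is inserted and put returns null.
        Integer first = map.put("k", 1);
        // Existing key: the value is overwritten and the old value is returned.
        Integer second = map.put("k", 2);
        // onlyIfAbsent behaviour: the existing value 2 is kept and returned.
        Integer third = map.putIfAbsent("k", 3);

        System.out.println(first);        // prints "null"
        System.out.println(second);       // prints "1"
        System.out.println(third);        // prints "2"
        System.out.println(map.get("k")); // prints "2"
        System.out.println(map.size());   // prints "1"
    }
}
```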
3 resize method: expanding the table
/**
 * Initializes or doubles table size. If null, allocates in
 * accord with initial capacity target held in field threshold.
 * Otherwise, because we are using power-of-two expansion, the
 * elements from each bin must either stay at same index, or move
 * with a power of two offset in the new table.
 *
 * Initialize the table or double its capacity.
 * If the table is empty, allocate the initial capacity. Otherwise, because the capacity grows by a power of two,
 * the elements of each bin either stay at their original index or move by a power-of-two offset in the new table.
 */
final Node<K,V>[] resize() {
    // Assignment: oldTab is the table before the resize
    Node<K,V>[] oldTab = table;
    // Assignment: oldCap is table.length before the resize
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    // Assignment: oldThr is the threshold before the resize
    int oldThr = threshold;
    int newCap, newThr = 0;
    // oldCap > 0 means the table has already been initialized
    if (oldCap > 0) {
        // If the current capacity has reached the maximum
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            // Return the current table without resizing
            return oldTab;
        }
        // Assignment: newCap = oldCap * 2, twice the current capacity
        // If newCap < maximum capacity and oldCap >= default initial capacity (16)
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            // Assignment: newThr = oldThr * 2, twice the current threshold
            newThr = oldThr << 1; // double threshold
    }
    // The table is empty but a threshold is set: an initial capacity was
    // specified at construction time and stored in threshold
    else if (oldThr > 0) // initial capacity was placed in threshold
        // Assignment: the capacity of the new table is the old threshold
        newCap = oldThr;
    // The table is empty and no threshold is set
    else {               // zero initial threshold signifies using defaults
        // Assignment: the capacity is the default of 16
        newCap = DEFAULT_INITIAL_CAPACITY;
        // Assignment: the new threshold is default load factor 0.75f * default capacity 16 = 12
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // newThr is still 0 if the table was empty with a preset threshold,
    // or if the doubling branch above did not set it
    if (newThr == 0) {
        // Compute the new threshold from the new capacity and the load factor
        float ft = (float)newCap * loadFactor;
        // Guard against exceeding the maximum capacity
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    // Update the threshold
    threshold = newThr;
    // Build a new Node array newTab with the new capacity
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    // Update the table reference
    table = newTab;
    // If oldTab exists, its elements must be moved from oldTab to newTab
    if (oldTab != null) {
        // Traverse the old table
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            // Assignment: e is the oldTab[j] currently being visited
            if ((e = oldTab[j]) != null) {
                // Clear the oldTab[j] bin to help GC
                oldTab[j] = null;
                // If the bin holds only one element (no hash collision)
                if (e.next == null)
                    // Place the element in newTab
                    newTab[e.hash & (newCap - 1)] = e;
                // If the bin has been converted to a tree of TreeNodes
                else if (e instanceof TreeNode)
                    // Split the tree bin (details not covered here)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                // Otherwise the bin is a linked list with more than one node
                else { // preserve order
                    // Because the capacity doubles,
                    // every node of the original list ends up either
                    // at the original index (the "low" list)
                    // or at the index in the newly added half (the "high" list).
                    // high index = low index + oldTab.length
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // 0 means the index after rehashing is unchanged (less than oldCap): low list
                        // Otherwise the node belongs in the high list
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    // Loop until the end of the list
                    } while ((e = next) != null);
                    // Store the low list at the original index
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // Store the high list at the new index
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
Note:
When resize() expands a table, it doubles table.length, which in binary is a shift of one bit to the left.
For example, table.length = 16 expands to 32.
The binary representation is:
Before expansion (16): 0001 0000
After expansion (32): 0010 0000
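Because the capacity doubles, the new mask has exactly one extra bit, and that bit is the one tested by (e.hash & oldCap) in resize(). The sketch below (class and method names are invented for illustration) checks that the low/high split rule gives the same result as recomputing the index with the new mask.

```java
final class ResizeSplitDemo {
    /** Index after doubling, derived from the old index using the split rule in resize(). */
    static int newIndex(int hash, int oldIndex, int oldCap) {
        // The extra mask bit is 0: stay at oldIndex. It is 1: move up by oldCap.
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int newCap = oldCap << 1; // 32
        for (int hash : new int[] {5, 21, 37, 53}) {
            int oldIndex = hash & (oldCap - 1); // all four land in bin 5 before the resize
            int viaSplit = newIndex(hash, oldIndex, oldCap);
            int viaMask = hash & (newCap - 1);
            System.out.println("hash " + hash + ": old " + oldIndex
                    + ", new (split rule) " + viaSplit + ", new (mask) " + viaMask);
        }
    }
}
```

For the sample values, 5 and 37 stay in bin 5 while 21 and 53 move to bin 21, so no hash needs to be recomputed during a resize.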
4 treeifyBin method: singly linked list to red-black tree
/**
 * Replaces all linked nodes in bin at index for given hash unless
 * table is too small, in which case resizes instead.
 * Replace all nodes of the singly linked list in the bin at the given index (with a red-black tree).
 * If the table is too small, resize it instead of replacing the nodes.
 */
final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    // The table is too small: resize only
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    // Assignment: index is the index computed from the given hash value
    // Assignment: e = tab[index]
    // Check: e != null, i.e., the bin for the given hash value is not empty
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        // hd is the head and tl is the tail of the doubly linked list built by the do-while loop below
        TreeNode<K,V> hd = null, tl = null;
        // Loop through the singly linked list, wrap each Node in a TreeNode, and build a
        // doubly linked list in preparation for the conversion to a red-black tree
        do {
            // Wrap the Node in a TreeNode
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while ((e = e.next) != null);
        // Assignment: tab[index] = hd
        // hd != null means the bin must be converted to a red-black tree;
        // hd is the head of a doubly linked list of TreeNodes
        if ((tab[index] = hd) != null)
            // Convert the doubly linked list to a red-black tree
            hd.treeify(tab);
    }
}

// For treeifyBin
TreeNode<K,V> replacementTreeNode(Node<K,V> p, Node<K,V> next) {
    return new TreeNode<>(p.hash, p.key, p.value, next);
}
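Putting putVal and treeifyBin together, what happens after a node is appended to a list depends on two numbers: how many nodes the bin held before the append, and the current table length. The sketch below restates that decision as a pure function. The class, enum, and method names are hypothetical; the two constants are copied from the member variables listed earlier.

```java
final class TreeifyDecisionDemo {
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    enum Action { NONE, RESIZE, TREEIFY }

    /**
     * Decides what follows an append to a linked-list bin.
     *
     * @param nodesBeforeAppend number of nodes in the bin before the new node was appended
     * @param tableLength       current table.length
     */
    static Action afterAppend(int nodesBeforeAppend, int tableLength) {
        // putVal calls treeifyBin only when binCount >= TREEIFY_THRESHOLD - 1,
        // i.e. when the bin already held at least TREEIFY_THRESHOLD nodes.
        if (nodesBeforeAppend < TREEIFY_THRESHOLD) {
            return Action.NONE;
        }
        // treeifyBin prefers growing a small table over building a tree.
        return tableLength < MIN_TREEIFY_CAPACITY ? Action.RESIZE : Action.TREEIFY;
    }

    public static void main(String[] args) {
        System.out.println(afterAppend(7, 64)); // prints "NONE"
        System.out.println(afterAppend(8, 16)); // prints "RESIZE"
        System.out.println(afterAppend(8, 64)); // prints "TREEIFY"
    }
}
```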
The treeify method is an instance member method of TreeNode:
/**
 * Forms tree of the nodes linked from this node.
 */
final void treeify(Node<K,V>[] tab) {
    TreeNode<K,V> root = null;
    // this is the TreeNode instance on which the method is called
    // Traverse the doubly linked list starting at this node to build the red-black tree
    for (TreeNode<K,V> x = this, next; x != null; x = next) {
        next = (TreeNode<K,V>)x.next;
        x.left = x.right = null;
        // The root is null (not built yet), so x becomes the root
        if (root == null) {
            x.parent = null;
            // The root node is black
            x.red = false;
            root = x;
        }
        // The root has been built; insert the remaining nodes below it
        else {
            K k = x.key;
            int h = x.hash;
            Class<?> kc = null;
            // Infinite loop that exits once x has been inserted into the red-black tree
            for (TreeNode<K,V> p = root;;) {
                int dir, ph;
                K pk = p.key;
                // If the hash value of x is less than the hash value of p
                if ((ph = p.hash) > h)
                    // dir = -1 means continue in the left subtree of p
                    dir = -1;
                // If the hash value of x is greater than the hash value of p
                else if (ph < h)
                    // dir = 1 means continue in the right subtree of p
                    dir = 1;
                // If the hash values of x and p are equal, compare the keys (details omitted)
                else if ((kc == null &&
                          (kc = comparableClassFor(k)) == null) ||
                         (dir = compareComparables(kc, k, pk)) == 0)
                    dir = tieBreakOrder(k, pk);

                TreeNode<K,V> xp = p;
                // Assignment: p = p.left or p.right
                // If the new p is null, the position for x has been found
                if ((p = (dir <= 0) ? p.left : p.right) == null) {
                    x.parent = xp;
                    if (dir <= 0)
                        xp.left = x;
                    else
                        xp.right = x;
                    // After x is inserted, rebalance so that the tree still satisfies the red-black tree properties
                    root = balanceInsertion(root, x);
                    break;
                }
            }
        }
    }
    // Ensures that the given root is the first node of its bin
    moveRootToFront(tab, root);
}
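The tree structure itself is not visible through the public API, but its purpose can be exercised: a map must stay correct even when every key collides. The sketch below uses a hypothetical key type, BadKey, whose hashCode is deliberately constant so that all entries share one bin. It implements Comparable, which corresponds to the compareComparables branch in treeify. This only checks behaviour; it does not prove that the bin was treeified.

```java
import java.util.HashMap;
import java.util.Map;

final class CollidingKeysDemo {
    /** Hypothetical key type whose hashCode is deliberately constant. */
    static final class BadKey implements Comparable<BadKey> {
        final int id;

        BadKey(int id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            return 42; // every key lands in the same bin
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }

        @Override
        public int compareTo(BadKey other) {
            return Integer.compare(id, other.id);
        }
    }

    /** Builds a map in which all keys collide. */
    static Map<BadKey, Integer> fill(int count) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < count; i++) {
            map.put(new BadKey(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = fill(1000);
        System.out.println(map.size());
        System.out.println(map.get(new BadKey(500)));
        System.out.println(map.get(new BadKey(1000)));
    }
}
```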
References
[1]. Detailed explanation of the hash algorithm in HashMap (perturbation function)
[2]. (9) Concurrent containers for in-depth concurrent programming: blocking queues, copy-on-write containers, and segment-locked containers
[3]. Interview essentials: HashMap source code analysis (JDK 8)
[4]. Deep understanding of HashMap principles (1): HashMap source code analysis (JDK 1.8)
[5]. java.util.HashMap (Java 8)