HashMap source code analysis
Preface
HashMap is a very common collection, and its data structure and design are classics. As Java programmers, we should understand its underlying implementation in depth. Below, we walk through the HashMap source code together.
1, Introduction to HashMap
HashMap is mainly used to store key-value pairs. It is a hash-table-based implementation of the Map interface, is one of the most commonly used Java collections, and is not thread-safe.
HashMap can store null keys and null values, but there can be at most one null key, while any number of values may be null.
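A quick sketch (our own, not JDK code) illustrating this behavior:

```java
// assumes: import java.util.HashMap; import java.util.Map;
Map<String, String> m = new HashMap<>();
m.put(null, "first");
m.put(null, "second");           // overwrites: at most one null key
m.put("a", null);
m.put("b", null);                // any number of null values is allowed
System.out.println(m.get(null)); // second
System.out.println(m.size());    // 3
```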
Before JDK 1.8, HashMap was composed of an array plus linked lists: the array is the body of HashMap, and the linked lists exist mainly to resolve hash conflicts. Since JDK 1.8, HashMap has changed greatly in how it resolves hash conflicts: when the length of a linked list exceeds the threshold (8 by default) and the array length is at least 64, the linked list is converted into a red-black tree to reduce search time.
The default initial capacity of HashMap is 16. After each expansion, the capacity becomes twice the original; HashMap always uses a power of 2 as the size of its hash table.
2, Analysis of underlying data structure
1. Before JDK 1.8
Before JDK 1.8, the bottom layer of HashMap was a combination of array and linked lists, sometimes described as a "linked-list hash".
HashMap obtains the hash value by passing the key's hashCode through a perturbation function, and then determines the element's storage location with (array length - 1) & hash. If that slot is already occupied, HashMap compares the stored element's hash and key with those of the element to be inserted: if they are the same, the value is overwritten directly; if not, the conflict is resolved by chaining (the so-called zipper method).
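As a small worked example (our own, not JDK code): because the table length n is always a power of 2, (n - 1) & hash is equivalent to hash % n for non-negative hashes, but uses a single cheap bitwise operation.

```java
int n = 16;       // table length, always a power of 2
int hash = 21;    // some perturbed hash value
System.out.println((n - 1) & hash); // 5
System.out.println(hash % n);       // 5 — same result, slower operation
```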
The so-called perturbation function is simply HashMap's hash method; its purpose is to reduce hash collisions.
Source code of the hash method before JDK 1.8:
```java
static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
```
2. Since JDK 1.8
Since JDK 1.8, the underlying data structure is an array plus either linked lists or red-black trees.
When the length of a linked list grows beyond the threshold (8 by default), the treeifyBin() method is called first. Based on the HashMap array, this method decides whether to convert the list into a red-black tree: the conversion, which reduces search time, is performed only when the array length is at least 64; otherwise resize() is simply executed to expand the array.
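For reference, this is treeifyBin as it appears in JDK 8 (comments are our own); note the MIN_TREEIFY_CAPACITY check that prefers resizing over treeification:

```java
final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    // Table still too small: expand instead of converting to a tree
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        // Replace each Node in the bucket with a TreeNode, preserving link order
        TreeNode<K,V> hd = null, tl = null;
        do {
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while ((e = e.next) != null);
        // Then build the actual red-black tree from the doubly linked list
        if ((tab[index] = hd) != null)
            hd.treeify(tab);
    }
}
```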
Source code of the hash method since JDK 1.8:
```java
static final int hash(Object key) {
    int h;
    // key.hashCode(): returns the key's hash code
    // ^  : bitwise XOR
    // >>>: unsigned right shift — the sign bit is ignored and vacated bits are filled with 0
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
```
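A small sketch (our own, not JDK code) of why XOR-ing in the high 16 bits matters: with a small table, the index uses only the low bits of the hash, so hashes that differ only in their high bits would otherwise always collide.

```java
int n = 16;                 // small table: index uses only the low 4 bits
int h1 = 0x12340005;        // two hashes differing only in the high bits
int h2 = 0x56780005;
System.out.println((n - 1) & h1); // 5
System.out.println((n - 1) & h2); // 5 — guaranteed collision
// XOR-ing in the high 16 bits (the JDK 8 perturbation) separates them:
System.out.println((n - 1) & (h1 ^ (h1 >>> 16))); // 1
System.out.println((n - 1) & (h2 ^ (h2 >>> 16))); // 13
```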
3, HashMap core source code
1. Member variables
```java
public class HashMap<K,V> extends AbstractMap<K,V> implements Map<K,V>, Cloneable, Serializable {
    // Serial version UID
    private static final long serialVersionUID = 362498820763181265L;
    // The default initial capacity is 16
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
    // Maximum capacity
    static final int MAXIMUM_CAPACITY = 1 << 30;
    // Default load factor
    static final float DEFAULT_LOAD_FACTOR = 0.75f;
    // When the number of nodes in a bucket exceeds this value, the bucket turns into a red-black tree
    static final int TREEIFY_THRESHOLD = 8;
    // When the number of nodes in a bucket drops below this value, the tree turns back into a linked list
    static final int UNTREEIFY_THRESHOLD = 6;
    // The minimum table size at which buckets may be converted to red-black trees
    static final int MIN_TREEIFY_CAPACITY = 64;
    // The array that stores the elements; its length is always a power of 2
    transient Node<K,V>[] table;
    // A set view that holds the concrete entries
    transient Set<Map.Entry<K,V>> entrySet;
    // The number of stored elements. Note that this is not equal to the length of the array.
    transient int size;
    // Counter of expansions and structural modifications of the map
    transient int modCount;
    // Critical value (capacity * load factor); when size exceeds it, the table is expanded
    int threshold;
    // Load factor
    final float loadFactor;
}
```
loadFactor load factor
The loadFactor controls how densely the array is populated. The closer loadFactor is to 1, the more entries the array holds before expanding, so it becomes denser and linked lists grow longer; the closer loadFactor is to 0, the fewer entries the array holds and the sparser the data becomes.
A loadFactor that is too large makes element lookup inefficient; one that is too small leaves the array under-utilized and the stored data very scattered. The default value of 0.75f is the trade-off officially recommended by the JDK.
With the default capacity of 16 and load factor of 0.75, data is continually stored into the Map during use; when the count reaches 16 * 0.75 = 12, the current capacity of 16 must be expanded. This process involves rehashing, copying data and other operations, so it is expensive.
threshold
threshold = capacity * loadFactor. When size >= threshold, expansion of the array should be considered; in other words, threshold is the yardstick for deciding whether the array needs to be expanded.
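A minimal sketch of the practical consequence: if we know roughly how many entries we will store, presizing the map keeps size below the threshold and avoids intermediate rehashes (the sizing formula below is a common idiom, not JDK code).

```java
// assumes: import java.util.HashMap; import java.util.Map;
int expected = 100;
// Choose a capacity such that expected <= capacity * 0.75,
// so no resize happens while the map is being filled
Map<String, Integer> map = new HashMap<>((int) (expected / 0.75f) + 1);
for (int i = 0; i < expected; i++) {
    map.put("key" + i, i);
}
System.out.println(map.size()); // 100, reached without any rehash
```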
2. Constructors
```java
// Default constructor.
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

// Constructor that copies another Map
public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    putMapEntries(m, false); // this method is analyzed below
}

// Constructor that specifies the initial capacity
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

// Constructor that specifies both initial capacity and load factor
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " + loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}
```
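Note that the last constructor does not allocate the table; it only stores tableSizeFor(initialCapacity) in threshold, and the table is created lazily on the first insertion. For reference, tableSizeFor as it appears in JDK 8 rounds the requested capacity up to the next power of two:

```java
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    // Smearing the highest set bit downward fills all lower bits with 1,
    // so n + 1 is the next power of two; e.g. cap = 13 -> 16
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
```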
3. Member methods
(1)putMapEntries(Map<? extends K, ? extends V> m, boolean evict)
```java
final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) {
    int s = m.size();
    if (s > 0) {
        // Determine whether the table has been initialized
        if (table == null) { // pre-size
            // Not initialized; s is the actual number of elements in m
            float ft = ((float)s / loadFactor) + 1.0F;
            int t = ((ft < (float)MAXIMUM_CAPACITY) ?
                     (int)ft : MAXIMUM_CAPACITY);
            // If the computed t is greater than the current threshold, re-initialize the threshold
            if (t > threshold)
                threshold = tableSizeFor(t);
        }
        // Already initialized and m has more elements than the threshold: expand
        else if (s > threshold)
            resize();
        // Add all elements of m to this HashMap
        for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
            K key = e.getKey();
            V value = e.getValue();
            putVal(hash(key), key, value, false, evict);
        }
    }
}
```
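A short usage sketch of the copy constructor that ends up here (variable names are our own):

```java
// assumes: import java.util.HashMap; import java.util.Map;
Map<String, Integer> src = new HashMap<>();
src.put("a", 1);
src.put("b", 2);
// Internally calls putMapEntries(src, false), presizing from src.size()
Map<String, Integer> copy = new HashMap<>(src);
System.out.println(copy.size()); // 2
```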
(2) put(K key, V value) before JDK 1.8
```java
public V put(K key, V value) {
    if (table == EMPTY_TABLE) {
        inflateTable(threshold);
    }
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key);
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) { // first, traverse the chain
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(hash, key, value, i); // otherwise, insert a new entry
    return null;
}
```
Analysis:
- If there is no element in the array position, insert it directly;
- If the located array position already holds an element, traverse the linked list headed by that element and compare each node's key with the inserted key in turn. If an equal key is found, its value is overwritten directly; otherwise the element is inserted at the head of the list (head insertion).
(3) put(K key, V value) since JDK 1.8
HashMap provides the user with a put method for adding elements. The bottom layer of the put method actually calls the putVal method.
```java
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}
```
(4)putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict)
```java
final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // The table is uninitialized or has length 0: expand first
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // (n - 1) & hash determines the bucket; if the bucket is empty,
    // the new node is placed directly into the array
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    // An element already exists in the bucket
    else {
        Node<K,V> e; K k;
        // The first element in the bucket has an equal hash and an equal key
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            // Record the first element in e
            e = p;
        // Keys are not equal; if the bucket holds a red-black tree node...
        else if (p instanceof TreeNode)
            // ...put it into the tree
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        // Otherwise the bucket holds a linked list
        else {
            // Insert the node at the end of the linked list
            for (int binCount = 0; ; ++binCount) {
                // Reached the end of the list
                if ((e = p.next) == null) {
                    // Append a new node at the tail
                    p.next = newNode(hash, key, value, null);
                    // When the node count reaches the threshold (8 by default), call treeifyBin.
                    // That method converts to a red-black tree only if the array length is at
                    // least 64, to reduce search time; otherwise it just expands the array.
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    // Jump out of the loop
                    break;
                }
                // Check whether a node in the list has a key equal to the inserted key
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    // Equal: jump out of the loop
                    break;
                // Advance; combined with e = p.next above, this walks the list
                p = e;
            }
        }
        // A node with key and hash equal to the inserted element was found in the bucket
        if (e != null) {
            // Record the old value of e
            V oldValue = e.value;
            // onlyIfAbsent is false or the old value is null
            if (!onlyIfAbsent || oldValue == null)
                // Replace the old value with the new value
                e.value = value;
            // Post-access callback
            afterNodeAccess(e);
            // Return the old value
            return oldValue;
        }
    }
    // Structural modification
    ++modCount;
    // If the actual size exceeds the threshold, expand
    if (++size > threshold)
        resize();
    // Post-insertion callback
    afterNodeInsertion(evict);
    return null;
}
```
Analysis:
- If there is no element in the array position, insert it directly;
- If the located array position already holds elements, compare them with the key to be inserted. If an equal key is found, the value is overwritten directly. Otherwise, check whether p is a tree node: if so, e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value) adds the element into the red-black tree; if not, the linked list is traversed and the element is appended at its tail (tail insertion). The observable behavior is sketched below.
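A small sketch of our own showing put's return-value contract, which matches the analysis above:

```java
// assumes: import java.util.HashMap; import java.util.Map;
Map<String, Integer> m = new HashMap<>();
System.out.println(m.put("a", 1)); // null — key absent, a new node is inserted
System.out.println(m.put("a", 2)); // 1 — equal key found: old value returned, then overwritten
System.out.println(m.get("a"));    // 2
```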
(5)get(Object key)
```java
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // The first node in the bucket matches
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        // More than one node in the bucket
        if ((e = first.next) != null) {
            // Search in the red-black tree
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // Search in the linked list
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
```
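Usage is straightforward; note that get cannot distinguish an absent key from a key mapped to null (a sketch of our own):

```java
// assumes: import java.util.HashMap; import java.util.Map;
Map<String, Integer> m = new HashMap<>();
m.put("a", 1);
m.put("b", null);
System.out.println(m.get("a")); // 1
System.out.println(m.get("b")); // null — present, but mapped to null
System.out.println(m.get("c")); // null — absent; use containsKey to tell the difference
```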
(6)resize()
Capacity expansion is accompanied by a complete rehash: every element in the hash table is traversed and reassigned, which is very time-consuming. When writing programs, try to avoid triggering resize.
```java
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // If the maximum capacity is already reached, stop expanding and let collisions happen
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Otherwise, expand to twice the original capacity
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // Compute the new resize threshold
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        // Move every bucket into the new table
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else {
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // Stays at the original index
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // Moves to original index + oldCap
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // Place the "lo" list at the original index
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // Place the "hi" list at original index + oldCap
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
```
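The key trick here is the (e.hash & oldCap) test. Because newCap = 2 * oldCap, a node's new index differs from its old one only in the single bit whose value is oldCap, so each chain splits cleanly into a "lo" list (stays at index j) and a "hi" list (moves to j + oldCap). A small sketch of our own with oldCap = 16:

```java
int oldCap = 16;
int h1 = 0b00101; // the bit with value 16 is 0
int h2 = 0b10101; // the bit with value 16 is 1
System.out.println(h1 & (oldCap - 1));     // 5  — old index
System.out.println((h1 & oldCap) == 0);    // true:  stays at index 5
System.out.println(h2 & (oldCap - 1));     // 5  — same old index
System.out.println((h2 & oldCap) == 0);    // false: moves to 5 + 16 = 21
System.out.println(h2 & (2 * oldCap - 1)); // 21 — confirms the new index
```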
4. Internal classes
Node class source code:
```java
// Implements Map.Entry<K,V>
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash; // hash value, used to compare with the hashes of other elements
    final K key;    // key
    V value;        // value
    // Points to the next node in the chain
    Node<K,V> next;

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()   { return key; }
    public final V getValue() { return value; }
    public final String toString() { return key + "=" + value; }

    // Override the hashCode() method
    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    // Override the equals() method
    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}
```
Tree node class source code:
```java
static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
    TreeNode<K,V> parent; // parent node
    TreeNode<K,V> left;   // left child
    TreeNode<K,V> right;  // right child
    TreeNode<K,V> prev;   // needed to unlink next upon deletion
    boolean red;          // color flag for red-black balancing

    TreeNode(int hash, K key, V val, Node<K,V> next) {
        super(hash, key, val, next);
    }

    // Return the root of the tree containing this node
    final TreeNode<K,V> root() {
        for (TreeNode<K,V> r = this, p;;) {
            if ((p = r.parent) == null)
                return r;
            r = p;
        }
    }
    // ... (tree insertion, lookup and balancing methods omitted)
}
```
Summary
The above is what we wanted to share today. This article walked through the underlying data structure of HashMap and its core source code: the perturbation (hash) function, the constructors, put/putVal, get, resize, and the Node and TreeNode classes. Reading the source directly is the best way to appreciate how classic this design is.