HashMap source code analysis

Posted by timbuckthree on Mon, 24 Jan 2022 12:18:49 +0100

preface

HashMap is a very common collection, and its data structure and design are classic. As Java programmers, we should understand its underlying implementation deeply. This article walks through that implementation by reading the HashMap source code.

1, Introduction to HashMap

HashMap stores key-value pairs and is a hash-table-based implementation of the Map interface. It is one of the most commonly used Java collections and is not thread-safe.
HashMap allows null keys and values: at most one key may be null, while any number of values may be null.
Before JDK 1.8, HashMap consisted of an array plus linked lists. The array is the main body of the HashMap, and the linked lists exist mainly to resolve hash collisions. Since JDK 1.8, collision handling has changed significantly: when a linked list grows longer than a threshold (8 by default) and the array length is at least 64, the list is converted into a red-black tree to reduce search time.
The default initial capacity of HashMap is 16, and each resize doubles the capacity. Moreover, HashMap always uses a power of 2 as the size of the hash table.
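As a quick illustration (a standalone sketch, not HashMap's own code), the power-of-two table size is what lets HashMap compute a bucket index with a cheap bit mask: for a non-negative hash and a power-of-two n, `(n - 1) & hash` equals `hash % n`.

```java
public class IndexDemo {
    public static void main(String[] args) {
        int n = 16; // table length, always a power of 2 in HashMap
        for (int hash : new int[]{5, 21, 12345}) {
            // With n a power of two, (n - 1) is an all-ones bit mask,
            // so the bitwise AND equals the modulo for non-negative hashes.
            int index = (n - 1) & hash;
            System.out.println(hash + " -> bucket " + index);
        }
    }
}
```

For example, 12345 & 15 is 9, the same as 12345 % 16, but computed without a division.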

2, Analysis of underlying data structure

1. Before JDK 1.8

Before JDK 1.8, the bottom layer of HashMap was a combination of an array and linked lists, i.e. "linked-list hashing".
HashMap passes the key's hashCode through a perturbation function to obtain the hash value, then determines the element's storage slot via (array length - 1) & hash. If that slot already holds an element, HashMap compares the stored element's hash and key with those of the element to be inserted: if both match, the value is overwritten directly; otherwise the collision is resolved by chaining (the "zipper" method).
The so-called perturbation function is HashMap's hash method; its purpose is to reduce hash collisions.
Source code of the hash method before JDK 1.8:

static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).

    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

2. Since JDK 1.8

Since JDK 1.8, the underlying data structure consists of an array plus either linked lists or red-black trees.
When the length of a linked list exceeds the threshold (8 by default), the treeifyBin() method is called first. That method decides, based on the HashMap's array, whether to convert to a red-black tree: only when the array length is at least 64 is the conversion performed, to reduce search time. Otherwise the resize() method is simply executed to grow the array.
Source code of the hash method since JDK 1.8:

static final int hash(Object key) {
    int h;
    // key.hashCode(): returns the key's hash code
    // ^: bitwise XOR
    // >>>: unsigned right shift; the sign bit is ignored and vacated bits are filled with 0
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
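To see why XOR-ing in the high bits helps, consider two hash codes that differ only above bit 16 (the values 0x10000 and 0x20000 here are arbitrary illustrative inputs, not from the JDK). Without spreading, a 16-bucket table maps both to bucket 0; with the JDK 8 spreading shown above, they land in different buckets:

```java
public class HashSpread {
    // JDK 8-style spreading: XOR the high 16 bits into the low 16 bits
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;
        int h1 = 0x10000, h2 = 0x20000; // differ only in the high bits
        // Without spreading, both collide in bucket 0:
        System.out.println(((n - 1) & h1) + " " + ((n - 1) & h2)); // 0 0
        // With spreading, the high bits now influence the bucket index:
        System.out.println((n - 1) & (h1 ^ (h1 >>> 16))); // 1
        System.out.println((n - 1) & (h2 ^ (h2 >>> 16))); // 2
    }
}
```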

3, HashMap core source code

1. Member variables

public class HashMap<K,V> extends AbstractMap<K,V> implements Map<K,V>, Cloneable, Serializable {
    // serial number
    private static final long serialVersionUID = 362498820763181265L;
    // The default initial capacity is 16
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
    // Maximum capacity
    static final int MAXIMUM_CAPACITY = 1 << 30;
    // Default fill factor
    static final float DEFAULT_LOAD_FACTOR = 0.75f;
    // When the number of nodes on the bucket is greater than this value, it will turn into a red black tree
    static final int TREEIFY_THRESHOLD = 8;
    // When the number of nodes on the bucket is less than this value, the tree will turn to the linked list
    static final int UNTREEIFY_THRESHOLD = 6;
    // The structure in the bucket is transformed into the minimum size of the table corresponding to the red black tree
    static final int MIN_TREEIFY_CAPACITY = 64;
    // The array that stores the elements; its length is always a power of 2
    transient Node<K,V>[] table;
    // A set view that holds the concrete entries
    transient Set<Map.Entry<K,V>> entrySet;
    // The number of elements to store. Note that this is not equal to the length of the array.
    transient int size;
    // Counters for each expansion and change of map structure
    transient int modCount;
    // Critical value when the actual size (capacity * filling factor) exceeds the critical value, capacity expansion will be carried out
    int threshold;
    // Loading factor
    final float loadFactor;
}

loadFactor (load factor)
The load factor controls how densely the array is filled with data. The closer loadFactor is to 1, the more entries the array holds before resizing, i.e. the denser it becomes, which lengthens the linked lists. The closer loadFactor is to 0, the fewer entries the array holds, and the sparser the data becomes.

Too large a loadFactor makes element lookups inefficient; too small a loadFactor wastes array space and scatters the stored data. The default value of 0.75f is a good trade-off chosen by the JDK authors.

With the default capacity of 16 and load factor of 0.75, data is continually stored into the map; once the number of entries reaches 16 * 0.75 = 12, the current capacity of 16 must be expanded. This process involves rehashing and copying data, which is expensive.

threshold
threshold = capacity * loadFactor. When size > threshold, the array should be resized; in other words, threshold is the yardstick for deciding whether the array needs to grow.
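A quick sanity check of this arithmetic with the default values (a standalone sketch):

```java
public class ThresholdDemo {
    public static void main(String[] args) {
        int capacity = 16;
        float loadFactor = 0.75f;
        // threshold = capacity * loadFactor
        int threshold = (int) (capacity * loadFactor);
        // With the defaults, the table resizes once size exceeds 12,
        // i.e. the 13th put doubles the capacity to 32.
        System.out.println(threshold); // 12
    }
}
```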

2. Construction method

    // Default constructor.
    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR; // all   other fields defaulted
     }

     // Constructor containing another "Map"
     public HashMap(Map<? extends K, ? extends V> m) {
         this.loadFactor = DEFAULT_LOAD_FACTOR;
         putMapEntries(m, false);//This method will be analyzed below
     }

     // Specifies the constructor for the capacity size
     public HashMap(int initialCapacity) {
         this(initialCapacity, DEFAULT_LOAD_FACTOR);
     }

     // Specifies the constructor for capacity size and load factor
     public HashMap(int initialCapacity, float loadFactor) {
         if (initialCapacity < 0)
             throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
         if (initialCapacity > MAXIMUM_CAPACITY)
             initialCapacity = MAXIMUM_CAPACITY;
         if (loadFactor <= 0 || Float.isNaN(loadFactor))
             throw new IllegalArgumentException("Illegal load factor: " + loadFactor);
         this.loadFactor = loadFactor;
         this.threshold = tableSizeFor(initialCapacity);
     }
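The `tableSizeFor` call in the last constructor stores the rounded-up capacity in `threshold`. The JDK 8 implementation rounds any requested capacity up to the next power of two by smearing the highest one-bit of `cap - 1` into every lower bit (reproduced here with the `MAXIMUM_CAPACITY` constant inlined so the class compiles on its own):

```java
public class TableSize {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Rounds cap up to the next power of two: the OR-shift cascade
    // fills all bits below the highest set bit of (cap - 1), so n + 1
    // is a power of two.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(16)); // 16
        System.out.println(tableSizeFor(17)); // 32
    }
}
```

Starting from `cap - 1` (rather than `cap`) ensures that an exact power of two is returned unchanged instead of being doubled.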

3. Member methods

(1)putMapEntries(Map<? extends K, ? extends V> m, boolean evict)

final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) {
    int s = m.size();
    if (s > 0) {
        // Determine whether the table has been initialized
        if (table == null) { // pre-size
            // Uninitialized, s is the actual number of elements of m
            float ft = ((float)s / loadFactor) + 1.0F;
            int t = ((ft < (float)MAXIMUM_CAPACITY) ?
                    (int)ft : MAXIMUM_CAPACITY);
            // If the calculated t is greater than the threshold, the threshold is initialized
            if (t > threshold)
                threshold = tableSizeFor(t);
        }
        // It has been initialized and the number of m elements is greater than the threshold value. Capacity expansion is required
        else if (s > threshold)
            resize();
        // Add all elements in m to HashMap
        for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
            K key = e.getKey();
            V value = e.getValue();
            putVal(hash(key), key, value, false, evict);
        }
    }
}

(2) put(K key, V value) before JDK 1.8

public V put(K key, V value) {
    if (table == EMPTY_TABLE) {
        inflateTable(threshold);
    }
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key);
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) { // First traversal
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    modCount++;
    addEntry(hash, key, value, i);  // Reinsert
    return null;
}

analysis:

  1. If the computed array slot is empty, insert directly;
  2. If the slot already holds an element, traverse the linked list headed by that element and compare each node's key with the inserted key in turn. If an equal key is found, its value is overwritten directly; otherwise the new element is inserted at the head of the list (head insertion).

(3) put(K key, V value) since JDK 1.8

HashMap exposes the put method for adding elements; under the hood, put simply delegates to the putVal method.

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

(4)putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict)

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // The table is uninitialized or has a length of 0. Capacity expansion is required
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // (n - 1) & hash determines the bucket in which the elements are stored. The bucket is empty, and the newly generated node is placed in the bucket (at this time, the node is placed in the array)
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    // Element already exists in bucket
    else {
        Node<K,V> e; K k;
        // Compare the bucket's first element (the node in the array):
        // are its hash and key equal to those of the element being inserted?
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
                // They match; record the first element in e
                e = p;
        // The keys are not equal; check whether this is a red-black tree node
        else if (p instanceof TreeNode)
            // Put it in the tree
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        // Is a linked list node
        else {
            // Insert a node at the end of the linked list
            for (int binCount = 0; ; ++binCount) {
                // Reach the end of the linked list
                if ((e = p.next) == null) {
                    // Insert a new node at the end
                    p.next = newNode(hash, key, value, null);
                    // When the number of nodes reaches the threshold (8 by default), execute the treeifyBin method
                    // This method will determine whether to convert to red black tree according to HashMap array.
                    // Only when the array length is greater than or equal to 64, the red black tree conversion operation will be performed to reduce the search time. Otherwise, it is just an expansion of the array.
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    // Jump out of loop
                    break;
                }
                // Judge whether the key value of the node in the linked list is equal to the key value of the inserted element
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    // Equal, jump out of loop
                    break;
                // Used to traverse the linked list in the bucket. Combined with the previous e = p.next, you can traverse the linked list
                p = e;
            }
        }
        // Indicates that a node whose key value and hash value are equal to the inserted element is found in the bucket
        if (e != null) {
            // Record the value of e
            V oldValue = e.value;
            // onlyIfAbsent is false or the old value is null
            if (!onlyIfAbsent || oldValue == null)
                //Replace old value with new value
                e.value = value;
            // Post access callback
            afterNodeAccess(e);
            // Return old value
            return oldValue;
        }
    }
    // Structural modification
    ++modCount;
    // If the actual size is greater than the threshold, the capacity will be expanded
    if (++size > threshold)
        resize();
    // Post insert callback
    afterNodeInsertion(evict);
    return null;
}

analysis:

  1. If the computed array slot is empty, insert directly;
  2. If the slot holds an element, compare it with the key to be inserted. If the keys are equal, overwrite the value directly. Otherwise, check whether p is a tree node: if so, `e = ((TreeNode<K,V>) p).putTreeVal(this, tab, hash, key, value)` adds the element to the red-black tree; if not, traverse the linked list and insert at its tail (tail insertion).
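The semantics described above can be observed directly through the public API (a small usage sketch):

```java
import java.util.HashMap;

public class PutDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        System.out.println(map.put("a", 1));  // null: there was no previous mapping
        System.out.println(map.put("a", 2));  // 1: the old value is returned, then overwritten
        System.out.println(map.put(null, 3)); // null: one null key is allowed
        System.out.println(map.get("a"));     // 2
        System.out.println(map.size());       // 2: "a" and the null key
    }
}
```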

(5)get(Object key)

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // Array elements are equal
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        // More than one node in bucket
        if ((e = first.next) != null) {
            // get in tree
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // get in linked list
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

(6)resize()

Resizing is accompanied by a re-hashing pass: every element in the hash table is traversed and redistributed, which is very time-consuming. When writing programs, try to avoid triggering resize.
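One common way to avoid repeated resizing is to presize the map for the expected element count, dividing by the load factor so the threshold is never crossed (a sketch; the count of 1000 is an arbitrary example):

```java
import java.util.HashMap;

public class PresizeDemo {
    public static void main(String[] args) {
        int expected = 1000;
        // Request capacity expected / 0.75 + 1 = 1334; tableSizeFor rounds
        // this up to 2048, whose threshold 2048 * 0.75 = 1536 exceeds 1000,
        // so no rehash occurs while inserting the 1000 entries.
        HashMap<Integer, Integer> map = new HashMap<>((int) (expected / 0.75f) + 1);
        for (int i = 0; i < expected; i++) {
            map.put(i, i);
        }
        System.out.println(map.size()); // 1000
    }
}
```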

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // If the capacity already exceeds the maximum, stop resizing and just let collisions happen
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // If the maximum value is not exceeded, it will be expanded to twice the original value
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY && oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {
        // signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // Calculate the new resize upper limit
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ? (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        // Move each bucket to a new bucket
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else {
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // Original index
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // Original index + oldCap
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // Put the original index into the bucket
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // Put the original index + oldCap into the bucket
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
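The lo/hi split above can be checked with plain bit arithmetic: two hashes that share a bucket in the old table separate according to whether `hash & oldCap` is zero (the hash values 5 and 21 are arbitrary examples):

```java
public class SplitDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        int h1 = 5;  // 5 & 16 == 0  -> stays at index 5
        int h2 = 21; // 21 & 16 != 0 -> moves to index 5 + 16 = 21
        // Both hashed to bucket 5 in the old table:
        System.out.println(((oldCap - 1) & h1) + " " + ((oldCap - 1) & h2)); // 5 5
        // After doubling, the extra bit (oldCap) decides the new bucket,
        // so no per-node re-hashing is needed:
        System.out.println(((newCap - 1) & h1) + " " + ((newCap - 1) & h2)); // 5 21
    }
}
```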

4. Inner classes

Node class source code:

// Implements Map.Entry<K,V>
static class Node<K,V> implements Map.Entry<K,V> {
       final int hash; // Hash value, used to compare with other elements' hashes when storing
       final K key;    // key
       V value;        // value
       // Points to the next node
       Node<K,V> next;
       Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
        public final K getKey()        { return key; }
        public final V getValue()      { return value; }
        public final String toString() { return key + "=" + value; }
        // Override hashCode() method
        public final int hashCode() {
            return Objects.hashCode(key) ^ Objects.hashCode(value);
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }
        // Override the equals() method
        public final boolean equals(Object o) {
            if (o == this)
                return true;
            if (o instanceof Map.Entry) {
                Map.Entry<?,?> e = (Map.Entry<?,?>)o;
                if (Objects.equals(key, e.getKey()) &&
                    Objects.equals(value, e.getValue()))
                    return true;
            }
            return false;
        }
}

Tree node class source code:

static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
        TreeNode<K,V> parent;  // parent node
        TreeNode<K,V> left;    // left child
        TreeNode<K,V> right;   // right child
        TreeNode<K,V> prev;    // needed to unlink next upon deletion
        boolean red;           // color flag: red or black
        TreeNode(int hash, K key, V val, Node<K,V> next) {
            super(hash, key, val, next);
        }
        // Returns the root of the tree containing this node
        final TreeNode<K,V> root() {
            for (TreeNode<K,V> r = this, p;;) {
                if ((p = r.parent) == null)
                    return r;
                r = p;
            }
        }
        // ... other tree operations omitted
}

summary

This article introduced HashMap's underlying data structure (array + linked list before JDK 1.8; array + linked list / red-black tree since JDK 1.8) and walked through its core source code: the hash perturbation function, the constructors, put/putVal, get/getNode, and resize. Understanding these internals explains HashMap's power-of-two sizing, its treeification thresholds, and the cost of resizing.

Topics: HashMap