HashMap key source code analysis

Posted by AlanG on Mon, 07 Mar 2022 11:40:38 +0100

1, Introduction to HashMap

HashMap is one of the most commonly used classes in the Java collections framework. Since JDK 8, its underlying implementation is an array plus linked lists plus red-black trees. The elements are unordered: iteration order is not guaranteed to match insertion order. HashMap is not synchronized, so it is not thread-safe.

2, Key parameters of HashMap

To understand how HashMap is implemented, we must first understand its key constants, which encode a lot of design decisions. I have added a short explanation of each one in the comments.

	/**
     * Default initial capacity; must be a power of two
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; //16

	/**
     * Maximum capacity; must be a power of two
     */
    static final int MAXIMUM_CAPACITY = 1 << 30; //2^30

    /**
     * Load factor: determines the resize threshold (capacity * load factor). When size exceeds the threshold, the table is resized
     * 1.0: more hash collisions, so the linked lists and red-black trees in the bins grow larger; trades time for space
     * 0.5: shorter linked lists and shallower red-black trees, so lookups are faster but space utilization is lower; trades space for time
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /**
     * When a linked list reaches a length of 8, try to convert it into a red-black tree (the conversion also requires array length >= 64)
     * With random hashCodes, the number of nodes per bin follows a Poisson distribution
     * With the default resize threshold of 0.75, the Poisson parameter averages about 0.5, and the probability of a bin reaching length 8 is only 0.00000006
     * Red-black tree nodes need more space than linked list nodes
     */
    static final int TREEIFY_THRESHOLD = 8;

    /**
     * Bin size at or below which a red-black tree is converted back to a linked list during a resize (one of the untreeify conditions)
     */
    static final int UNTREEIFY_THRESHOLD = 6;

    /**
     * Minimum array (table) length required before a bin can be converted into a red-black tree
     */
    static final int MIN_TREEIFY_CAPACITY = 64;
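
To make the role of the load factor concrete, here is a minimal sketch of how the resize threshold follows from capacity and load factor. The class and method names are illustrative assumptions and are not part of java.util.HashMap.

```java
// Hypothetical demo class; the names are illustrative, not HashMap internals.
class ThresholdDemo {
    // Resize threshold as described above: capacity * load factor.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        int capacity = 1 << 4;     // same value as DEFAULT_INITIAL_CAPACITY
        float loadFactor = 0.75f;  // same value as DEFAULT_LOAD_FACTOR
        System.out.println(threshold(capacity, loadFactor)); // 12
        System.out.println(threshold(capacity, 1.0f));       // 16
        System.out.println(threshold(capacity, 0.5f));       // 8
    }
}
```

With the defaults, the table of 16 slots is resized once it holds more than 12 entries.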

3, Key functions of HashMap

1. How is the hash value generated?

HashMap needs a hash value for each key when storing data. The Object class provides a default hashCode() method, but HashMap does not use the return value of hashCode() directly: it first passes it through the hash() function, which mixes the high 16 bits into the low 16 bits, and uses the result as the final hash value.

	/*
	 * Node.hashCode(): returns the key's hash code XORed with the value's hash code. This is the hash of the entry itself; it does not select the bucket
	 */
	public final int hashCode() {
            return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

	/**
     * XOR the key's hashCode with itself shifted right (unsigned) by 16 bits, so the high bits also influence the bucket index
     */
	static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }
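
The effect of the perturbation is easiest to see with two hash codes that differ only in their high bits. The sketch below reuses the expression from hash() and the index calculation (n - 1) & hash from putVal; the class name and the sample values are assumptions chosen for illustration.

```java
// Hypothetical demo class; spread() copies the expression used in HashMap.hash().
class HashSpreadDemo {
    // Mix the high 16 bits into the low 16 bits.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a table whose length n is a power of two.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        int h1 = 0x10000; // the two values differ only above bit 15
        int h2 = 0x20000;
        // Without spreading, both land in bucket 0.
        System.out.println(indexFor(h1, n) + " " + indexFor(h2, n));                 // 0 0
        // With spreading, the high bits separate them.
        System.out.println(indexFor(spread(h1), n) + " " + indexFor(spread(h2), n)); // 1 2
    }
}
```

Because small tables only look at the low bits of the hash, the shift-and-XOR step lets the high bits take part in bucket selection at very little cost.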

2. What is the put operation process of HashMap?

The put operation is implemented by putVal. I summarize the process in the following steps:

* First check whether the table is empty. If it is, call resize() to allocate it.
* Compute the index, (n - 1) & hash. If the bucket is empty, insert the new node directly. Otherwise, check whether the first node has the same key; if so, it is the node to update. If not, continue.
* If the first node is a TreeNode, insert through putTreeVal(). Otherwise traverse the linked list: if a node with the same key is found, break out of the loop; if not, append the new node at the tail. If the list length has reached the treeify threshold at that point, call treeifyBin() to attempt treeification.
* If a node with the same key exists, replace its value (unless onlyIfAbsent is set and the old value is non-null) and return the old value.
* Otherwise, increment size and check whether it exceeds the threshold. If so, call resize().

The source code of putVal is as follows:

 	/**
     * Implements Map.put and related methods.
     *
     * @param hash hash for key
     * @param key the key
     * @param value the value to put
     * @param onlyIfAbsent if true, don't change existing value
     * @param evict if false, the table is in creation mode.
     * @return previous value, or null if none
     */
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }
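
The two return paths of putVal (the old value for an existing key, null for a new key) are visible through the public API. A small sketch using java.util.HashMap follows; the class name and the sample keys are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class showing what put() and putIfAbsent() return.
class PutDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        System.out.println(map.put("a", 1));         // null: no previous mapping
        System.out.println(map.put("a", 2));         // 1: old value is returned and replaced
        System.out.println(map.putIfAbsent("a", 3)); // 2: onlyIfAbsent keeps the existing value
        System.out.println(map.get("a"));            // 2
        System.out.println(map.size());              // 1: replacing a value does not grow the map
    }
}
```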

3. Timing of linked list conversion to red black tree

treeifyBin() appears in the put process; it converts a linked list into a red-black tree. Two conditions must both hold for the conversion to happen:

* The bin already holds TREEIFY_THRESHOLD (8 by default) nodes when a new node is appended
* The array length has reached MIN_TREEIFY_CAPACITY (64 by default)

Reaching the list length only triggers an attempt. treeifyBin() first checks the array length; if the array is too small, it calls resize() instead of converting. The source code is as follows:

	/**
     * Replaces all linked nodes in bin at index for given hash unless
     * table is too small, in which case resizes instead.
     */
    final void treeifyBin(Node<K,V>[] tab, int hash) {
        int n, index; Node<K,V> e;
        if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
            resize();
        else if ((e = tab[index = (n - 1) & hash]) != null) {
            TreeNode<K,V> hd = null, tl = null;
            do {
                TreeNode<K,V> p = replacementTreeNode(e, null);
                if (tl == null)
                    hd = p;
                else {
                    p.prev = tl;
                    tl.next = p;
                }
                tl = p;
            } while ((e = e.next) != null);
            if ((tab[index] = hd) != null)
                hd.treeify(tab);
        }
    }
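
Putting the check in putVal and the check in treeifyBin together, the decision can be sketched as a single function. This is a simplified model, not JDK code: the class, the enum, and the method decide() are hypothetical, and binCount is assumed to mean the loop counter from putVal at the moment the new node is appended.

```java
// Hypothetical model of the treeify decision; not part of java.util.HashMap.
class TreeifyDecisionDemo {
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    enum Action { NONE, RESIZE, TREEIFY }

    // binCount: loop counter in putVal when the new node is appended to the list.
    // tableLength: current length of the bucket array.
    static Action decide(int binCount, int tableLength) {
        if (binCount < TREEIFY_THRESHOLD - 1) {
            return Action.NONE;    // putVal does not call treeifyBin at all
        }
        if (tableLength < MIN_TREEIFY_CAPACITY) {
            return Action.RESIZE;  // treeifyBin resizes instead of converting
        }
        return Action.TREEIFY;     // both conditions hold
    }

    public static void main(String[] args) {
        System.out.println(decide(6, 64)); // NONE
        System.out.println(decide(7, 16)); // RESIZE
        System.out.println(decide(7, 64)); // TREEIFY
    }
}
```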

4. Capacity expansion mechanism of HashMap

resize() appears in the source code above; it is the function that expands the capacity. Its source code is relatively long, so I won't post it here. I summarize it in the following steps. The first two steps handle special cases where the capacity is zero or already at the maximum; the normal expansion path starts from step 3:

* If the old capacity is 0 (the table has not been allocated yet), the capacity and threshold are set to their initial values: the defaults (16 and 12), or the capacity requested at construction
* If the old capacity has already reached the maximum capacity, the threshold is set to Integer.MAX_VALUE and the table is not expanded
* If the new capacity (old capacity * 2) is less than the maximum capacity and the old capacity is at least the default initial capacity (16), the new threshold is set to twice the old threshold
* Create a new array and traverse the old buckets. If a bucket holds a single node (nothing follows it), place it directly at newTab[e.hash & (newCap - 1)]
* If the first node of a bucket is a TreeNode, split the bucket the TreeNode way; if it is a linked list, split the list

After expansion, each element is either at its original index or at its original index + the old array length, and the relative order within each linked list is preserved. Which of the two applies is decided by a single bit, e.hash & oldCap, so no hash has to be recomputed.
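
The "same index or index + old length" rule can be checked with a few lines of arithmetic. The sketch below is a standalone illustration, not the JDK's resize(); the class and method names and the sample hashes are assumptions.

```java
// Hypothetical demo of how one bit decides an element's index after doubling.
class ResizeSplitDemo {
    // Bucket index for a power-of-two capacity.
    static int indexFor(int hash, int cap) {
        return hash & (cap - 1);
    }

    // New index after doubling, derived only from the old index and the bit (hash & oldCap).
    static int splitIndex(int hash, int oldCap) {
        int oldIndex = indexFor(hash, oldCap);
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int[] hashes = {5, 21, 37, 53}; // all four share bucket 5 while the capacity is 16
        for (int h : hashes) {
            System.out.println(h + ": " + indexFor(h, oldCap) + " -> " + splitIndex(h, oldCap));
        }
    }
}
```

Hashes 5 and 37 stay in bucket 5, while 21 and 53 move to bucket 21 (5 + 16), which matches masking the same hashes against the new capacity of 32.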

4, Some problems of HashMap

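Q1: Why is the treeify threshold 8 instead of 7?

A: If the treeify threshold were 7 with the untreeify threshold at 6, a bin whose size hovers around that boundary would be converted back and forth between tree and linked list frequently. The gap between 8 and 6 avoids this.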

Q2: Why is the treeify threshold not larger or smaller?

A: With random hashCodes, the number of nodes in a bin follows a Poisson distribution. With the default load factor of 0.75, the Poisson parameter averages about 0.5, and the probability of a bin reaching length 8 is only 0.00000006. If the threshold were smaller, bins would turn into red-black trees too easily, and red-black trees consume more space. If it were larger, bins would almost never be converted, and in extreme cases (many colliding keys) a bin would remain a long linked list, reducing lookup efficiency.
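
The probability quoted above can be reproduced from the Poisson formula P(k) = e^(-λ) * λ^k / k! with λ = 0.5. The following is a minimal sketch; the class and method names are assumptions.

```java
// Hypothetical helper that evaluates the Poisson probability mass function.
class PoissonDemo {
    // P(k) = e^(-lambda) * lambda^k / k!
    static double poisson(double lambda, int k) {
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) {
            factorial *= i;
        }
        return Math.exp(-lambda) * Math.pow(lambda, k) / factorial;
    }

    public static void main(String[] args) {
        // Probability that a bin holds exactly k nodes, for k = 0..8.
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, poisson(0.5, k));
        }
    }
}
```

For k = 8 the result is about 5.9e-8, which rounds to the 0.00000006 cited above.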

Q3: Why are String and wrapper types such as Integer well suited as HashMap keys?

A: These classes are final and immutable, so a key cannot change after insertion and its hash value is the same at storage and at lookup. String also caches its hash code after the first computation, so it does not need to be recalculated on later calls, which makes String keys fast to process. In addition, these classes already override hashCode() and equals() consistently, so hash lookups behave correctly.
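
To see why immutability matters, consider what happens when a key is modified after insertion. The sketch below uses a hypothetical MutableKey class written only for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo: a key whose hashCode changes after insertion can no longer be found.
class MutableKeyDemo {
    // Illustrative mutable key; equals() and hashCode() both depend on the mutable field.
    static final class MutableKey {
        int id;

        MutableKey(int id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof MutableKey && ((MutableKey) o).id == id;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(id);
        }
    }

    // Returns true if the entry is still stored but can no longer be looked up.
    static boolean lostAfterMutation() {
        Map<MutableKey, String> map = new HashMap<>();
        MutableKey key = new MutableKey(1);
        map.put(key, "value");
        key.id = 2; // the node keeps the hash computed at insertion time
        return map.get(key) == null && map.size() == 1;
    }

    public static void main(String[] args) {
        System.out.println(lostAfterMutation()); // true
    }
}
```

The entry is still in the map, but a lookup with the mutated key computes a different hash and misses it. Immutable keys such as String and Integer rule this out.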

Topics: Java
