Java Collection Framework: HashMap (Hash Table) Source Code Analysis

Posted by joenjeru on Mon, 07 Feb 2022 19:43:31 +0100

👨‍🎓 Blogger homepage: Java tribute dust collection Miraitow
📆 Creation time: 🌴 February 7, 2022 20:23-2:03 🌴
📒 Content introduction: source code analysis of common collections; more collections such as ArrayList may follow later
📚 Reference: [Xiao Liu speaks source code]
⏳ In short, a word of encouragement: "try new tea with new fire; enjoy poetry and wine while youth lasts"
📝 This is a long post and may contain mistakes; I hope you can offer corrections and advice 🙏
🎃 Likes 👍, favorites ⭐ and comments 📝 are all welcome

1. What is Hash

A hash (also called a hash function) converts input of any length into output of a fixed length.
The mapping rule is the hash algorithm, and the binary string that the original data maps to is its hash value.

2. Features of Hash

  • 1. The original data cannot be recovered from its hash value
  • 2. A small change in the input produces a very different hash value, while identical inputs always produce identical hash values
  • 3. The hash algorithm should be efficient: even for long input, the hash value must be computed quickly
  • 4. A good hash algorithm has a low collision probability
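A quick sketch of the determinism property, and its limits, using Java's built-in String.hashCode() (a simple polynomial hash, not a cryptographic one — the class name here is just for illustration):

```java
public class HashDemo {
    public static void main(String[] args) {
        // Determinism: the same input always yields the same hash value
        System.out.println("hello".hashCode() == "hello".hashCode()); // true

        // A small change in the input changes the hash value
        System.out.println("hello".hashCode() != "hellp".hashCode()); // true

        // Caveat: String.hashCode() is a weak polynomial hash, so a
        // one-character change may shift the value only slightly; here
        // the difference is exactly 1
        System.out.println("hellp".hashCode() - "hello".hashCode()); // 1
    }
}
```

HashMap compensates for such weak spreading with its own perturbation step, analyzed later in this post.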

Because hashing maps an input space onto a hash space that is far smaller, by the pigeonhole principle different inputs must sometimes map to the same output.

The pigeonhole (drawer) principle

Put 10 apples into 9 drawers: at least one drawer must hold at least 2 apples.

Another example

There are 50 students but only 20 possible nicknames. Every student must have a nickname, so some nickname must repeat (a nickname corresponds to a hash bucket, and assigning a student a nickname corresponds to computing a hash value).
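A concrete Java instance of this inevitability: the strings "Aa" and "BB" are a well-known hashCode() collision pair, because String's polynomial hash gives them the same value.

```java
public class CollisionDemo {
    public static void main(String[] args) {
        // 'A'*31 + 'a' = 65*31 + 97 = 2112
        System.out.println("Aa".hashCode()); // 2112
        // 'B'*31 + 'B' = 66*31 + 66 = 2112
        System.out.println("BB".hashCode()); // 2112
        // Different keys, identical hash: in a HashMap both would land in the
        // same bucket, where equals() keeps them apart on the chain
        System.out.println("Aa".equals("BB")); // false
    }
}
```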

3. Explanation of HashMap principle

Inheritance system of HashMap

HashMap extends AbstractMap and implements the Cloneable, Serializable, and Map<K,V> interfaces.

Data structure analysis of Node

final int hash;
final K key;
V value;
Node<K,V> next;
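For context, here is a minimal standalone sketch of how these four fields fit together, modeled on (but simplified from) the JDK's HashMap.Node:

```java
// A simplified sketch of HashMap's internal Node: each bucket of the table
// stores a singly-linked chain of these entries.
class Node<K, V> {
    final int hash;       // the key's (perturbed) hash, computed once and cached
    final K key;          // the key; final, so a node never changes its key
    V value;              // the mapped value; replaceable in place
    Node<K, V> next;      // the next entry in the same bucket's chain

    Node(int hash, K key, V value, Node<K, V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }
}
```

Caching the hash in the node lets resize() re-route entries without calling hashCode() again.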

Underlying data structure

Principle analysis of putting data

What is Hash collision

Suppose we store an element and find that its key's hash value is, say, 1122 — the same as an existing element's — so after the perturbation step it maps to the same bucket index, 2. Two different keys now compete for one bucket: this is a conflict, and it must be resolved.

Methods to solve Hash collision

  • Open addressing
  • Separate chaining, the "zipper" method [HashMap uses this one]

What is chaining

In JDK 1.7 and earlier, when the amount of data is large, the probability of collision is also large. Colliding entries form a linked list — this is chaining. The chains of the zipper method can then grow very long, which slows lookups. (Mnemonic "seven up, eight down": JDK 1.7 and earlier used head insertion; JDK 1.8 switched to tail insertion.)

So in JDK 1.8, red-black trees were introduced.

Capacity expansion principle of HashMap

When the table holds many entries, collisions multiply, chains lengthen, and lookups slow down; at that point the table must be expanded.

4. Source code walkthrough

HashMap core attribute analysis (threshold, loadFactor, size, modCount)

Threshold: capacity expansion threshold

loadFactor: load factor

size: map actual number of elements

modCount: the number of structural modifications to the map, such as insertions and removals; replacing the value at an existing key does not increment it

Constant analysis

When no table size is specified, the default initial capacity is 16

  static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

Maximum length of table

static final int MAXIMUM_CAPACITY = 1 << 30;

The default load factor is 0.75. Setting it yourself is not recommended; the default was chosen after careful statistical analysis

static final float DEFAULT_LOAD_FACTOR = 0.75f;

Tree threshold

static final int TREEIFY_THRESHOLD = 8;

Tree demotion to linked list threshold

static final int UNTREEIFY_THRESHOLD = 6;

A linked list reaching length 8 does not by itself trigger treeification; a bucket is treeified only when the table capacity has reached 64 and the chain has reached 8

static final int MIN_TREEIFY_CAPACITY = 64;

Attribute analysis

The hash table itself

  transient Node<K,V>[] table;
  transient Set<Map.Entry<K,V>> entrySet;

Number of elements in the current hash table

  transient int size;

Number of structural modifications of the current hash table
(inserting or removing an element counts; note that replacing a value is not a structural modification and does not increment it)

  transient int modCount;

Capacity expansion threshold. When the current hash table exceeds the threshold, capacity expansion is triggered
threshold=capacity * loadFactor
The default is 16 * 0.75 = 12, which means that the capacity is expanded when the number of elements is greater than 12

  int threshold;

Load factor, generally not changed (0.75)

  final float loadFactor;

Analysis of construction method

There are four constructors

1. Constructor with two parameters (int initialCapacity, float loadFactor)

public HashMap(int initialCapacity, float loadFactor) {
//Throw an exception object when the initial capacity is less than zero
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    //If the initial capacity is greater than the maximum capacity of the array, set the initial capacity to the maximum capacity
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
        //NaN is actually short for Not a Number
        //The value of 0.0f/0.0f is NaN. From a mathematical point of view, 0 / 0 is uncertain.
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    //The reason why initialCapacity is not assigned directly is to ensure that the number is to the power of 2
    this.threshold = tableSizeFor(initialCapacity);
}

Let's see how tableSizeFor is implemented

Purpose: return the smallest power of 2 that is greater than or equal to cap. For example, cap = 10 returns 16 and cap = 28 returns 32; the result is always a power of 2
cap=10
n = 10 - 1 = 9
0b1001 | 0b0100 => 0b1101
0b1101 | 0b0011 => 0b1111
0b1111 | 0b0000 => 0b1111
0b1111 | 0b0000 => 0b1111
0b1111 | 0b0000 => 0b1111
0b1111 => 15 in decimal
return 15 + 1 = 16
So why subtract one at the beginning?
Suppose there were no minus one and
cap=16
0b10000 | 0b01000 => 0b11000
0b11000 | 0b00110 => 0b11110
0b11110 | 0b00001 => 0b11111
0b11111 | 0b00000 => 0b11111
0b11111 | 0b00000 => 0b11111
0b11111 => 31 in decimal
return 31 + 1 = 32
We passed in 16, yet the result became 32 — twice as large as necessary. In general the OR cascade smears the highest set bit downward until every lower bit is 1 (for example 0b0001 1101 1100 => 0b0001 1111 1111), and adding 1 then yields the next power of 2 (0b0010 0000 0000).

  • The purpose of assigning cap - 1 to n is to handle inputs that are already powers of 2. For example, binary 1000 is decimal 8. Without subtracting 1, the smearing would produce 10000, i.e. 16 — obviously not the desired result. After subtracting 1 the binary is 111, and the operation yields the original value 1000, i.e. 8.

  • MAXIMUM_CAPACITY in HashMap is 2^30. Combined with the implementation of tableSizeFor(), the reason is as follows:
    the largest positive int is 2^31 - 1, so 2^31 itself is unreachable and the capacity cannot be 2^31. Since the capacity must still be a power of 2, it is capped at 2^30.

   static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }
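The bit-by-bit walkthrough above can be verified with a standalone copy of the routine (the wrapper class name here is just for illustration):

```java
public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same logic as HashMap.tableSizeFor: smallest power of 2 >= cap
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(28)); // 32
        System.out.println(tableSizeFor(16)); // 16: the initial -1 keeps exact powers of 2 unchanged
    }
}
```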
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}   

public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    putMapEntries(m, false);
}


Core knowledge point: why must the length of the table be a power of 2

  • Computing a bucket index from a hash value is conceptually a modulus: hash % length

  • But a plain modulus is slower on the CPU than the bitwise operation used in the source code: hash & (length - 1)

  • For hash % length == hash & (length - 1) to hold,

  • length must be a power of 2 (2^n)
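The equivalence in the bullets above is easy to check directly for non-negative hash values (a small sketch; the class name is illustrative):

```java
public class ModMaskDemo {
    public static void main(String[] args) {
        int length = 16; // a power of 2, as HashMap guarantees
        for (int hash = 0; hash < 1000; hash++) {
            // For power-of-2 lengths the cheap bitmask equals the modulus
            if (hash % length != (hash & (length - 1)))
                throw new AssertionError("mismatch at " + hash);
        }
        System.out.println("hash % 16 == hash & 15 for all hashes tested");

        // With a non-power-of-2 length the trick breaks down
        int bad = 10;
        System.out.println(15 % bad);       // 5
        System.out.println(15 & (bad - 1)); // 9 -- not the same
    }
}
```

As a bonus, for a negative hash the bitmask form still yields a valid non-negative index, whereas % would go negative — another reason the source uses &.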

HashMap put method analysis - putVal

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}
// Perturbation ("disturbance") function
// Purpose: when the table is short, let the high 16 bits of the key's hashCode also take part in the routing (index) computation
// XOR: identical bits give 0, different bits give 1

// h = 0b 0010 0101 1010 1100 0011 1111 0010 1110

// 0b 0010 0101 1010 1100 0011 1111 0010 1110 [h]
// ^
// 0b 0000 0000 0000 0000 0010 0101 1010 1100 [h >>> 16]
// => 0010 0101 1010 1100 0001 1010 1000 0010
// When the table is not very long, let the high 16 bits also participate in order to reduce conflicts and collisions
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
// The core method of put
// Hash: hash value of key
// key: key
// value: value
// onlyIfAbsent: if true, do not change the existing value
// evict: if false, the table is in creation mode.
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    // tab: hash table referencing the current hashMap
    // p: Elements of the current hash table
    // n: Indicates the length of the current hash table
    // i: Indicates the result of route addressing
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    
    // Delay initialization logic. When putVal is called for the first time, it will initialize the hash table that consumes the most memory in the hashMap object
    // If the table is null or the length is 0, the creation starts
    if ((tab = table) == null || (n = tab.length) == 0)
        // It will not be initialized until the data is inserted for the first time
        n = (tab = resize()).length;
    
    // Simplest case: the addressed bucket is null, so the new node is placed directly at the computed index
    // tab and n were assigned in the previous if
    // The routing operation (n - 1) & hash yields the bucket index
    // If there is no element at that slot (it is null)
    if ((p = tab[i = (n - 1) & hash]) == null)
        // To create a new Node, k-v encapsulates a Node and places it in the i position of the tab
        tab[i] = newNode(hash, key, value, null);
    
    // At this time, it may be an array, a linked list or a red black tree
    else {
        // e: If it is not null, an element consistent with the k-v to be inserted is found
        // k: Represents a temporary K
        Node<K,V> e; K k;
        // p was assigned in the earlier if
        // If the first element's hash and key match the key being inserted, a value replacement will follow
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // p: It's already treelized
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        //It's a linked list
        else {
            // The head of the chain did not match the key; walk the list
            for (int binCount = 0; ; ++binCount) {
                // Reached the tail: append the new node at the last position
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // If the chain length has reached the treeify threshold, attempt to treeify the bucket
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        // Tree operation
                        treeifyBin(tab, hash);
                    break;
                }
                // If the hash and the key both match an existing node, break out so its value can be replaced
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                // loop
                p = e;
            }
        }
        // If e is not null, find the old value and return
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                // Overwrite new value
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    // The structural modification count increases; replacing a node's value does not count
    ++modCount;
    // If the element count exceeds the threshold, execute resize() to expand the capacity
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
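The externally visible behavior of putVal described above — null for a fresh key, the old value on replacement, and onlyIfAbsent exposed via putIfAbsent — can be observed directly (the demo class name is illustrative):

```java
import java.util.HashMap;

public class PutDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();

        // A brand-new key: putVal returns null (no previous mapping)
        System.out.println(map.put("a", 1)); // null

        // Same key again: the old value is returned and the value replaced
        System.out.println(map.put("a", 2)); // 1

        // putIfAbsent uses onlyIfAbsent = true: the existing value survives
        System.out.println(map.putIfAbsent("a", 99)); // 2
        System.out.println(map.get("a")); // 2

        // A null key is allowed: hash(null) is defined as 0
        map.put(null, 0);
        System.out.println(map.get(null)); // 0
    }
}
```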

Core analysis of HashMap resize capacity expansion method

// resize() method
// Why do we need to expand capacity?
// As elements accumulate, chaining becomes severe and lookups degrade from O(1) toward O(n),
// so to undo the loss of query efficiency caused by collisions, the table must be expanded [resizing is an expensive operation]
final Node<K,V>[] resize() {
    // oldTab: refers to the hash table before capacity expansion
    Node<K,V>[] oldTab = table;
    // oldCap: indicates the length of the table array before capacity expansion
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    // oldThr: the threshold before expansion, i.e. the value whose excess triggered this resize
    int oldThr = threshold;
    // newCap: size of table array after capacity expansion
    // newThr: the condition that triggers the next expansion after the expansion
    int newCap, newThr = 0;
    // If the condition is true, it indicates that the hash table in hashMap has been initialized and is a normal expansion
    if (oldCap > 0) {
        // If the length of the current array is greater than the maximum size that the hashMap can accommodate, it will not be expanded, and the original array will be returned directly
        // Set the maximum capacity expansion threshold to the maximum value of int
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // The new table is twice the size of the old one
        // If the doubled newCap is still below the maximum limit and the old capacity
        // was at least the default initial capacity (16), the next-resize threshold also doubles
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            // The capacity expansion threshold is also doubled
            newThr = oldThr << 1; // double threshold
    }
    // oldCap == 0 [description hash table null in hashMap] 
    // 1.new HashMap(initCap,loadFactor)
    // 2.new HashMap(intiCap)
    // 3.new HashMap(map) and the Map has data
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    
    // oldCap == 0 && oldThr == 0
    // i.e. created via new HashMap()
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY; // 16
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY); //12
    }
    // When the newThr is 0, calculate a newThr through newCap and loadFactor
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    // Update the threshold to the computed newThr
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    // Create a large array
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    // Then update the reference of table
    table = newTab;
    // oldTab is not null, which means that the table of hashMap is not null before this expansion
    if (oldTab != null) {
        // Iterate one location at a time
        for (int j = 0; j < oldCap; ++j) {
            // e: Current node
            Node<K,V> e;
            // Visit the bucket; if it is not empty, its contents must be relocated,
            // but whether it holds a single node, a linked list, or a tree must be determined first
            if ((e = oldTab[j]) != null) {
                // Clear the old slot so it can be reclaimed by GC; the data is already referenced by e
                oldTab[j] = null;
                // A single node, no chain
                if (e.next == null)
                    // Recompute the index against the new table and place the node directly
                    newTab[e.hash & (newCap - 1)] = e;
                // If the bucket has already been treeified
                else if (e instanceof TreeNode)
                    // Explained in the red-black tree section
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                // If it's a linked list
                else { // preserve order
                    // The bucket position has formed a linked list
                    
                    // Low list: entries that keep the same array index j after the resize
                    Node<K,V> loHead = null, loTail = null;
                    // High list: entries stored after the resize at
                    // the current index plus the old array length, j + oldCap
                    Node<K,V> hiHead = null, hiTail = null;
                    
                    // An element of the current linked list
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // hash -> .... 1 1111 
                        // hash -> .... 0 1111 
                        // 0b 1 0000
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // Low linked list has data
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // High linked list has data
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    // Finally, return the new table
    return newTab;
}
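The (e.hash & oldCap) == 0 split in the loop above works because doubling the capacity exposes exactly one more hash bit to the index mask. A small sketch with concrete bit patterns (values chosen purely for illustration):

```java
public class ResizeSplitDemo {
    public static void main(String[] args) {
        int oldCap = 16;          // table length before the resize
        int newCap = oldCap << 1; // 32 after doubling

        int h1 = 0b0_0101; // the bit at the oldCap position is 0
        int h2 = 0b1_0101; // the bit at the oldCap position is 1

        // Before the resize both hashes address the same bucket, index 5
        System.out.println(h1 & (oldCap - 1)); // 5
        System.out.println(h2 & (oldCap - 1)); // 5

        // After the resize, the newly exposed mask bit is exactly (hash & oldCap)
        System.out.println(h1 & (newCap - 1));  // 5: stays at j (low list)
        System.out.println(h2 & (newCap - 1));  // 21 = 5 + 16: moves to j + oldCap (high list)
        System.out.println((h1 & oldCap) == 0); // true  -> low list
        System.out.println((h2 & oldCap) == 0); // false -> high list
    }
}
```

This is why the split never needs to recompute a full index: one bit test decides between index j and index j + oldCap.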

Analysis of HashMap get method

// Get a method
public V get(Object key) {
    Node<K,V> e;
    // First call hash(key) to calculate hash value, then call getNode method.
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

// getNode method
final Node<K,V> getNode(int hash, Object key) {
    // tab: hash table referencing the current hashMap
    // first: header element in bucket
    // e: Temporary node element
    // n: length of the table array
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // First check that the table is not null, its length is not 0, and the addressed bucket is not null
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // Check the head node first: if its hash and key both match, return it
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        // If the next of first is not null
        if ((e = first.next) != null) {
            // If it is a tree, the lookup method of the tree is called
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // If it is a linked list, it will be judged circularly
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    // If none, null is returned
    return null;
}

Analysis of HashMap remove method

// Method of removing elements
public V remove(Object key) {
    Node<K,V> e;
    // Call the hash method to get the hash value, then call removeNode.
    return (e = removeNode(hash(key), key, null, false, true)) == null ?
        null : e.value;
}

// Core method removeNode
// Hash: hash value
// key: key
// value: the value to match when matchValue is true; otherwise ignored
// matchValue: if true, remove only when the value is equal as well
// movable: if false, other nodes are not moved while this one is deleted
final Node<K,V> removeNode(int hash, Object key, Object value,
                           boolean matchValue, boolean movable) {
    // tab: hash table referencing the current hashMap
    // p: Current node element
    // n: Indicates the length of the hash table array
    // index: indicates the addressing result
    Node<K,V>[] tab; Node<K,V> p; int n, index;
    
    // Check that the table is not null, its length is not 0, and the bucket addressed by the hash is not empty before continuing
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (p = tab[index = (n - 1) & hash]) != null) {
        // The bucket holds data; search it for the node to delete
        // node: found results
        // e: Next element of the current Node
        Node<K,V> node = null, e; K k; V v;
        // Judge whether the header element is the element to be deleted. If so, put it in node
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            node = p;
        // The first one in the bucket is not
        else if ((e = p.next) != null) {
            // Tree structure
            if (p instanceof TreeNode)
                // Look the node up via the tree
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
            else {
                // The linked list structure is obtained by circular traversal
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key ||
                         (key != null && key.equals(k)))) {
                        node = e;
                        break;
                    }
                    p = e;
                } while ((e = e.next) != null);
            }
        }
        // Judge whether the target node to be deleted is obtained
        if (node != null && (!matchValue || (v = node.value) == value ||
                             (value != null && value.equals(v)))) {
            // If it is a tree node, call the delete operation of the tree
            if (node instanceof TreeNode)
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
            // If node = p, it is the first data
            else if (node == p)
                // Update the address to the next data and put it into the bucket
                tab[index] = node.next;
            else
                // If node is not equal to p, it will directly point to the address of the next element in the linked list
                p.next = node.next;
            // Increase in modification times
            ++modCount;
            // Size minus 1
            --size;
            afterNodeRemoval(node);
            // Return deleted node
            return node;
        }
    }
    // If nothing matched, return null
    return null;
}
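The matchValue flag described above is what distinguishes the two public removal methods: remove(key) passes matchValue = false, while remove(key, value) passes matchValue = true. A quick demonstration (the demo class name is illustrative):

```java
import java.util.HashMap;

public class RemoveDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        map.put("a", 1);

        // remove(key) returns the removed value, or null if the key is absent
        System.out.println(map.remove("a")); // 1
        System.out.println(map.remove("a")); // null

        // remove(key, value) uses matchValue = true: deletes only on an exact value match
        map.put("b", 2);
        System.out.println(map.remove("b", 99)); // false: value mismatch, nothing removed
        System.out.println(map.remove("b", 2));  // true
        System.out.println(map.size());          // 0
    }
}
```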

Analysis of HashMap replace method

// Replace by k and v
@Override
public V replace(K key, V value) {
    Node<K,V> e;
    if ((e = getNode(hash(key), key)) != null) {
        V oldValue = e.value;
        e.value = value;
        afterNodeAccess(e);
        return oldValue;
    }
    return null;
}

// Replace with k oldValue newValue 
@Override
public boolean replace(K key, V oldValue, V newValue) {
    Node<K,V> e; V v;
    if ((e = getNode(hash(key), key)) != null &&
        ((v = e.value) == oldValue || (v != null && v.equals(oldValue)))) {
        e.value = newValue;
        afterNodeAccess(e);
        return true;
    }
    return false;
}
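Both overloads above only touch existing mappings: replace never inserts a new key. A short demonstration of the semantics (the demo class name is illustrative):

```java
import java.util.HashMap;

public class ReplaceDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        map.put("a", 1);

        // replace(key, value): replaces only an existing mapping, returns the old value
        System.out.println(map.replace("a", 2)); // 1
        System.out.println(map.replace("x", 9)); // null, and "x" is NOT added
        System.out.println(map.containsKey("x")); // false

        // replace(key, oldValue, newValue): compare-and-set on the value
        System.out.println(map.replace("a", 999, 3)); // false: old value mismatch
        System.out.println(map.replace("a", 2, 3));   // true
        System.out.println(map.get("a")); // 3
    }
}
```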

5. HashMap summary

  • The default capacity of HashMap is 16 (1 << 4). Each time the threshold is exceeded, the table automatically doubles in size, so the capacity is always a power of 2 (2^N). In addition, the underlying table array is lazily initialized: it is created only when the first key-value pair is added.

  • The default load factor of HashMap is 0.75. If we know roughly how many entries a HashMap will hold, we should set the capacity (and, if necessary, the load factor) accordingly.

  • A slot (bucket) of HashMap is converted into a red-black tree only when both of the following conditions hold, to prevent a long linked list in the slot from making lookups too slow.

    1. Condition 1: the table array of HashMap has length at least 64.
    2. Condition 2: the slot's linked list has length at least 8. The threshold 8 was chosen because under a Poisson distribution the probability of a chain that long is less than one in ten million.
  • When the number of nodes of the red black tree in the slot is less than or equal to 6, it will degenerate back to the linked list.

  • The average time complexity of finding and adding key value pairs in HashMap is O(1).

  • For a slot holding a linked list, the average time complexity is O(k), where k is the length of the list.

  • For a slot holding a red-black tree, the average time complexity is O(log k), where k is the number of tree nodes.

Topics: Java Interview source code analysis HashMap