JDK source code learning 06 currenthashmap analysis

Posted by nimzie on Fri, 04 Mar 2022 00:53:32 +0100

JDK source code learning 06 currenthashmap analysis

The principle of CurrentHashMap is very complex. You can only record what you understand. As we all know, the thread safety implementation principle of CurrentHashMap is Synchronized+CAS. Now let's take a look.

Brief reading of notes

* Overview:
* The main design goal of this hash table is to maintain concurrent readability (usually get()Methods, also including iterators and related methods),
* At the same time, minimize update contention. The secondary goal is to keep space consumption in line with java.util.HashMap Same or better,
* It also supports the high initial insertion rate of many threads into empty tables. This mapping typically acts as a hash table for boxing (buckets). Each key value mapping is saved in a node.
* Most nodes are instances of basic node classes, with hashes, keys, values, and next fields. However, there are various subclasses: TreeNode Arranged in the balance tree,
* Not in the list. TreeBins have TreeNode The root of the collection. During resizing, ForwardingNodes Put it on the top of the dustbin.
* stay computeIfAbsent And related methods, ReservationNode Use as a placeholder.
* TreeBin,ForwardingNode and ReservationNode Types do not contain regular user keys, values or hashes, and are easy to distinguish in the process of search, etc,
* Because they have negative hash fields and empty key and value fields.

Hash function spread

It is worth noting that in CurrentHashMap, the hash of a node has three values

  1. Positive number: indicates that this is a normal node, a normal bucket node (the first node of the linked list) or a linked list node
  2. -1: Indicates that the barrel is expanding
  3. Other negative numbers: indicates that this is a tree node
    // In the hash function of hashmap, the first bit is forced to be 0
    static final int spread(int h) {
        return (h ^ (h >>> 16)) & HASH_BITS;
    }

static const

Other constants are similar to HashMap, so we won't look at them. Here we mainly look at the core fields.

  • If the element is being expanded and migrated, the hash value of the bucket head node is MOVED
  • If the element is a tree, the hash value of the bucket head node is TREEBIN
  • HASH_BITS is a 32-bit int, only the first bit is 0, and other binary bits are 1. You can only take the last 31 bits of the hash value (why, because hash is used not only as hash, but also as identification bit)
    static final int MOVED     = -1; // hash for forwarding nodes
    static final int TREEBIN   = -2; // hash for roots of trees
    static final int RESERVED  = -3; // hash for transient reservations
    static final int HASH_BITS = 0x7fffffff; // usable bits of normal node hash

    // In the hash function of hashmap, the first bit is forced to be 0
    static final int spread(int h) {
        return (h ^ (h >>> 16)) & HASH_BITS;
    }

Core data structure

Node type introduction
There will be a little more details later

Node              Ordinary node 
TreeNode          Tree node
TreeBins          A collection of tree nodes, including TreeNode Type root node  
ReservationNode   Use as placeholder
ForwardingNode    Used during capacity expansion

volatile modifies val and next to ensure the visibility of multithreading (val modifies to prevent dirty data, and next modifies to prevent concurrency problems)
sizeCtl is the core, which is very difficult to understand

    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        volatile V val;
        volatile Node<K,V> next;
    }
    // Data bucket table
    transient volatile Node<K,V>[] table;
    // Use the next table; Non null only when resizing
    private transient volatile Node<K,V>[] nextTable;
    // The basic counter value is mainly used when there is no contention. It is also used as a backup during the table initialization competition. Update via CAS.
    private transient volatile long baseCount;
    /**
     * Table initialization and sizing control. If negative, the table will be initialized or resized: - 1 for initialization,
     * Otherwise - (1 + number of resizing threads active). Otherwise, when the table is null, keep the initial table size to be used when creating,
     * Or it defaults to 0. After initialization, keep the count value of the next element to resize the table.
     */
    private transient volatile int sizeCtl;
	// Next node index to be moved during capacity expansion
	private transient volatile int transferIndex;

Core function

The core functions of CAS can be used according to their names. The underlying functions are implemented with operating system instructions cmpxchg. On the basis of implementing CAS, the semantics are the same as volatile (learn from JUC)

    static final <K,V> Node<K,V> tabAt(Node<K,V>[] tab, int i) {
        return (Node<K,V>)U.getObjectVolatile(tab, ((long)i << ASHIFT) + ABASE);
    }
    static final <K,V> boolean casTabAt(Node<K,V>[] tab, int i,
                                        Node<K,V> c, Node<K,V> v) {
        return U.compareAndSwapObject(tab, ((long)i << ASHIFT) + ABASE, c, v);
    }
    static final <K,V> void setTabAt(Node<K,V>[] tab, int i, Node<K,V> v) {
        U.putObjectVolatile(tab, ((long)i << ASHIFT) + ABASE, v);
    }

Let's start with a simple look

1. get function

The get function is very simple. The only thing to note is that EH < 0. At this time, the bucket head node is either a tree node or in the capacity expansion state. What is more noteworthy is the find method invoked in this judgment, and in the case of polymorphism, there will be various kinds of Sao operation.

    public V get(Object key) {
        Node<K,V>[] tab; Node<K,V> e, p; int n, eh; K ek;
        int h = spread(key.hashCode());
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (e = tabAt(tab, (n - 1) & h)) != null) {
            // If the node found in the table is the same as the hash value of the key
            if ((eh = e.hash) == h) {
                // The key addresses are the same or the keys are the same
                if ((ek = e.key) == key || (ek != null && key.equals(ek)))
                    return e.val;
            }
            // EH < 0 indicates that TreeNode may be in the process of capacity expansion
            else if (eh < 0)
                // This find method belongs to the method of node < K, V >, which can be overridden by subclasses  
                // There will be different types of nodes calling different find methods, which is very clever
                return (p = e.find(h, key)) != null ? p.val : null;

            // EH > = 0 indicates that the bucket is filled with a normal linked list and can be circulated directly
            while ((e = e.next) != null) {
                if (e.hash == h &&
                    ((ek = e.key) == key || (ek != null && key.equals(ek))))
                    return e.val;
            }
        }
        return null;
    }

We can see that there are four very important subclasses of find methods

  • ForwardingNode indicates the node being expanded
  • I don't know the ReservationNode. This is a place holder node used in computeIfAbsent and compute. The consciousness is: the placeholder node used for computeIfAbsent and compute
  • The tree bin data structure stores the root node of the tree, volatile treenode < K, V > first; So TreeBin is just a description of the structure of red and black trees
  • TreeNode can only be the description of each node in the red black tree

2. put function

Logically, the put function is not difficult. It is unclear only about the helpTransfer function, but the most difficult is this function, which contains the core concurrency idea of CurrentHashMap. And before looking at this function, you should first look at the resize function

    /** Implementation for put and putIfAbsent */
    final V putVal(K key, V value, boolean onlyIfAbsent) {
        // We can see that CurrentHashMap does not support null keys and null values
        if (key == null || value == null) throw new NullPointerException();
        int hash = spread(key.hashCode());
        int binCount = 0;
        // This loop is nice classic failure retry
        for (Node<K,V>[] tab = table;;) {
            Node<K,V> f; int n, i, fh;
            if (tab == null || (n = tab.length) == 0)
                tab = initTable();
            // If the bucket in the table is empty
            else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
                // Insert elements through cas
                if (casTabAt(tab, i, null,
                             new Node<K,V>(hash, key, value, null)))
                    break;                   // no lock when adding to empty bin
            }
            // At this time, f points to a bucket of the table
            // If the hash value is - 1, it means that the capacity is being expanded. This thread will join in to help expand the capacity. It feels like openmp
            else if ((fh = f.hash) == MOVED)
                tab = helpTransfer(tab, f);
            else {
                // This judgment indicates that it is either a linked list or a collection of tree nodes TreeBin - treenodes
                //(but it's not locked yet. It may change before locking)
                // If multiple threads come here at the same time and want to put, they will be blocked

                // If the slot in the table is not empty and has no expansion
                V oldVal = null;
                // Lock current slot
                synchronized (f) {
                    // What's the use of this judgment== Used to determine whether references are equal
                    // If only one bucket node is deleted between the above assignment and the current locking, this situation can be found here
                    // But the data of put was lost soon?
                    // This is because I didn't look carefully. It turns out that the whole code block is in an endless loop. If put fails, try again

                    if (tabAt(tab, i) == f) {
                        // Linked list structure has nothing to say
                        if (fh >= 0) {
                            binCount = 1;
                            for (Node<K,V> e = f;; ++binCount) {
                                K ek;
                                if (e.hash == hash &&
                                    ((ek = e.key) == key ||
                                     (ek != null && key.equals(ek)))) {
                                    oldVal = e.val;
                                    if (!onlyIfAbsent)
                                        e.val = value;
                                    break;
                                }
                                Node<K,V> pred = e;
                                if ((e = e.next) == null) {
                                    pred.next = new Node<K,V>(hash, key,
                                                              value, null);
                                    break;
                                }
                            }
                        }
                        // If it is a tree node
                        else if (f instanceof TreeBin) {
                            Node<K,V> p;
                            binCount = 2;
                            if ((p = ((TreeBin<K,V>)f).putTreeVal(hash, key,
                                                           value)) != null) {
                                oldVal = p.val;
                                if (!onlyIfAbsent)
                                    p.val = value;
                            }
                        }
                    }
                }
                if (binCount != 0) {
                    // binCount = 1; There are actually two nodes when it comes out
                    // Therefore, the following is that the number of nodes is greater than or equal to 9 (the number of nodes is greater than 8). The internal will judge that the length of the table needs to be greater than or equal to 64
                    if (binCount >= TREEIFY_THRESHOLD)
                        treeifyBin(tab, i);
                    if (oldVal != null)
                        return oldVal;
                    break;
                }
            }
        }
        addCount(1L, binCount);
        return null;
    }

3. Tryprevize function

The most complex transfer function will be called in the capacity expansion function

 /**
  * Tries to presize table to accommodate the given number of elements.
  *
  * @param size number of elements (doesn't need to be perfectly accurate)
  */
 private final void tryPresize(int size) {
     int c = (size >= (MAXIMUM_CAPACITY >>> 1)) ? MAXIMUM_CAPACITY :
         tableSizeFor(size + (size >>> 1) + 1);
     int sc;
     // Sizecl note:
     //   Table initialization and sizing control. If negative, the table will be initialized or resized:
     // -1 for initialization, otherwise - (1 + number of resizing threads active)
     //   Otherwise, when the table is null, the initial table size to be used during creation is retained, or the default is 0.
     // After initialization, keep the count value of the next element to resize the table.

     // Keep trying to expand the capacity until it enters the initialization or expansion state
     //   If it is initialization, first enter the first judgment to obtain the threshold, and then enter the third judgment in the second cycle to enter the capacity expansion state
     //   In case of capacity expansion, enter the third judgment directly
     while ((sc = sizeCtl) >= 0) {
         Node<K,V>[] tab = table; int n;
         // Shouldn't lazy loading be an initTable function?
         if (tab == null || (n = tab.length) == 0) {
             n = (sc > c) ? sc : c;
             if (U.compareAndSwapInt(this, SIZECTL, sc, -1)) {
                 try {
                     if (table == tab) {
                         @SuppressWarnings("unchecked")
                         Node<K,V>[] nt = (Node<K,V>[])new Node<?,?>[n];
                         table = nt;
                         // sc=0.75*n calculate the next expansion threshold
                         sc = n - (n >>> 2);
                     }
                 } finally {
                     sizeCtl = sc;
                 }
             }
         }
         // The second judgment C < = SC is not easy to understand. It means that the power of 2 is less than or equal to sizeCtl
         // -Assuming that the original capacity is 100, the sizeCtl is 75. At this time, many elements are remove d  
         // When you put the element again, it will appear that the power of 2 is less than or equal to sizeCtl
         else if (c <= sc || n >= MAXIMUM_CAPACITY)
             break;
         else if (tab == table) {
             // Look at the name, it feels like returning a random number
             int rs = resizeStamp(n);
             // Expanding capacity
             if (sc < 0) {
                 Node<K,V>[] nt;
                 if ((sc >>> RESIZE_STAMP_SHIFT) != rs || sc == rs + 1 ||
                     sc == rs + MAX_RESIZERS || (nt = nextTable) == null ||
                     transferIndex <= 0)
                     break;
                 if (U.compareAndSwapInt(this, SIZECTL, sc, sc + 1))
                     transfer(tab, nt);
             }
             else if (U.compareAndSwapInt(this, SIZECTL, sc,
                                          (rs << RESIZE_STAMP_SHIFT) + 2))
                 transfer(tab, null);
         }
     }
 }

4. helpTransfer function

Visually, the transfer function segments the table, and then this method will help expand the capacity and move the elements in the bucket to the new table. This method page calls the transfer function and will not return from this method until the movement is completed.

    /**
     * Helps transfer if a resize is in progress.
     * If you are resizing, it is helpful to transfer
     */
    final Node<K,V>[] helpTransfer(Node<K,V>[] tab, Node<K,V> f) {
        Node<K,V>[] nextTab; int sc;
        if (tab != null && (f instanceof ForwardingNode) &&
            (nextTab = ((ForwardingNode<K,V>)f).nextTable) != null) {
            int rs = resizeStamp(tab.length);
            while (nextTab == nextTable && table == tab &&
                   (sc = sizeCtl) < 0) {
                if ((sc >>> RESIZE_STAMP_SHIFT) != rs || sc == rs + 1 ||
                    sc == rs + MAX_RESIZERS || transferIndex <= 0)
                    break;
                if (U.compareAndSwapInt(this, SIZECTL, sc, sc + 1)) {
                    transfer(tab, nextTab);
                    break;
                }
            }
            return nextTab;
        }
        return table;
    }

5. transfer function

It's too complicated. Just analyze the previous part, that is, segment the CurrentHashMap

    /**
     * Moves and/or copies the nodes in each bin to new table. See
     * above for explanation.
     */
    private final void transfer(Node<K,V>[] tab, Node<K,V>[] nextTab) {
        int n = tab.length, stride;
        // If the number of CPUs is 1, the length of the whole table is n and divided into one segment, that is, a total of one segment
        // If the number of CPUs is greater than 1, the whole table is divided into n/(8*cpu) segments
        if ((stride = (NCPU > 1) ? (n >>> 3) / NCPU : n) < MIN_TRANSFER_STRIDE)
            stride = MIN_TRANSFER_STRIDE; // subdivide range
        if (nextTab == null) {            // initiating
            try {
                @SuppressWarnings("unchecked")
                Node<K,V>[] nt = (Node<K,V>[])new Node<?,?>[n << 1];
                nextTab = nt;
            } catch (Throwable ex) {      // try to cope with OOME
                sizeCtl = Integer.MAX_VALUE;
                return;
            }
            nextTable = nextTab;
            // Start from the last bucket and move to the new watch
            transferIndex = n;
        }
        int nextn = nextTab.length;
        ForwardingNode<K,V> fwd = new ForwardingNode<K,V>(nextTab);
        boolean advance = true;
        boolean finishing = false; // to ensure sweep before committing nextTab
        for (int i = 0, bound = 0;;) {
            Node<K,V> f; int fh;
            while (advance) {
                int nextIndex, nextBound;
                if (--i >= bound || finishing)
                    advance = false;
                else if ((nextIndex = transferIndex) <= 0) {
                    i = -1;
                    advance = false;
                }
                else if (U.compareAndSwapInt
                         (this, TRANSFERINDEX, nextIndex,
                          nextBound = (nextIndex > stride ?
                                       nextIndex - stride : 0))) {
                    bound = nextBound;
                    i = nextIndex - 1;
                    advance = false;
                }
            }
            if (i < 0 || i >= n || i + n >= nextn) {
                int sc;
                if (finishing) {
                    nextTable = null;
                    table = nextTab;
                    sizeCtl = (n << 1) - (n >>> 1);
                    return;
                }
                if (U.compareAndSwapInt(this, SIZECTL, sc = sizeCtl, sc - 1)) {
                    if ((sc - 2) != resizeStamp(n) << RESIZE_STAMP_SHIFT)
                        return;
                    finishing = advance = true;
                    i = n; // recheck before commit
                }
            }
            else if ((f = tabAt(tab, i)) == null)
                advance = casTabAt(tab, i, null, fwd);
            else if ((fh = f.hash) == MOVED)
                advance = true; // already processed
            else {
                synchronized (f) {
                    if (tabAt(tab, i) == f) {
                        Node<K,V> ln, hn;
                        if (fh >= 0) {
                            int runBit = fh & n;
                            Node<K,V> lastRun = f;
                            for (Node<K,V> p = f.next; p != null; p = p.next) {
                                int b = p.hash & n;
                                if (b != runBit) {
                                    runBit = b;
                                    lastRun = p;
                                }
                            }
                            if (runBit == 0) {
                                ln = lastRun;
                                hn = null;
                            }
                            else {
                                hn = lastRun;
                                ln = null;
                            }
                            for (Node<K,V> p = f; p != lastRun; p = p.next) {
                                int ph = p.hash; K pk = p.key; V pv = p.val;
                                if ((ph & n) == 0)
                                    ln = new Node<K,V>(ph, pk, pv, ln);
                                else
                                    hn = new Node<K,V>(ph, pk, pv, hn);
                            }
                            setTabAt(nextTab, i, ln);
                            setTabAt(nextTab, i + n, hn);
                            setTabAt(tab, i, fwd);
                            advance = true;
                        }
                        else if (f instanceof TreeBin) {
                            TreeBin<K,V> t = (TreeBin<K,V>)f;
                            TreeNode<K,V> lo = null, loTail = null;
                            TreeNode<K,V> hi = null, hiTail = null;
                            int lc = 0, hc = 0;
                            for (Node<K,V> e = t.first; e != null; e = e.next) {
                                int h = e.hash;
                                TreeNode<K,V> p = new TreeNode<K,V>
                                    (h, e.key, e.val, null, null);
                                if ((h & n) == 0) {
                                    if ((p.prev = loTail) == null)
                                        lo = p;
                                    else
                                        loTail.next = p;
                                    loTail = p;
                                    ++lc;
                                }
                                else {
                                    if ((p.prev = hiTail) == null)
                                        hi = p;
                                    else
                                        hiTail.next = p;
                                    hiTail = p;
                                    ++hc;
                                }
                            }
                            ln = (lc <= UNTREEIFY_THRESHOLD) ? untreeify(lo) :
                                (hc != 0) ? new TreeBin<K,V>(lo) : t;
                            hn = (hc <= UNTREEIFY_THRESHOLD) ? untreeify(hi) :
                                (lc != 0) ? new TreeBin<K,V>(hi) : t;
                            setTabAt(nextTab, i, ln);
                            setTabAt(nextTab, i + n, hn);
                            setTabAt(tab, i, fwd);
                            advance = true;
                        }
                    }
                }
            }
        }
    }

Topics: Java JDK data structure Multithreading HashMap