Explain the principle of ConcurrentHashMap in detail

Posted by biggus on Tue, 18 Jan 2022 09:54:37 +0100

1.1 problems of HashMap in a concurrent environment

  1. First, shared fields such as size are not updated atomically, so concurrent writes can produce incorrect counts.
  2. During capacity expansion a ring can form in a bucket's linked list; a later query for a key that hashes to that bucket and is not present then traverses the ring forever, i.e. an infinite loop that pins the CPU.

1.2 capacity expansion principle:

1) Capacity expansion

Create a new empty Entry array, which is twice the length of the original array.

2) rehash

Traverse the original Entry array and rehash every entry into the new array. Why rehash? Because after the length doubles, the rule that maps a hash to a bucket (hash & (length - 1)) changes, so entries may land in different buckets.
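
The resize walkthrough in the next section refers repeatedly to the loop that moves entries into the new table. A simplified sketch of that loop, modelled on the pre-JDK 8 HashMap transfer logic (abridged for illustration, not the exact JDK source), looks like this:

    // Simplified sketch of the pre-JDK 8 HashMap transfer loop (abridged, not the exact JDK source).
    // Entries are moved bucket by bucket and head-inserted into the new table, which reverses
    // the order of each chain; this head insertion is the root cause of the ring described below.
    void transfer(Entry<K,V>[] newTable) {
        int newCapacity = newTable.length;
        for (Entry<K,V> e : table) {              // walk every bucket of the old table
            while (e != null) {
                Entry<K,V> next = e.next;         // <-- the suspension point in the walkthrough below
                int i = e.hash & (newCapacity - 1);
                e.next = newTable[i];             // head insertion into the new bucket
                newTable[i] = e;
                e = next;
            }
        }
    }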

1.3 causes of problems:

Suppose a HashMap has just reached the resize threshold, and two threads A and B put into it at the same time:

When the resize condition is met, both threads start the first step of the resize, i.e. capacity expansion. Suppose that thread B, while traversing the chain containing the Entry3 object, is suspended right after executing the line next = e.next of the transfer loop sketched above. For thread B:

e = Entry3  
next = Entry2  

Thread A then completes its rehash without interruption. When it finishes, Entry3 and Entry2 have both been moved to bucket 3 of thread A's new table, and because of head insertion their order is reversed, so Entry2 is now the head and Entry2.next = Entry3 (e and next denote thread B's two local references):

Up to this point nothing looks wrong. Thread B is then scheduled again and continues its own rehash. Its state is still:

Supplementary note: each thread works on its own working memory, and in principle thread B should not see the result of thread A's rehash. However, the hash computation calls the sun.misc.Hashing.stringHash32 method, which forces a read from main memory, so once thread A's result has been written back to main memory, thread B observes the changed links.

e = Entry3  
next = Entry2

When the next line, which computes the new index i, is executed, obviously i = 3, because thread A's hash of Entry3 also gave 3.

We continue with the next lines: Entry3 is placed at index 3 of thread B's new table by head insertion, and e then moves on to Entry2. The values of e and next are now:

e = Entry2  
next = Entry2  

Then a new round of the loop begins, and execution again reaches the line next = e.next:

e = Entry2  
next = Entry3  

Next, the three head-insertion lines execute, and Entry2 is inserted at the head of bucket 3 of thread B's new table:

At the start of the third iteration, execution again reaches next = e.next:

e = Entry3  
next = Entry3.next = null  

Finally, when the following head insertion executes, the moment to witness the miracle arrives:

newTable[i] = Entry2 (the head left by the previous iteration)  
e = Entry3  
Entry2.next = Entry3 (set by the previous iteration)  
Entry3.next = Entry2 (set by e.next = newTable[i])  

There is a ring in the linked list!

At this point the problem has not yet shown itself. But when a later Get looks up a key that does not exist and that key happens to hash to 3, the traversal of the ring at position 3 never reaches null, and the program falls into an infinite loop!
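
The loop that spins is simply the ordinary bucket traversal in get(). A simplified sketch (again modelled on the pre-JDK 8 HashMap, not the exact source) shows why it never terminates once the chain contains a ring:

    // Simplified sketch of the pre-JDK 8 HashMap lookup (illustrative, not the exact JDK source).
    // With a ring in the chain at table[i] and the key absent, e never becomes null,
    // so this loop spins forever and pegs one CPU core.
    V lookup(Object key, int hash) {
        int i = hash & (table.length - 1);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && key.equals(e.key))
                return e.value;
        }
        return null;    // never reached when the chain is a ring and the key is absent
    }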

2, ConcurrentHashMap(1.7)

2.1 concept

ConcurrentHashMap is a thread-safe and more efficient version of HashMap. With the default configuration it can, ideally, support 16 threads performing concurrent write operations, plus read operations from any number of threads.

When accessing a ConcurrentHashMap, we first locate the specific Segment and then access the whole map through that Segment. Notably, both read and write operations perform well: reads generally require no locking, and writes lock only the Segment being operated on (lock striping), so access to the other Segments is unaffected.

2.2 advantages

The efficient concurrency of ConcurrentHashMap is guaranteed by the following three things:

  • Lock striping (segment locks) protects write operations in a concurrent environment;
  • The immutability of HashEntry, the memory visibility of volatile variables and the lock-and-reread mechanism together make read operations efficient and safe;
  • Cross-segment operations are handled safely by two schemes: trying without locks first, then locking.

2.3 structure

A ConcurrentHashMap is essentially an array of Segments, and each Segment instance is a small hash table. Since the Segment class extends ReentrantLock, a Segment object can itself act as a lock.

Each Segment guards a number of the buckets of the whole ConcurrentHashMap, and each bucket holds a linked list of HashEntry objects. By dividing the map into Segments, ConcurrentHashMap can use different locks to control modifications to different parts of the hash table, which allows multiple modifications to proceed concurrently. This is the core idea of the lock-striping (segment locking) technique. Put differently, if the whole ConcurrentHashMap is viewed as a parent hash table, each Segment can be viewed as a child hash table, as shown in the figure below:

[Figure: a ConcurrentHashMap as a parent hash table made up of Segments, each Segment a child hash table of buckets]

Note: suppose the ConcurrentHashMap is divided into 2^n Segments, each containing 2^m buckets. A Segment is located by AND-ing the high n bits of the key's hash value with (2^n - 1); once the Segment is found, the specific bucket is located by AND-ing the low m bits of the hash value with (2^m - 1).
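
As a concrete, hypothetical example of this positioning arithmetic (n = 4 segment bits, m = 3 bucket bits, and an arbitrary already re-hashed value):

    // Hypothetical example: 2^4 = 16 segments, 2^3 = 8 buckets per segment.
    int n = 4, m = 3;
    int h = 0x5A3C91F2;                                       // some key's re-hashed value
    int segmentShift = 32 - n;                                // 28
    int segmentMask  = (1 << n) - 1;                          // 15 (binary 1111)
    int segmentIndex = (h >>> segmentShift) & segmentMask;    // high 4 bits select the segment: 5
    int bucketIndex  = h & ((1 << m) - 1);                    // low 3 bits select the bucket: 2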

2.4 constructor

①ConcurrentHashMap(int initialCapacity, float loadFactor, int concurrencyLevel)
    public ConcurrentHashMap(int initialCapacity,
                             float loadFactor, int concurrencyLevel) {
        if (!(loadFactor > 0) || initialCapacity < 0 || concurrencyLevel <= 0)
            throw new IllegalArgumentException();
        if (concurrencyLevel > MAX_SEGMENTS)              
            concurrencyLevel = MAX_SEGMENTS;
        // Find power-of-two sizes best matching arguments
        int sshift = 0;            // sshift = log2(ssize)
        int ssize = 1;            // Number of segments: length of the segments array (a power of 2)
        while (ssize < concurrencyLevel) {
            ++sshift;
            ssize <<= 1;
        }
        segmentShift = 32 - sshift;      // For positioning segments
        segmentMask = ssize - 1;      // For positioning segments
        this.segments = Segment.newArray(ssize);   // Create segments array
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        int c = initialCapacity / ssize;    // Total buckets / number of segments
        if (c * ssize < initialCapacity)
            ++c;
        int cap = 1;     // Number of buckets per segment (power of 2)
        while (cap < c)
            cap <<= 1;
        for (int i = 0; i < this.segments.length; ++i)      // Initializing segments array
            this.segments[i] = new Segment<K,V>(cap, loadFactor);
    }

②ConcurrentHashMap(int initialCapacity, float loadFactor)

    public ConcurrentHashMap(int initialCapacity, float loadFactor) {
        this(initialCapacity, loadFactor, DEFAULT_CONCURRENCY_LEVEL);  // The default concurrency level is 16
    }

③ConcurrentHashMap(int initialCapacity)

    public ConcurrentHashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR, DEFAULT_CONCURRENCY_LEVEL);
    }

④ConcurrentHashMap()

    public ConcurrentHashMap() {
        this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR, DEFAULT_CONCURRENCY_LEVEL);
    }

⑤ConcurrentHashMap(Map<? extends K, ? extends V> m)

    public ConcurrentHashMap(Map<? extends K, ? extends V> m) {
        this(Math.max((int) (m.size() / DEFAULT_LOAD_FACTOR) + 1,
                      DEFAULT_INITIAL_CAPACITY),
             DEFAULT_LOAD_FACTOR, DEFAULT_CONCURRENCY_LEVEL);
        putAll(m);
    }
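
Tracing the first constructor with the default arguments used by the no-argument constructor (initialCapacity = 16, loadFactor = 0.75f, concurrencyLevel = 16) gives, per the code above:

    // Worked example for new ConcurrentHashMap<String, Integer>() in this JDK version:
    //   ssize        = 16            (smallest power of two >= concurrencyLevel)
    //   sshift       = 4
    //   segmentShift = 32 - 4 = 28
    //   segmentMask  = 16 - 1 = 15
    //   c            = 16 / 16 = 1
    //   cap          = 1             (each Segment initially has a single bucket)
    ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<String, Integer>();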

2.5 concurrent access

In ConcurrentHashMap, when a thread reads the mapping table, it can generally be completed without locking. Only operations that make structural modifications to the container (such as put operation, remove operation, etc.) need locking.

2.5.1 concurrent write operation put(key, value)

    public V put(K key, V value) {
        if (value == null)
            throw new NullPointerException();
        int hash = hash(key.hashCode());
        return segmentFor(hash).put(key, hash, value, false);
    }

When we put a key/value pair into a ConcurrentHashMap, we first take the key's hashCode and re-hash it, then use the resulting hash value to locate the Segment into which the record should be inserted. The source of the segmentFor() method that locates the Segment is as follows:

    final Segment<K,V> segmentFor(int hash) {
        return segments[(hash >>> segmentShift) & segmentMask];
    }

The segmentFor() method shifts the given hash value right (unsigned) by segmentShift bits and ANDs the result with segmentMask to locate a specific Segment. Assuming the number of segments (the length of the segments array) is 2^n (the number of segments is always a power of two; see the constructor above), segmentShift is 32 - n (a hash value has 32 bits) and segmentMask is 2^n - 1 (n ones in binary). We can therefore conclude that the high n bits of the key's hash value determine which Segment the element belongs to. Next, that Segment's put() method is called to insert the key/value pair into the Segment. The source of Segment's put() is as follows:

    V put(K key, int hash, V value, boolean onlyIfAbsent) {
            lock();    // Lock
            try {
                int c = count;
                if (c++ > threshold) // ensure capacity
                    rehash();
                HashEntry<K,V>[] tab = table;    // table is a volatile field of the Segment
                int index = hash & (tab.length - 1);    // Navigate to a specific bucket in the segment
                HashEntry<K,V> first = tab[index];   // first points to the header of the linked list in the bucket
                HashEntry<K,V> e = first;
                // Check whether there are nodes with the same key in the bucket
                while (e != null && (e.hash != hash || !key.equals(e.key)))  
                    e = e.next;
                V oldValue;
                if (e != null) {        // Nodes with the same key exist in this bucket
                    oldValue = e.value;
                    if (!onlyIfAbsent)
                        e.value = value;        // Update value
                }else {         // Nodes with the same key do not exist in this bucket
                    oldValue = null;
                    ++modCount;     // Structural modification, modCount plus 1
                    tab[index] = new HashEntry<K,V>(key, hash, first, value);  // Create a HashEntry and chain it to the header
                    count = c;      //Write volatile, the update of count value must be put in the last step (volatile variable)
                }
                return oldValue;    // Return the old value (null if there is no node with the same key in the bucket)
            } finally {
                unlock();      // Unlock in the finally clause
            }
        }

From the source we can see, first, that ConcurrentHashMap performs the put on a Segment under a lock. As noted earlier, Segment is a subclass of ReentrantLock, so a Segment is itself a reentrant lock and its inherited lock() and unlock() methods can be called directly. Note that what is locked here is the specific Segment, not the whole ConcurrentHashMap: inserting the key/value pair only touches one bucket inside this Segment, so there is no need to lock the whole map, and other write threads locking any of the remaining segments (15 of them by default) are not blocked by the current thread holding this Segment's lock.

Compared with Hashtable, or a HashMap wrapped by a synchronization wrapper, where only one thread at a time can read or write, the performance of concurrent access is qualitatively better. Ideally, ConcurrentHashMap supports 16 threads performing concurrent write operations (when the concurrency level is set to 16, the default) plus read operations from any number of threads.
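
A small, illustrative sketch of that claim (not a benchmark): several writer threads whose keys fall into different segments proceed in parallel, while readers are never blocked at all.

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class SegmentedWriteDemo {
        public static void main(String[] args) throws InterruptedException {
            final ConcurrentHashMap<Integer, Integer> map =
                    new ConcurrentHashMap<Integer, Integer>();   // default concurrency level: 16
            ExecutorService pool = Executors.newFixedThreadPool(16);
            for (int t = 0; t < 16; t++) {
                final int base = t * 10000;
                pool.execute(new Runnable() {
                    public void run() {
                        // Writers whose keys hash to different segments do not block each other;
                        // concurrent map.get(...) calls are not blocked at all.
                        for (int i = 0; i < 10000; i++)
                            map.put(base + i, i);
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
            System.out.println(map.size());   // 160000: all writes were applied
        }
    }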

Before inserting a key/value pair into a Segment, put first checks whether this insertion would push the number of elements in the Segment over the threshold; if so, the Segment is expanded and rehashed first, and then the insertion proceeds (we set the rehash itself aside for now; it is covered in detail in the next section). The lines that compute index and first locate the specific bucket inside the Segment and the head of its linked list. The while loop then checks whether a node with the same key already exists in that bucket: if it does, the value field is simply updated; if not, execution reaches the else branch, which creates a new HashEntry, links it in as the new head of the bucket's list, and finally updates count (since count is a volatile variable, its update must be the last step).

So far we have covered the put operation of ConcurrentHashMap, apart from the rehash. Besides put, the modification operations also include putAll() and replace(). putAll() simply calls put repeatedly, and replace() is much simpler than put(), so they are not repeated here.

2.5.2 rehash()

As described above, before the put operation inserts a key/value pair into a ConcurrentHashMap it first checks whether the insertion would push the number of nodes in the Segment over the threshold; if so, the Segment is expanded and rehashed first. Note in particular that ConcurrentHashMap's rehash is really a rehash of one Segment of the map, so the number of buckets in different Segments of the same map may well differ. The source of Segment's rehash() is as follows:

     void rehash() {
            HashEntry<K,V>[] oldTable = table;    // table before capacity expansion
            int oldCapacity = oldTable.length;
            if (oldCapacity >= MAXIMUM_CAPACITY)   // It has been expanded to the maximum capacity and returned directly
                return;
            // Create a new table with twice the original capacity
            HashEntry<K,V>[] newTable = HashEntry.newArray(oldCapacity<<1);   
            threshold = (int)(newTable.length * loadFactor);   // New threshold
            int sizeMask = newTable.length - 1;     // Used to position the bucket
            for (int i = 0; i < oldCapacity ; i++) {
                // We need to guarantee that any existing reads of old Map can
                //  proceed. So we cannot yet null out each bin.
                HashEntry<K,V> e = oldTable[i];  // Point to the linked list header of each bucket in the old table in turn
                if (e != null) {    // The linked list in the bucket of the old table is not empty
                    HashEntry<K,V> next = e.next;
                    int idx = e.hash & sizeMask;   // Rehash has been positioned to the new bucket
                    if (next == null)    //  There is only one node in the bucket of the old table
                        newTable[idx] = e;
                    else {    
                        // Reuse trailing consecutive sequence at same slot
                        HashEntry<K,V> lastRun = e;
                        int lastIdx = idx;
                        for (HashEntry<K,V> last = next; last != null; last = last.next) {
                            int k = last.hash & sizeMask;
                            // Find a child chain with the same k value. The tail node of the child chain must be the same as the tail node of the parent chain
                            if (k != lastIdx) {
                                lastIdx = k;
                                lastRun = last;
                            }
                        }
                        // JDK directly puts the child chain lastRun into the newTable[lastIdx] bucket
                        newTable[lastIdx] = lastRun;
                        // For the nodes before the sub chain, JDK will traverse one by one and copy them to the new bucket
                        for (HashEntry<K,V> p = e; p != lastRun; p = p.next) {
                            int k = p.hash & sizeMask;
                            HashEntry<K,V> n = newTable[k];
                            newTable[k] = new HashEntry<K,V>(p.key, p.hash,
                                                             n, p.value);
                        }
                    }
                }
            }
            table = newTable;   // Expansion completed
        }

Since expansion doubles the capacity (a power of two), an element that was in a given bucket before expansion is now either in the bucket with the same index or in that index plus a power of two. As introduced earlier, the next pointer of HashEntry is final, so at first sight it seems every node of a bucket's chain has to be copied (i.e. recreated) into the new table. In fact the JDK applies an optimization: the chain in an old bucket may end with a run of consecutive nodes that all rehash to the same new bucket. If we find the head of that trailing run, the whole run can be placed into the new bucket directly, avoiding the creation of those nodes. The tail of this run is necessarily the tail of the original chain, so only its head needs to be put into the new bucket and the rest follows along. The nodes before the head of the run are then traversed one by one and copied into their new buckets (elements can only be inserted at the head). In particular, note this code:

    for (HashEntry<K,V> last = next; last != null; last = last.next) {
        int k = last.hash & sizeMask;
        if (k != lastIdx) {
            lastIdx = k;
            lastRun = last;
        }
    }
    newTable[lastIdx] = lastRun;
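
To see what this loop computes, take a hypothetical chain in one old bucket whose five nodes rehash to the new indices 2, 6, 2, 6, 6 (in chain order):

    // Hypothetical chain in one old bucket:   A -> B -> C -> D -> E
    // New index (last.hash & sizeMask):       2    6    2    6    6
    //
    // The loop updates lastRun/lastIdx every time the index changes, so it ends with
    // lastRun = D and lastIdx = 6: D -> E is the trailing run whose nodes all rehash to
    // the same new bucket. newTable[6] = D reuses that tail as-is; A, B and C are then
    // cloned one by one into their new buckets by the second loop.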

In this code segment, the JDK places the trailing run lastRun directly into the newTable[lastIdx] bucket. Won't this overwrite elements already in newTable[lastIdx]? In fact that cannot happen, because before the run is added there can be no node at the head of bucket newTable[lastIdx]. This again follows from the table size being a power of two. Suppose the old table has size 2^k; the new table then has size 2^(k+1), and the bucket is located by:

    // sizeMask = newTable.length - 1, i.e. sizeMask is k+1 ones in binary.
    int idx = e.hash & sizeMask;

So the idx obtained here is the value of the low k+1 bits of the key's hash, whereas the sizeMask of the old table was only k ones, so the old idx was the value of the low k bits. Therefore, if bit k+1 of an element's hash is 0, its index in the new table equals its old index; if that bit is 1, its new index is the old index plus 2^k. Elements from different old buckets can consequently never meet in the same new bucket during this rehash, which is why the JDK can place the run lastRun directly into newTable[lastIdx]: that bucket is guaranteed to still be empty.
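
A concrete, hypothetical instance of this index arithmetic, with an old Segment table of 2^2 = 4 buckets doubled to 8:

    // Hypothetical worked example: old capacity 4 (k = 2), new capacity 8.
    int oldCapacity = 4, newCapacity = 8;
    int h = 22;                               // binary 10110
    int oldIdx = h & (oldCapacity - 1);       // 10110 & 011 = 2
    int newIdx = h & (newCapacity - 1);       // 10110 & 111 = 6 = 2 + 4, because bit k+1 of h is 1
    // Had bit k+1 been 0, newIdx would equal oldIdx (2); either way, entries coming from
    // different old buckets can never land in the same new bucket.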

2.5.3 read implementation get(Object key)

Similar to the put operation, when we query the value for a given key in a ConcurrentHashMap, we first locate the Segment in which the key should reside and then delegate the query to that Segment. The source is as follows:

    public V get(Object key) {
        int hash = hash(key.hashCode());
        return segmentFor(hash).get(key, hash);
    }

We will then study the source code of get operation in Segment:

    V get(Object key, int hash) {
            if (count != 0) {            // Read volatile, first read the count variable
                HashEntry<K,V> e = getFirst(hash);   // Get the bucket chain header node
                while (e != null) {
                    if (e.hash == hash && key.equals(e.key)) {    // Find out whether the Key value pair of the specified Key exists in the chain
                        V v = e.value;
                        if (v != null)  // If the value field is not null, it is returned directly
                            return v;   
                        // If you read that the value field is null, it indicates that reordering has occurred. Lock and read again
                        return readValueUnderLock(e); // recheck
                    }
                    e = e.next;
                }
            }
            return null;  // If it does not exist, null is returned directly
        }

After understanding ConcurrentHashMap's put operation, the source above is easy to follow. One case, however, deserves special attention: a node for the given key exists in the chain but its value field reads as null. When analysing put we saw that, unlike HashMap, ConcurrentHashMap allows neither keys nor values to be null. So how can a node exist whose value reads as null? The explanation given in the JDK is that this can happen because of instruction reordering during the initialization of a HashEntry: the reference to the node becomes visible before its initialization has completed.

At this time, the solution given by JDK is to lock and reread. The source code is as follows:

        V readValueUnderLock(HashEntry<K,V> e) {
            lock();
            try {
                return e.value;
            } finally {
                unlock();
            }
        }

2.5.4 ConcurrentHashMap access summary

To access a ConcurrentHashMap we first locate the specific Segment and then access the map through that Segment. Both reads and writes perform well: reads generally need no locking, while writes lock only the Segment being operated on (lock striping) and do not affect access to the other Segments.

2.6 the secret of lock-free read operations

The HashEntry object is almost immutable (only the value field can change), because the key, hash and next fields of HashEntry are all final. This means nodes cannot be added to, or removed from, the middle or the tail of a linked list; it guarantees that once a node is reached, the links behind it will not change, which greatly reduces the complexity of handling the lists.

At the same time, since the value field of HashEntry is declared volatile, the Java memory model guarantees that a write to value by a writer thread is immediately visible to subsequent reader threads.
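
For reference, the shape of HashEntry as described above; this is a sketch consistent with this generation of the JDK rather than a verbatim copy of its source:

    static final class HashEntry<K,V> {
        final K key;                  // final: fixed at construction
        final int hash;               // final: fixed at construction
        volatile V value;             // the only mutable field; volatile for visibility
        final HashEntry<K,V> next;    // final: the links behind a node can never be re-pointed

        HashEntry(K key, int hash, HashEntry<K,V> next, V value) {
            this.key = key;
            this.hash = hash;
            this.next = next;
            this.value = value;
        }
    }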

In addition, since null is not allowed as a key or value in ConcurrentHashMap, a reader thread that reads a HashEntry whose value is null knows that a conflict has occurred (instruction reordering during construction) and re-reads the value after taking the lock. These features work together so that reader threads can access the ConcurrentHashMap correctly even without holding a lock.

In general, the secret that ConcurrentHashMap's read operations need no locking lies in the following three points:

  • The immutability of HashEntry objects reduces the need for locking during reads (key, hash and next are final, guaranteeing that existing nodes do not change under concurrent modification);
  • volatile variables coordinate memory visibility between reader and writer threads (value and count);
  • If instruction reordering is detected during a read, the value is re-read under the lock (readValueUnderLock).

Expanding on the first two points:

1. Using the immutability of HashEntry objects to reduce the need for locking during reads

The non structural modification operation only changes the value of the value field of a HashEntry. Since the write operation to the Volatile variable will be synchronized with the subsequent read operation to this variable, when a write thread modifies the value field of a HashEntry, the Java memory model can ensure that the read thread can read the updated value of this field. Therefore, the non structural modification of the linked list by the write thread can be seen by the subsequent unlocked read thread.

A structural modification to a ConcurrentHashMap is essentially a structural modification to the linked list referenced by some bucket. If we can ensure that structural modifications made by a writer thread do not disturb a reader thread that is traversing the same linked list at the same time, then reader and writer threads can access the ConcurrentHashMap concurrently and safely. In ConcurrentHashMap, the structural modification operations are put, remove and clear; we analyze each of them below:

  • The clear operation merely empties all the buckets of the ConcurrentHashMap. The linked list that each bucket referenced still exists; the bucket simply no longer references it, and the structure of the list itself is not modified, so a reader thread that is already traversing such a list can still complete its traversal normally (a sketch is shown after this list).
  • The details of put were covered above. If put needs to insert a new node into a list, the new node is inserted at the head, and the links between the existing nodes are not modified. In other words, inserting a new key/value pair does not affect a reader thread's traversal of the list.
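
As promised in the first bullet, a sketch of the clear operation for a single Segment, consistent with the behaviour described there (the buckets are nulled out under the segment lock, but the old chains themselves are never modified):

    void clear() {
        if (count != 0) {
            lock();
            try {
                HashEntry<K,V>[] tab = table;
                for (int i = 0; i < tab.length; i++)
                    tab[i] = null;     // the bucket drops its reference; the chain itself is untouched
                ++modCount;
                count = 0;             // write the volatile count last
            } finally {
                unlock();
            }
        }
    }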

Finally, let's analyze the remove operation, starting with its source code:

    public V remove(Object key) {
        int hash = hash(key.hashCode());
        return segmentFor(hash).remove(key, hash, null);
    }

Similarly, when deleting a key/value pair from a ConcurrentHashMap, we first locate the specific Segment and delegate the deletion to it. Segment's remove operation is as follows:

        V remove(Object key, int hash, Object value) {
            lock();     // Lock
            try {
                int c = count - 1;      
                HashEntry<K,V>[] tab = table;
                int index = hash & (tab.length - 1);        // Locate the bucket
                HashEntry<K,V> first = tab[index];
                HashEntry<K,V> e = first;
                while (e != null && (e.hash != hash || !key.equals(e.key)))  // Find key value pairs to be deleted
                    e = e.next;
                V oldValue = null;
                if (e != null) {    // find
                    V v = e.value;
                    if (value == null || value.equals(v)) {
                        oldValue = v;
                        // All entries following removed node can stay
                        // in list, but all preceding ones need to be
                        // cloned.
                        ++modCount;
                        // All nodes after the nodes to be deleted remain in the linked list as is
                        HashEntry<K,V> newFirst = e.next;
                        // All nodes before the nodes to be deleted are cloned into the new linked list
                        for (HashEntry<K,V> p = first; p != e; p = p.next)
                            newFirst = new HashEntry<K,V>(p.key, p.hash,newFirst, p.value); 
                        tab[index] = newFirst;   // The chain after deleting the specified node and reorganizing is put back into the bucket
                        count = c;      // Write Volatile, update Volatile variable count
                    }
                }
                return oldValue;
            } finally {
                unlock();          // finally clause unlock
            }
        }

Segment's remove is similar to the get operation above: first find the bucket's linked list from the hash code, then traverse it to find the node to delete; finally, all nodes after the deleted node are kept as-is in the new list, and every node before it is cloned into the new list. Suppose a writer thread removes node C from a list while a reader thread is traversing that same list. All nodes after C remain in the new list unchanged, and each node before C is cloned into the new list (their order in the new list is reversed). The original list is therefore never modified by remove, so the reader thread is not disturbed by the concurrent writer performing the remove.
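
A short trace of that behaviour on a hypothetical chain makes the point concrete:

    // Hypothetical chain in one bucket: A -> B -> C -> D -> E, and remove() deletes C.
    // newFirst starts as C.next, i.e. D -> E: everything after C is kept as-is.
    // The nodes before C are then cloned head-first, which reverses their order:
    //   after cloning A:  A' -> D -> E
    //   after cloning B:  B' -> A' -> D -> E
    // tab[index] = B' -> A' -> D -> E, while the original chain A -> B -> C -> D -> E
    // is never modified, so a reader already walking it is unaffected.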

Based on the above analysis, we know that whether a writer thread makes structural or non-structural modifications to a linked list, it does not affect concurrent reader threads' access to that list.

2. Coordinating memory visibility between read and write threads with Volatile variables

Generally, because of memory visibility issues, a reader thread that is not correctly synchronized may not see the value written by a writer thread in time. Below, writer thread M and reader thread N are used to illustrate how ConcurrentHashMap coordinates memory visibility between readers and writers, as shown in the figure below:

[Figure: writer thread M modifies the list (A) and then writes the volatile count (B); reader thread N reads count (C) and then traverses the list (D)]

Suppose thread M writes the volatile variable count (action B) after making structural changes to the list (action A), and thread N later reads the volatile variable count (action C) before traversing the list (action D). By the program-order rule of the happens-before relation, A happens-before B and C happens-before D; by the volatile rule, B happens-before C; by transitivity, A happens-before D. That is, the structural changes made by writer thread M to the linked list are visible to reader thread N. Although thread N accesses the list without locking, the Java memory model guarantees that, as long as writer thread M writes the volatile variable count before exiting the write method, reader thread N reads the latest value of count and hence sees M's preceding modifications.
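
Mapping those four actions onto the put and get sources shown earlier (the A to D labels are added here for illustration):

    // Writer thread M, inside Segment.put (excerpt from the source above):
    tab[index] = new HashEntry<K,V>(key, hash, first, value);   // A: structural change to the chain
    count = c;                                                   // B: write the volatile count

    // Reader thread N, inside Segment.get (excerpt from the source above):
    if (count != 0) {                                            // C: read the volatile count
        HashEntry<K,V> e = getFirst(hash);                       // D: traverse the chain
        // ... the lookup continues as in the get() source below
    }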

In fact, ConcurrentHashMap is an array of Segments, and each Segment has a volatile count field recording the number of HashEntry nodes in that Segment. Moreover, all lock-free read methods in ConcurrentHashMap read the count variable first upon entering the method, for example the get method shown in the previous section:

    V get(Object key, int hash) {
            if (count != 0) {            // Read volatile, first read the count variable
                HashEntry<K,V> e = getFirst(hash);   // Get the bucket chain header node
                while (e != null) {
                    if (e.hash == hash && key.equals(e.key)) {    // Find out whether the Key value pair of the specified Key exists in the chain
                        V v = e.value;
                        if (v != null)  // If the value field is not null, it is returned directly
                            return v;   
                        // If you read that the value field is null, it indicates that reordering has occurred. Lock and read again
                        return readValueUnderLock(e); // recheck
                    }
                    e = e.next;
                }
            }
            return null;  // If it does not exist, null is returned directly
        }

3. Summary

In ConcurrentHashMap, all write methods (put, remove and clear) write the count variable after making their structural modifications to the linked list and before exiting the write method; all lock-free read operations (get, contains and containsKey) read the count variable first upon entering the read method. According to the Java memory model, writes and reads of the same volatile variable guarantee that the value written by the writer thread can be "seen" by a subsequent lock-free reader thread. Combined with the immutability of HashEntry described earlier, this lets reader threads in ConcurrentHashMap obtain the values they need without locking. Together with the lock-and-reread mechanism, these features not only reduce how often the same lock is requested (a read normally succeeds without locking) but also reduce how long the same lock is held (only when a read sees a null value field does the reader need to lock and re-read).

2.7 cross-segment operations of ConcurrentHashMap

In ConcurrentHashMap, some operations need to span multiple Segments, for example size and containsValue. Take size: to count the number of elements in the whole ConcurrentHashMap, we must count the elements in every Segment and sum them. We know each Segment's count field is volatile; in a multithreaded scenario, can we simply add up the counts of all Segments to get the total size? Clearly not: although each read during the summation sees the latest value of that Segment's count, a count read earlier may already have changed by the time the summation finishes, so the result can be inaccurate.

The safest approach would be to lock all the Segments' put, remove and clear methods while computing size, but that is obviously very inefficient. Let's look at how the JDK implements the size() method:

    public int size() {
        final Segment<K,V>[] segments = this.segments;
        long sum = 0;
        long check = 0;
        int[] mc = new int[segments.length];
        // Try a few times to get accurate count. On failure due to
        // continuous async changes in table, resort to locking.
        for (int k = 0; k < RETRIES_BEFORE_LOCK; ++k) {
            check = 0;
            sum = 0;
            int mcsum = 0;
            for (int i = 0; i < segments.length; ++i) {
                sum += segments[i].count;   
                mcsum += mc[i] = segments[i].modCount;  // Record modCount when counting size
            }
            if (mcsum != 0) {
                for (int i = 0; i < segments.length; ++i) {
                    check += segments[i].count;
                    if (mc[i] != segments[i].modCount) {  // After size statistics, compare whether the modCount of each segment has changed
                        check = -1; // force retry
                        break;
                    }
                }
            }
            if (check == sum)// If the modCount of each segment before and after the statistics of size does not change, and the total number obtained twice is the same, it is returned directly
                break;
        }
        if (check != sum) { // Resort to locking all segments and recount under the locks
            sum = 0;
            for (int i = 0; i < segments.length; ++i)
                segments[i].lock();
            for (int i = 0; i < segments.length; ++i)
                sum += segments[i].count;
            for (int i = 0; i < segments.length; ++i)
                segments[i].unlock();
        }
        if (sum > Integer.MAX_VALUE)
            return Integer.MAX_VALUE;
        else
            return (int)sum;
    }