Overview of Java Collections

Posted by Graxeon on Fri, 10 Jan 2020 02:40:18 +0100

An Overview of Java Collections (Part 1)

Preface

First of all, why write such a blog post (I always like to explain why)? With the end of the year approaching, I have been preparing for interviews, so I have been writing technical summaries of various topics. Java collections are a very important part of Java; I have spent a lot of time learning them, but my notes have remained scattered. So I plan to take this opportunity to write a summary.

My ability and accumulated experience in this area are limited, so if you find any problems, please point them out. Thank you.

Java collections are mainly divided into:

  • Collection (extends the Iterable interface): stores single elements
    • List: the main representation of a linear data structure. Ordered, duplicates allowed
    • Set: a collection in which duplicate elements are not allowed. Generally unordered (iteration order need not match insertion order), no duplicates
    • Queue: a FIFO data structure. Ordered, duplicates allowed, first in first out
  • Map (extends no other interface): stores data as K-V pairs
    • keySet: a view of all keys. The underlying implementations vary: ConcurrentHashMap returns its own KeySetView static inner class, while an AbstractMap subclass such as HashMap returns an inner view class implementing the Set interface
    • values: likewise, ConcurrentHashMap returns a ValuesView, while HashMap returns an inner Collection view
    • entrySet: likewise, ConcurrentHashMap returns an EntrySetView, while HashMap returns an inner Set view
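To make the three views concrete, here is a minimal sketch (the class and method names are my own, for demonstration only). The key point is that the views are live: they reflect later changes to the map, and a removal through a view also removes from the backing map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class MapViewsDemo {
    // The key-view size observed after a later put:
    // views are live and reflect subsequent changes to the map.
    static int keyViewSizeAfterLaterPut() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        Set<String> keys = map.keySet(); // obtained before the second put
        map.put("b", 2);
        return keys.size();              // 2, not 1
    }

    // Removing through a view removes from the backing map as well.
    static boolean stillInMapAfterViewRemove() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.keySet().remove("a");
        return map.containsKey("a");     // false
    }

    public static void main(String[] args) {
        System.out.println(keyViewSizeAfterLaterPut());    // 2
        System.out.println(stillInMapAfterViewRemove());   // false
    }
}
```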

Maps could also be categorized as AbstractMap, SortedMap, and ConcurrentMap, but I found that classification to be of more theoretical value than practical value (or perhaps I am just not at that level yet). Instead, I will learn to look at a Map through its three views, as the author of Code Out Efficiency does. After that, I will only describe some of the Maps.

For each collection, I will mainly discuss three aspects: data organization (how data is stored underneath), data processing (e.g. HashMap's put operation), and a feature summary. However, due to the amount of content, the code walkthroughs here are not exhaustive.

Finally, given the amount of content, I will split this overview into two parts. This post focuses on List and Map; Set and Queue are covered in another post.

1. List

ArrayList

Data organization

    transient Object[] elementData; // non-private to simplify nested class access

At the bottom of ArrayList is an array of Object. ArrayList therefore shares the characteristics of arrays: random access is fast, but insertion and deletion are slow (because other elements usually need to be moved).

Data Processing Method

add
    public void add(int index, E element) {
        // Check that index is within [0, size]; otherwise throw IndexOutOfBoundsException
        rangeCheckForAdd(index);

        // A chain of checks follows this call; in short, it decides whether the array needs to grow
        ensureCapacityInternal(size + 1);  // Increments modCount!!
        // Use System.arraycopy to open a slot at position index of elementData for the new element
        System.arraycopy(elementData, index, elementData, index + 1,
                         size - index);
        // Assign element to the vacated index position
        elementData[index] = element;
        // Number of elements + 1
        size++;
    }
grow
    // Simply put: grow to 1.5 times the old capacity; if that is still below the given minCapacity, use minCapacity; cap it at MAX_ARRAY_SIZE; finally copy into the new, larger array
    private void grow(int minCapacity) {
        // overflow-conscious code
        int oldCapacity = elementData.length;
        int newCapacity = oldCapacity + (oldCapacity >> 1);
        if (newCapacity - minCapacity < 0)
            newCapacity = minCapacity;
        if (newCapacity - MAX_ARRAY_SIZE > 0)
            newCapacity = hugeCapacity(minCapacity);
        // minCapacity is usually close to size, so this is a win:
        elementData = Arrays.copyOf(elementData, newCapacity);
    }
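The growth arithmetic can be replayed outside the JDK. The sketch below (my own helper, not JDK code) repeats the `newCapacity = oldCapacity + (oldCapacity >> 1)` line, i.e. 1.5x growth:

```java
import java.util.ArrayList;
import java.util.List;

public class GrowSequence {
    // Mirrors the arithmetic in ArrayList.grow(): newCapacity = old + old/2.
    static List<Integer> capacities(int start, int steps) {
        List<Integer> out = new ArrayList<>();
        int cap = start;
        for (int i = 0; i < steps; i++) {
            out.add(cap);
            cap = cap + (cap >> 1); // 1.5x growth, as in grow()
        }
        return out;
    }

    public static void main(String[] args) {
        // Starting from the default capacity 10:
        System.out.println(capacities(10, 4)); // [10, 15, 22, 33]
    }
}
```

So from the default capacity, the array grows 10 → 15 → 22 → 33, not in powers of two.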

Summary

Depending on how the data is organized and processed, it is clear that:

  • ArrayList random access is fast (a specific element is located directly through its index)
  • ArrayList insertion and deletion are slow (they involve the array move System.arraycopy, and possibly a grow operation)
  • ArrayList capacity is variable (it grows itself; with the no-arg constructor, DEFAULT_CAPACITY = 10)
  • ArrayList is not thread-safe (no synchronization)

Supplement:

  • The default capacity of ArrayList is 10 (when constructed with no arguments)
  • For performance, to avoid repeated resizing, it is best to pass an appropriate initial capacity at construction (it will still grow automatically if that later turns out to be too small)
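To illustrate that supplement, a minimal sketch (the class and method names are mine): passing the expected size to the constructor allocates the backing array once, so no grow()/arraycopy steps happen while filling.

```java
import java.util.ArrayList;
import java.util.List;

public class PresizeDemo {
    // Builds a list of n elements; the initial capacity avoids the
    // intermediate grow()/arraycopy steps the default path would take.
    static List<Integer> build(int n) {
        List<Integer> list = new ArrayList<>(n); // pre-sized: no resizes while filling
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(build(3)); // [0, 1, 2]
    }
}
```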

LinkedList

Data organization

    private static class Node<E> {
        E item;
        Node<E> next;
        Node<E> prev;

        Node(Node<E> prev, E element, Node<E> next) {
            this.item = element;
            this.next = next;
            this.prev = prev;
        }
    }

The bottom level of LinkedList is a doubly linked list of custom Node objects. LinkedList therefore shares the characteristics of linked lists: insertion and deletion are fast, but random access is slow.

Data Processing Method

add
    public void add(int index, E element) {
        // Data check: index must be within [0, size]
        checkPositionIndex(index);

        if (index == size)
            // Inserting at the end: link directly after last (LinkedList keeps references to both first and last, so it can operate directly)
            linkLast(element);
        else
            // First, node(index) finds the Node at the given index (internally still a traversal,
            // but it starts from first or last depending on whether index is below size/2),
            // then linkBefore inserts the new element before that node
            linkBefore(element, node(index));
    }
peek
    
    // LinkedList implements the Deque interface, so it implements the peek method: return the first element of the list without removing it
    public E peek() {
        final Node<E> f = first;
        return (f == null) ? null : f.item;
    }
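A small usage sketch of LinkedList behind the Deque interface (the class and method names are mine): peek returns the head without removing it, while poll removes it.

```java
import java.util.Deque;
import java.util.LinkedList;

public class DequeDemo {
    // peek() returns the head without removing it; poll() removes it.
    static String peekThenPoll() {
        Deque<String> deque = new LinkedList<>();
        deque.offer("first");
        deque.offer("second");
        String peeked = deque.peek();  // "first", still in the deque
        String polled = deque.poll();  // "first", now removed
        return peeked + "/" + polled + "/" + deque.peek(); // "first/first/second"
    }

    public static void main(String[] args) {
        System.out.println(peekThenPoll()); // first/first/second
    }
}
```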

Summary

Depending on how the data is organized and processed, it is clear that:

  • LinkedList random access is slow (a traversal is required; although comparing index with size/2 halves the traversal range, the data organization determines the speed):

    Tests show a difference of hundreds of times between LinkedList's and ArrayList's random-access speed over 100,000 elements (quoted from Code Out Efficiency)

  • LinkedList insertion and deletion are fast (a traversal is still required to locate the target, but only the prev/next references of the neighboring nodes are modified)
  • LinkedList capacity is variable (nodes can be linked freely)
  • LinkedList is not thread-safe (no synchronization)

Supplement:

  • Through a linked list, scattered memory units are effectively connected by references into a linear structure that is traversed in link order, with high memory utilization (from Code Out Efficiency)

Vector

Vector is essentially no different from ArrayList: the bottom level is also an Object array, and the default capacity is still 10 (although in Vector it appears as a magic number rather than a named constant, which is not recommended).

The main difference is that Vector adds the synchronized keyword to its key methods to ensure thread safety.

However, this coarse-grained locking performs poorly, so Vector has been essentially abandoned.

It will not be covered further here.

CopyOnWriteArrayList

CopyOnWriteArrayList, a member of the COW (copy-on-write) container family, embodies the idea of trading space for time and mainly targets read-mostly scenarios. When an element is written, a new array is created, the elements of the original array are copied into it, and the write is performed on the copy (while reads continue against the original array). After the write completes, the array reference that reads target is switched from the original array to the new one. This allows writes to proceed without affecting reads.
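The copy-and-swap behavior can be observed through iterators: an iterator obtained before a write keeps reading the old array, because the write replaces the array reference instead of mutating in place. A minimal sketch (the class and method names are mine):

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class CowSnapshotDemo {
    // An iterator obtained before a write keeps seeing the old array.
    static int elementsSeenByOldIterator() {
        List<String> list = new CopyOnWriteArrayList<>(new String[]{"a", "b"});
        Iterator<String> it = list.iterator(); // snapshot of the current array
        list.add("c");                         // writer copies and swaps the array
        int seen = 0;
        while (it.hasNext()) {
            it.next();
            seen++;
        }
        return seen; // 2: the snapshot predates the add
    }

    public static void main(String[] args) {
        System.out.println(elementsSeenByOldIterator()); // 2
    }
}
```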

Data organization

    /** The array, accessed only via getArray/setArray. */
    // transient avoids default serialization of the array; volatile guarantees visibility of the reference, making the single shared reference thread-safe
    private transient volatile Object[] array;

Data Processing Method

add
    public void add(int index, E element) {
        final ReentrantLock lock = this.lock;
        // Take the lock: only one write operation may run at a time
        // Note that lock() is placed outside the try block: on the one hand this follows the try convention (lock() does not throw here, and keeping it outside shrinks the try block); on the other hand it avoids the case where lock acquisition fails and unlock() in finally throws IllegalMonitorStateException
        lock.lock();
        try {
            // Gets the original array and assigns it to elements (reference variables)
            Object[] elements = getArray();
            int len = elements.length;
            // data verification
            if (index > len || index < 0)
                throw new IndexOutOfBoundsException("Index: "+index+
                                                    ", Size: "+len);
            // Copy the original array into newElements, leaving a gap at the index position
            Object[] newElements;
            int numMoved = len - index;
            if (numMoved == 0)
                newElements = Arrays.copyOf(elements, len + 1);
            else {
                newElements = new Object[len + 1];
                System.arraycopy(elements, 0, newElements, 0, index);
                System.arraycopy(elements, index, newElements, index + 1,
                                 numMoved);
            }
            // Set the value of the new array index position to element to complete the assignment
            newElements[index] = element;
            // Switch the array reference (the one readers are using) to newElements
            setArray(newElements);
        } finally {
            // The lock needs to be released regardless of the exception.
            lock.unlock();
        }
    }

This is its biggest distinguishing feature. The remove operation is similar, so I will not go into more detail.

Summary

Since the data organization of CopyOnWriteArrayList is the same as ArrayList's (also an array), the following holds:

  • CopyOnWriteArrayList random access is fast
  • CopyOnWriteArrayList insertion and deletion are slow (every write copies the whole array)
  • CopyOnWriteArrayList capacity is variable (each addition or deletion creates and swaps in a new array)

Supplement:

  • CopyOnWriteArrayList is thread-safe (read-write separation; writes are serialized through a ReentrantLock)
  • Write operations do not directly affect read operations (they target different arrays in memory)
  • CopyOnWriteArrayList is only suitable for read-mostly scenarios (after all, every write copies the array)
  • CopyOnWriteArrayList momentarily occupies roughly twice the memory (the array must be copied when writing)
  • CopyOnWriteArrayList performance degrades quickly as write frequency and array size grow (m writes over an array of size n cost on the order of m x n element copies)

Recommendation: under high concurrency, you can accumulate the writes you want to perform (adds and removes buffered separately) and then apply them with addAll or removeAll. This effectively reduces resource consumption, but the batching granularity needs to be tuned carefully, just like request merging.
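A sketch of that recommendation (the class name and sizes are mine): buffer pending writes in a plain list, then apply them to the COW list with a single addAll, which copies the backing array once instead of once per element.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class BatchWriteDemo {
    // One addAll copies the backing array once, instead of once per element.
    static int batchedSize(int n) {
        List<Integer> pending = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            pending.add(i); // cheap buffering in a plain list
        }
        List<Integer> cow = new CopyOnWriteArrayList<>();
        cow.addAll(pending); // a single copy-and-swap for the whole batch
        return cow.size();
    }

    public static void main(String[] args) {
        System.out.println(batchedSize(1000)); // 1000
    }
}
```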

2. Map

HashMap

HashMap is, on the one hand, a collection used heavily in day-to-day work, and on the other hand a high-frequency interview topic (I am asked about it in almost every interview).

HashMap, like ConcurrentHashMap, differs between Jdk8 and earlier versions. However, I will focus on Jdk8 and later, since SpringBoot 2.x now requires Jdk8.

Data organization

Before Jdk8
    // Prior to Jdk8, the bottom level was array + linked list
    // Entry, the node type of the linked list, is Map's inner interface Map.Entry
    transient Entry<K, V>[] table;
After Jdk8
    transient Node<K, V>[] table;


    static class Node<K, V> implements Map.Entry<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;
    }

Data Processing Method

put method before Jdk8 (fewer comments here, because I do not have the Jdk7 source at hand; this is reproduced by hand from a note screenshot)
    public V put(K key, V value) {
        // HashMap creates its table lazily. If the table is still empty, create the array (default capacity 16) and assign it to table
        if (table == EMPTY_TABLE) {
            inflateTable(threshold);
        }
        // Null keys get special handling
        if (key == null)
            return putForNullKey(value);
        // Calculate the hash from the key
        int hash = hash(key);
        // indexFor (bitwise internally) computes the K-V pair's subscript i in the array from the key's hash and the array length
        int i = indexFor(hash, table.length);
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        // Record the modification, similar to a version number
        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }
put method after Jdk8
    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }


    // Calculate the hash of the key (null check, then mix the key's hashCode)
    static final int hash(Object key) {
        int h;
        // XOR-ing hashCode with its own high 16 bits is cheap and lets the high bits participate in the index calculation, reducing collisions
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }


    // Perform the main put operation
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        // As the block below shows, the Jdk8 HashMap code is terse: assignments are embedded in conditions
        if ((tab = table) == null || (n = tab.length) == 0)
            // If table is null, or table.length is 0 (note the embedded assignments), initialize it through resize() (a method that, like Spring's refresh(), is reused for several purposes) and assign the new length to n (note that tab and n are local variables, not the fields)
            n = (tab = resize()).length;
        // Calculates the key's subscript based on its hash value and determines if the corresponding subscript position in the array is null
        if ((p = tab[i = (n - 1) & hash]) == null)
            // If the corresponding position is null, set the array corresponding i position to the corresponding new Node directly through the newNode method (generating Node)
            tab[i] = newNode(hash, key, value, null);
        else {
            // If the corresponding location is not null, a chain table operation is needed to determine whether the tree is dendrified (red-black tree), whether the capacity is expanded, and so on.
            Node<K,V> e; K k;
            // Use hash and equals to determine whether the newly added key really equals the existing key
            // Two notes here. First, for two keys to be considered equal, both the hash comparison and equals must pass: the hash check quickly rules out most non-equal keys, while equals resolves hash collisions (different objects can share a hash value)
            // Second, I wondered whether two equal values could have different hashCodes and fail here; then I found that the wrapper types have long overridden hashCode (Integer's hashCode, for example, returns the value directly), keeping hashCode consistent with equals
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                // If equal, update the corresponding Node directly
                e = p;
            // If the above judgment fails, it determines whether the original array element has been dendrified (TreeNode is no longer a Node type, but TreeNode, of course TreeNode still consists of Node)
            else if (p instanceof TreeNode)
                // If the original array element is already treed, call the putTreeVal method to place the current element in the target red-black tree (which involves operations such as rotation of the red-black tree).
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            // If it is not an empty slot, the same key, or a red-black tree, then this bucket holds a linked list (or is about to become one: one element already exists and a new one is being added)
            else {
                // Traverse through the corresponding list elements and record the number of elements that already exist in the list by binCount
                for (int binCount = 0; ; ++binCount) {
                    // If e = p.next is null, the end of the list has been reached (p is the last element of the current list)
                    if ((e = p.next) == null) {
                        // Get the Node of the corresponding p through newNode and set it as the last element of the chain table
                        p.next = newNode(hash, key, value, null);
                        // Using binCount to determine if the length of the chain table has reached the tree threshold
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            // At the threshold, treeifyBin converts this slot of the table (located by the hash) to a red-black tree
                            treeifyBin(tab, hash);
                        break;
                    }
                    // If a node with the same key is found during traversal, break out (its value is updated in the block below)
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    // The assignment operation, which is part of the chain table operation, continues the chain table traversal
                    p = e;
                }
            }
            // The code below also serves HashMap's putIfAbsent (which likewise calls putVal, with a different fourth argument onlyIfAbsent)
            // Simply put: when the key already exists, put assigns the new value directly, whereas putIfAbsent only assigns when the existing value is null (similar to Redis's SETNX)
            // The fourth parameter thus lets one piece of code (putVal) perform two functions (put and putIfAbsent)
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        // version number
        ++modCount;
        // Increment size and check whether it now exceeds the threshold (array capacity * load factor, default 16 * 0.75 = 12)
        if (++size > threshold)
            // Resize (double the capacity, then redistribute the entries)
            resize();
        // Empty method, reserved for subclasses, such as LinkedHashMap
        afterNodeInsertion(evict);
        return null;
    }

This method can be regarded as the core of HashMap; it essentially is HashMap's operating mechanism.

Process description:

  1. If the underlying array of HashMap has not been initialized, build it with the resize() method
  2. Compute the hash of the key, then the subscript
  3. If the array slot at that subscript is null (no hash collision there), place the node directly in that slot
  4. If the slot holds a TreeNode (that position has already been treeified), place the Node into the tree through the putTreeVal method
  5. Otherwise, traverse the linked list at that subscript and append the Node
  6. Treeify the list if its length exceeds the threshold
  7. If the key already has an old value, replace the node's value directly
  8. If the number of elements exceeds the threshold (array capacity * load factor), resize (double the capacity and rehash)
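Steps 2 and 3 can be replayed outside the JDK. The sketch below (my own helpers, mirroring but not calling HashMap's internals) shows why the high-16-bit XOR matters: without it, hashCodes that differ only in their high bits collapse into the same bucket of a small table.

```java
public class HashSpreadDemo {
    // Same arithmetic as HashMap.hash(): fold the high 16 bits into the low 16.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Index computation used by putVal: (n - 1) & hash, with n a power of two.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        // Two hashCodes differing only in the high bits land in the
        // same bucket of a 16-slot table without spreading...
        int a = 0x10000, b = 0x20000;
        System.out.println(indexFor(a, 16) == indexFor(b, 16)); // true
        // ...but spreading lets the high bits influence the index.
        System.out.println(indexFor(spread(a), 16) == indexFor(spread(b), 16)); // false
    }
}
```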
get method after Jdk8
    public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }


    // Not much to say here: depending on the bucket's state, fetch the target from the array slot, the red-black tree, or the linked list
    final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {
            if (first.hash == hash && // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            if ((e = first.next) != null) {
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
    }
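To make the onlyIfAbsent discussion above concrete, here is a minimal sketch (the class and method names are mine) comparing put and putIfAbsent on an existing key:

```java
import java.util.HashMap;
import java.util.Map;

public class PutVsPutIfAbsentDemo {
    // put overwrites an existing mapping; putIfAbsent only writes when the
    // key is absent (or mapped to null), mirroring the onlyIfAbsent flag.
    static String compare() {
        Map<String, String> m = new HashMap<>();
        m.put("k", "v1");
        m.put("k", "v2");          // overwrite
        String afterPut = m.get("k");
        m.putIfAbsent("k", "v3");  // no-op: key present with non-null value
        String afterPIA = m.get("k");
        return afterPut + "/" + afterPIA; // "v2/v2"
    }

    public static void main(String[] args) {
        System.out.println(compare()); // v2/v2
    }
}
```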

Summary

Regarding usage scenarios, Code Out Efficiency says:

ConcurrentHashMap is preferred except in method-local or absolutely thread-safe scenarios. Although their performance is similar, the latter solves the thread-safety problem under high concurrency. HashMap's dead-link (infinite loop) problem and its data loss during resizing are the two main reasons to use HashMap cautiously.

Here, from a Java engineer's perspective, I cannot help recommending Code Out Efficiency and the accompanying Alibaba Java Development Manual. As a developer who has read many technical books, I consider both excellent.

That said, the book also mentions that these problems have been fixed and improved after Jdk8. For details, read the book (it has more on the topic).

ConcurrentHashMap

In the ConcurrentHashMap section, I will only describe versions after Jdk8.

Prior to Jdk8, the bottom level was an array of Segments, each similar to a Hashtable, and thread safety was achieved through segmented locking; it was a compromise between Hashtable and HashMap. The complexity was moderate, but after Jdk8 it becomes more involved. First, red-black trees are introduced to optimize the storage structure. Second, the original segmented-lock design is dropped in favor of a more efficient thread-safety scheme (lock-free CAS operations plus synchronized on head nodes, etc.). Finally, a more optimized way of counting the collection's elements is adopted (quoted from Code Out Efficiency; I had not noticed the last point myself).

Data organization

    transient volatile Node<K,V>[] table;


    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        volatile V val;
        volatile Node<K,V> next;

        Node(int hash, K key, V val, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.val = val;
            this.next = next;
        }

        // Omit its internal method here, and if you are interested, you can view it yourself
    }

From the above, the underlying data organization of ConcurrentHashMap is array + linked list. From the Jdk8 HashMap it can be inferred that, under the corresponding conditions, the linked list is converted to a red-black tree. That is indeed the case; see the code.

    static final class TreeNode<K,V> extends Node<K,V> {
        TreeNode<K,V> parent;  // red-black tree links
        TreeNode<K,V> left;
        TreeNode<K,V> right;
        TreeNode<K,V> prev;    // needed to unlink next upon deletion
        boolean red;

        TreeNode(int hash, K key, V val, Node<K,V> next,
                 TreeNode<K,V> parent) {
            super(hash, key, val, next);
            this.parent = parent;
        }

        // Omit its internal method here, and if you are interested, you can view it yourself
    }

ConcurrentHashMap, like HashMap, has an internal TreeNode dedicated to red-black trees.

So, from the data-organization point of view, ConcurrentHashMap and the same version of HashMap could be said to come from one mold (after all, Doug Lea had a hand in both).

The difference, or the beauty of ConcurrentHashMap, lies in its consideration and handling of multithreading.

There are many details; I will only explain my understanding of the main points (many of the finer details I owe to summaries by others).

Data Processing Method

put
    public V put(K key, V value) {
        return putVal(key, value, false);
    }

    /** Implementation for put and putIfAbsent */
    final V putVal(K key, V value, boolean onlyIfAbsent) {
        // Data check: if key or value is null, throw NullPointerException directly
        if (key == null || value == null) throw new NullPointerException();
        // Compute the hash with the spread method (essentially the same XOR of hashCode with its high 16 bits as in HashMap)
        int hash = spread(key.hashCode());
        // Records the linked-list length
        int binCount = 0;
        // The endless loop here supports the CAS retries below (CAS spin)
        for (Node<K,V>[] tab = table;;) {
            Node<K,V> f; int n, i, fh;
            if (tab == null || (n = tab.length) == 0)
                // Like HashMap, if the array is empty or has a length of 0, an array initialization operation is performed (assignment has been completed in the loop header)
                tab = initTable();
            else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
                // If the corresponding position of the array is null, the value is inserted through the CAS operation
                if (casTabAt(tab, i, null,
                             new Node<K,V>(hash, key, value, null)))
                    break;                   // no lock when adding to empty bin
            }
            // If the head node's hash value is MOVED = -1, a resize is in progress
            else if ((fh = f.hash) == MOVED)
                // Help with the transfer (I have not studied the helping mechanism in detail)
                tab = helpTransfer(tab, f);
            else {
                V oldVal = null;
                synchronized (f) {
                    if (tabAt(tab, i) == f) {
                        // If the head node's hash value is greater than or equal to zero (for special nodes such as tree roots, the hash is negative, see the constants below)
                        //  static final int MOVED     = -1; // hash for forwarding nodes
                        //  static final int TREEBIN   = -2; // hash for roots of trees
                        //  static final int RESERVED  = -3; // hash for transient reservations
                        if (fh >= 0) {
                            // In that case the slot stores a linked list; insert by traversing it (compare HashMap's put operation)
                            binCount = 1;
                            for (Node<K,V> e = f;; ++binCount) {
                                K ek;
                                if (e.hash == hash &&
                                    ((ek = e.key) == key ||
                                     (ek != null && key.equals(ek)))) {
                                    oldVal = e.val;
                                    if (!onlyIfAbsent)
                                        e.val = value;
                                    break;
                                }
                                Node<K,V> pred = e;
                                if ((e = e.next) == null) {
                                    pred.next = new Node<K,V>(hash, key,
                                                              value, null);
                                    break;
                                }
                            }
                        }
                        // If the head of the slot is a tree node (a TreeBin instance)
                        else if (f instanceof TreeBin) {
                            Node<K,V> p;
                            binCount = 2;
                            // Call putTreeVal to insert the value into the red-black tree
                            if ((p = ((TreeBin<K,V>)f).putTreeVal(hash, key,
                                                           value)) != null) {
                                oldVal = p.val;
                                // Check the onlyIfAbsent parameter before setting the value; see HashMap's put method for the interpretation
                                if (!onlyIfAbsent)
                                    p.val = value;
                            }
                        }
                    }
                }
                // After the branches above, binCount holds the number of nodes stored in this slot of the array
                if (binCount != 0) {
                    // If the number of corresponding nodes exceeds the tree threshold TREEIFY_THRESHOLD=8
                    if (binCount >= TREEIFY_THRESHOLD)
                        // Treeify this slot of the array
                        treeifyBin(tab, i);
                    if (oldVal != null)
                        return oldVal;
                    break;
                }
            }
        }
        // count the new element (and possibly trigger a resize check)
        addCount(1L, binCount);
        return null;
    }
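As a usage-level illustration of that thread safety (of the map's behavior, not of putVal's internals), here is a sketch with my own names: several threads increment the same key through merge, which is atomic per key, so no update is lost, unlike a plain HashMap under contention.

```java
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentCountDemo {
    // Many threads increment the same key; merge() is atomic per key,
    // so no updates are lost.
    static long count(int threads, int perThread) {
        ConcurrentHashMap<String, Long> hits = new ConcurrentHashMap<>();
        Thread[] ts = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            ts[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    hits.merge("page", 1L, Long::sum); // atomic per-key update
                }
            });
            ts[t].start();
        }
        for (Thread th : ts) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return hits.get("page");
    }

    public static void main(String[] args) {
        System.out.println(count(4, 10_000)); // 40000
    }
}
```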

Summary

The beauty of ConcurrentHashMap lies in its thread-safe implementation; when I get the opportunity I will research it further and write a dedicated post about it.

3. Summary

In fact, Java collections are mainly analyzed along two dimensions. One is how the underlying data is organized, e.g. linked lists and arrays (basically those two, or a combination of both, as in HashMap). The other is thread safety: thread-safe versus non-thread-safe.

Finally, adjustments to the underlying data organization bring about properties such as iteration order and sortedness.
