[Java]-ArrayList source code analysis

Posted by benyhanna on Fri, 04 Mar 2022 06:49:31 +0100

ArrayList source code analysis

The source code examined in this article is from JDK 1.8.

RandomAccess

ArrayList implements the RandomAccess interface, while LinkedList does not. RandomAccess is a marker interface: it declares no methods, and a List implementation uses it to signal that it supports fast (generally constant-time) random access. Algorithms can check for it and choose a suitable strategy, as the binarySearch method in the Collections class does:

public static <T>
int binarySearch(List<? extends Comparable<? super T>> list, T key) {
    if (list instanceof RandomAccess || list.size()<BINARYSEARCH_THRESHOLD)
        return Collections.indexedBinarySearch(list, key);
    else
        return Collections.iteratorBinarySearch(list, key);
}

As you can see, if the List implements RandomAccess (or is smaller than BINARYSEARCH_THRESHOLD), the search reads elements by index; otherwise it walks the list with an iterator.
In other words, the JDK itself uses index-based access for ArrayList and an iterator for a large LinkedList. Index access is cheap on an ArrayList, so an index-based for loop needs no help from an iterator, while on a LinkedList the iterator is much faster than an index-based for loop.
Reason: ArrayList is backed by an array, so fetching an element by index is O(1) and an index-based for loop is fast enough without the extra overhead of an iterator. LinkedList is backed by a doubly linked list, so fetching an element by index is O(n), and an index-based loop over the whole list costs O(n^2). An iterator over a LinkedList simply follows each node's successor pointer, so a full traversal is O(n) and takes far less time than the index-based loop.
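The same check can be used in application code. The following is a minimal sketch, not JDK code: the class name TraversalDemo and the method sum are made up for illustration, and the method simply picks the traversal style from the RandomAccess marker in the same spirit as binarySearch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

// Hypothetical helper class, for illustration only.
class TraversalDemo {
    // Sums a list, choosing the traversal strategy from the RandomAccess marker.
    static long sum(List<Integer> list) {
        long total = 0;
        if (list instanceof RandomAccess) {
            // Index-based loop: get(i) is O(1) for array-backed lists.
            for (int i = 0, n = list.size(); i < n; i++) {
                total += list.get(i);
            }
        } else {
            // The iterator moves node to node, avoiding an O(n) get(i) per element.
            for (Iterator<Integer> it = list.iterator(); it.hasNext(); ) {
                total += it.next();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> arrayList = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
        List<Integer> linkedList = new LinkedList<>(Arrays.asList(1, 2, 3, 4));
        System.out.println(arrayList instanceof RandomAccess);  // true
        System.out.println(linkedList instanceof RandomAccess); // false
        System.out.println(sum(arrayList) + " " + sum(linkedList)); // 10 10
    }
}
```

Both branches return the same result; only the cost differs, and the difference only becomes noticeable on long linked lists.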

Primary member variable

private static final int DEFAULT_CAPACITY = 10; //Default capacity, which will be used when adding the first element
private static final Object[] EMPTY_ELEMENTDATA = {}; //An empty array
//It is also an empty array. The difference from the above empty array is analyzed in the constructor of ArrayList and when storing the first element
private static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {}; 
transient Object[] elementData; //An array of data elements
private int size; //Number of elements stored in the current structure
private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;//Maximum size of elementData array set by JDK

Constructor

public ArrayList(int initialCapacity) {
    if (initialCapacity > 0) {
        this.elementData = new Object[initialCapacity];
    } else if (initialCapacity == 0) {
        this.elementData = EMPTY_ELEMENTDATA;
    } else {
        throw new IllegalArgumentException("Illegal Capacity: "+
                                           initialCapacity);
    }
}
public ArrayList() {
    this.elementData = DEFAULTCAPACITY_EMPTY_ELEMENTDATA;
}

When an ArrayList is constructed with a specified initial capacity: if the value is greater than 0, elementData is allocated with exactly that length (this is the capacity of the ArrayList, its size is still 0); if the value is less than 0, an IllegalArgumentException is thrown; if it is equal to 0, elementData points to EMPTY_ELEMENTDATA.
If the parameterless constructor is used, elementData points to DEFAULTCAPACITY_EMPTY_ELEMENTDATA.

public ArrayList(Collection<? extends E> c) {
    Object[] a = c.toArray();
    if ((size = a.length) != 0) {
        if (c.getClass() == ArrayList.class) {
            elementData = a;
        } else {
            elementData = Arrays.copyOf(a, size, Object[].class);
        }
    } else {
        // replace with empty array.
        elementData = EMPTY_ELEMENTDATA;
    }
}

This constructor takes a Collection and copies its elements into the new ArrayList. If the Collection is empty, elementData points to EMPTY_ELEMENTDATA.
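A short sketch of what this copying means in practice. The class and method names are made up for illustration. The new list has its own backing array, so later structural changes to the source do not show up in the copy; the elements themselves are not cloned, only the references are copied.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, for illustration only.
class CopyConstructorDemo {
    // Builds a copy, then mutates the source to show the two lists are independent.
    static List<String> copyThenMutateSource(List<String> source) {
        List<String> copy = new ArrayList<>(source);
        source.add("added-later");
        return copy;
    }

    public static void main(String[] args) {
        List<String> source = new ArrayList<>(Arrays.asList("a", "b"));
        List<String> copy = copyThenMutateSource(source);
        System.out.println(source); // [a, b, added-later]
        System.out.println(copy);   // [a, b]
    }
}
```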

Capacity expansion mechanism

Any discussion of ArrayList usually comes around to its capacity expansion mechanism. The relevant methods include add, grow and a few helpers. Below, we start from the add method and trace the expansion process as elements are added to an ArrayList.
First, construct an ArrayList with the parameterless constructor, so that its elementData points to DEFAULTCAPACITY_EMPTY_ELEMENTDATA, then call the following add method to add the first element.

add

public boolean add(E e) {
    ensureCapacityInternal(size + 1);  // Increments modCount!!
    elementData[size++] = e;
    return true;
}

Before an element is added, the method checks whether elementData is large enough to hold one more element, that is, whether its length is at least the current number of stored elements plus one (size + 1). When the first element is added, the required minimum capacity, minCapacity, is therefore 1. This check is done by the ensureCapacityInternal method.

ensureCapacityInternal

private void ensureCapacityInternal(int minCapacity) {
    ensureExplicitCapacity(calculateCapacity(elementData, minCapacity));
}

calculateCapacity

Let's first look at the calculateCapacity method

private static int calculateCapacity(Object[] elementData, int minCapacity) {
    if (elementData== DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
        return Math.max(DEFAULT_CAPACITY, minCapacity);
    }
    return minCapacity;
}

As you can see, if the ArrayList was created with the parameterless constructor and an element is being added for the first time, the first expansion targets DEFAULT_CAPACITY, i.e. 10. (The code takes the larger of DEFAULT_CAPACITY and minCapacity, but minCapacity is 1 when the first element is added through add.) This is the difference between DEFAULTCAPACITY_EMPTY_ELEMENTDATA and EMPTY_ELEMENTDATA: if the user passes an initial capacity to the constructor, the first expansion does not jump to 10 but is based on the capacity the user specified.
My understanding is that a user who calls the parameterless constructor has expressed no requirement for the initial capacity, so the JDK simply grows to 10 on the first expansion, and no further expansion is needed while the first ten elements are added. If the user specifies the initial capacity, that value is respected and no default is applied. Even if the user sets the initial capacity to 0, which may cause several expansions while the first few elements are added, the JDK goes along with the user's choice.
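The distinction between the two empty arrays relies on reference identity (==), not on content, since both are empty. A minimal standalone sketch, with the constants and calculateCapacity copied into a made-up class SentinelDemo so it can run outside ArrayList:

```java
// Hypothetical standalone class; the fields mirror the private ones in ArrayList.
class SentinelDemo {
    static final int DEFAULT_CAPACITY = 10;
    static final Object[] EMPTY_ELEMENTDATA = {};
    static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {};

    // Same logic as the calculateCapacity method quoted above.
    static int calculateCapacity(Object[] elementData, int minCapacity) {
        if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
            return Math.max(DEFAULT_CAPACITY, minCapacity);
        }
        return minCapacity;
    }

    public static void main(String[] args) {
        // State after new ArrayList<>(): the first add asks for capacity 10.
        System.out.println(calculateCapacity(DEFAULTCAPACITY_EMPTY_ELEMENTDATA, 1)); // 10
        // State after new ArrayList<>(0): the first add asks for capacity 1.
        System.out.println(calculateCapacity(EMPTY_ELEMENTDATA, 1)); // 1
    }
}
```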

ensureExplicitCapacity

The next step is the ensureExplicitCapacity method. Explicit means clear and definite, that is, the minimum capacity determined by the calculateCapacity method is the exact minimum capacity:

private void ensureExplicitCapacity(int minCapacity) {
    modCount++;
    // overflow-conscious code
    if (minCapacity - elementData.length > 0)
        grow(minCapacity);
}

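The "overflow-conscious code" comment marks a check that the JDK authors wrote with int overflow in mind: the comparison is written as a subtraction (minCapacity - elementData.length > 0) rather than minCapacity > elementData.length, so that a minCapacity that has wrapped around to a negative value still reaches grow.
The method compares the required minimum capacity minCapacity with the current length of elementData. In our walkthrough, add requested a minimum capacity of 1, but calculateCapacity raised it to 10, while the current length of elementData is 0. Since 10 - 0 > 0, the array must be expanded, and grow is called with minCapacity equal to 10.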

grow

private void grow(int minCapacity) {
    // overflow-conscious code
    int oldCapacity = elementData.length;
    int newCapacity = oldCapacity + (oldCapacity >> 1);
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    // minCapacity is usually close to size, so this is a win:
    elementData = Arrays.copyOf(elementData, newCapacity);
}

grow first sets the new capacity to 1.5 times the old capacity (oldCapacity + (oldCapacity >> 1)). If that is still smaller than the required minimum capacity minCapacity, the new capacity is set to minCapacity. If the new capacity exceeds MAX_ARRAY_SIZE, the designed maximum size of elementData, hugeCapacity is called to compute the final value. In our walkthrough, oldCapacity is 0, so newCapacity is initially 0; minCapacity is 10, and 10 is less than MAX_ARRAY_SIZE, so the final newCapacity is 10. elementData is expanded to length 10, control returns to add, and the statement elementData[size++] = e stores the first element. size becomes 1, meaning the ArrayList now holds one element, and the first add is complete.
Next, add the second element. The minCapacity passed to ensureCapacityInternal is 2, and calculateCapacity returns 2 unchanged, because elementData no longer points to DEFAULTCAPACITY_EMPTY_ELEMENTDATA. ensureExplicitCapacity then finds that the current length of elementData is 10, so no expansion is needed, and control returns to add, which stores the element in the array. Adding the third through tenth elements works the same way.
When the eleventh element is added, the minCapacity passed to ensureCapacityInternal is 11. calculateCapacity returns 11, and ensureExplicitCapacity finds that the current length of elementData, 10, is less than 11, so grow(11) is executed. The old capacity is 10, so the computed newCapacity is 15, which is greater than 11, and the final newCapacity is 15. This is what people mean when they say that ArrayList expands to 1.5 times its capacity by default (rounded down, since oldCapacity >> 1 is an integer shift).
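To see the whole sequence at once, the walkthrough can be replayed on plain integers. This is a simulation under stated assumptions, not the JDK code: the class GrowthSequenceDemo and the method capacitySteps are made up, the list is assumed to come from the parameterless constructor, the MAX_ARRAY_SIZE branch is ignored, and "capacity == 0" stands in for the check against DEFAULTCAPACITY_EMPTY_ELEMENTDATA.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation class; it tracks capacities only and allocates no arrays.
class GrowthSequenceDemo {
    // Returns every capacity reached while appending `adds` elements one by one
    // to a list created with the parameterless constructor.
    static List<Integer> capacitySteps(int adds) {
        List<Integer> steps = new ArrayList<>();
        int capacity = 0;
        for (int size = 0; size < adds; size++) {
            int minCapacity = size + 1;
            if (capacity == 0) {
                // Stand-in for calculateCapacity on the default-constructed list.
                minCapacity = Math.max(10, minCapacity);
            }
            if (minCapacity - capacity > 0) {
                int newCapacity = capacity + (capacity >> 1); // about 1.5x, rounded down
                if (newCapacity - minCapacity < 0) {
                    newCapacity = minCapacity;
                }
                capacity = newCapacity;
                steps.add(capacity);
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(capacitySteps(50)); // [10, 15, 22, 33, 49, 73]
    }
}
```

Fifty appends trigger only six array copies, which is why appending to an ArrayList is cheap on average even though each individual expansion copies the whole array.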

hugeCapacity

In the grow method, you can see that when the computed newCapacity is greater than MAX_ARRAY_SIZE, the hugeCapacity method is executed to determine the final value of newCapacity:

private static int hugeCapacity(int minCapacity) {
    if (minCapacity < 0) // overflow
        throw new OutOfMemoryError();
    return (minCapacity > MAX_ARRAY_SIZE) ?
        Integer.MAX_VALUE :
        MAX_ARRAY_SIZE;
}

If the required minimum capacity minCapacity is itself larger than MAX_ARRAY_SIZE, hugeCapacity returns Integer.MAX_VALUE as the new capacity; otherwise it returns MAX_ARRAY_SIZE. A negative minCapacity means the requested size has overflowed int, so an OutOfMemoryError is thrown. The returned value becomes the final newCapacity.
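The subtraction-style comparisons in grow matter once capacities get close to Integer.MAX_VALUE, because oldCapacity + (oldCapacity >> 1) can wrap around to a negative number. The sketch below copies the arithmetic of grow and hugeCapacity into a made-up class GrowPolicyDemo and leaves out the array copy, so the large cases can be tried without allocating anything.

```java
// Hypothetical standalone class mirroring the arithmetic of grow and hugeCapacity.
class GrowPolicyDemo {
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    // Computes the capacity grow would choose, without copying any array.
    static int newCapacity(int oldCapacity, int minCapacity) {
        int newCapacity = oldCapacity + (oldCapacity >> 1); // may wrap to a negative value
        if (newCapacity - minCapacity < 0) {
            newCapacity = minCapacity;
        }
        if (newCapacity - MAX_ARRAY_SIZE > 0) {
            newCapacity = hugeCapacity(minCapacity);
        }
        return newCapacity;
    }

    static int hugeCapacity(int minCapacity) {
        if (minCapacity < 0) { // overflow
            throw new OutOfMemoryError();
        }
        return (minCapacity > MAX_ARRAY_SIZE) ? Integer.MAX_VALUE : MAX_ARRAY_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(newCapacity(10, 11)); // 15

        int old = 1_500_000_000;
        System.out.println(old + (old >> 1)); // -2044967296, the 1.5x value wrapped around
        // The wrapped value is still routed to hugeCapacity and capped:
        System.out.println(newCapacity(old, old + 1) == MAX_ARRAY_SIZE); // true
        // A request above MAX_ARRAY_SIZE gets Integer.MAX_VALUE:
        System.out.println(newCapacity(MAX_ARRAY_SIZE, MAX_ARRAY_SIZE + 1) == Integer.MAX_VALUE); // true
    }
}
```

With a plain comparison such as newCapacity > MAX_ARRAY_SIZE, the wrapped negative value in the second case would pass the check and then be handed to Arrays.copyOf as a negative length.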

Delete element

The main method involved in deleting elements is remove(int index):

public E remove(int index) {
    rangeCheck(index);

    modCount++;
    E oldValue = elementData(index);

    int numMoved = size - index - 1;
    if (numMoved > 0)
        System.arraycopy(elementData, index+1, elementData, index,
                         numMoved);
    elementData[--size] = null; // clear to let GC do its work

    return oldValue;
}

This method first checks that the index is valid, then reads the element to be deleted so it can be returned at the end. numMoved = size - index - 1 is the number of elements that must be shifted: after an element is deleted (unless it is the last one), every element from the one right after it through the last element has to move forward by one position. When index is size - 1, numMoved is 0, so nothing is copied. When numMoved is greater than 0, System.arraycopy moves the last numMoved elements of elementData forward by one position. The now unused last slot is then set to null so that the garbage collector can reclaim the object, size is decremented, and the deleted element is returned.
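The shifting step can be tried on a plain array. A minimal sketch, assuming a made-up helper removeAt that takes the array and the logical size as parameters instead of reading fields, and returns the new size:

```java
import java.util.Arrays;

// Hypothetical demo class; removeAt mirrors the shifting logic of remove(int index).
class RemoveShiftDemo {
    // Removes the element at `index` from the first `size` slots and returns the new size.
    static int removeAt(Object[] data, int size, int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + size);
        }
        int numMoved = size - index - 1; // elements located after the removed one
        if (numMoved > 0) {
            // Shift the tail one slot to the left, overwriting the removed element.
            System.arraycopy(data, index + 1, data, index, numMoved);
        }
        data[--size] = null; // clear the stale last slot so it can be collected
        return size;
    }

    public static void main(String[] args) {
        Object[] data = {"a", "b", "c", "d", null, null};
        int size = removeAt(data, 4, 1);
        System.out.println(size + " " + Arrays.toString(data)); // 3 [a, c, d, null, null, null]
    }
}
```

Removing near the front shifts almost the whole array, so remove(int index) is O(n) in the worst case, while removing the last element copies nothing.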

About modCount

AbstractList, the parent class of ArrayList, documents the modCount field in detail in its source. Roughly, the field records how many times the List has been structurally modified (similar to the version number in an optimistic lock, which is incremented on every modification). A structural modification is one that changes the size of the List or otherwise disturbs it in a way that could make an iteration in progress yield incorrect results. The field is used by the iterator and list iterator implementations: if its value changes unexpectedly, the iterator throws ConcurrentModificationException from next, remove, previous, set or add. This is the fail-fast mechanism, which replaces non-deterministic behavior caused by modification during iteration with an immediate failure. Using the field is optional for subclasses. A subclass that wants its iterators and list iterators to be fail-fast only needs to increment the field in its add method, remove method, and any other method that structurally modifies the List.
ArrayList uses this field. Therefore, if the List is structurally modified while we are traversing it with an iterator (or an enhanced for loop), whether by another thread or by the same thread calling the list's own add or remove inside the loop, the iterator throws ConcurrentModificationException. The iterator's own remove method is safe to use, because it keeps the iterator's expected modification count in sync with modCount.
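No second thread is needed to trigger the exception. The sketch below (class and method names are made up) removes an element through the list inside an enhanced for loop, which trips the fail-fast check on the next call to next, and then shows removal through the iterator as the safe alternative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class, for illustration only.
class FailFastDemo {
    // Returns true if removing through the list during a for-each loop
    // makes the iterator throw ConcurrentModificationException.
    static boolean removeDuringForEachThrows() {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
        try {
            for (Integer value : list) {
                if (value == 1) {
                    list.remove(value); // remove(Object): bumps modCount behind the iterator's back
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Removes even numbers through the iterator, which keeps its expected count in sync.
    static List<Integer> removeEvensSafely(List<Integer> input) {
        List<Integer> list = new ArrayList<>(input);
        for (Iterator<Integer> it = list.iterator(); it.hasNext(); ) {
            if (it.next() % 2 == 0) {
                it.remove();
            }
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(removeDuringForEachThrows());                // true
        System.out.println(removeEvensSafely(Arrays.asList(1, 2, 3, 4))); // [1, 3]
    }
}
```

The check is best-effort: it is meant for detecting bugs and should not be relied on for program correctness.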
