JDK source code reading | ArrayList, the unsynchronized dynamic array

Posted by Ruiser on Sun, 26 Dec 2021 04:08:53 +0100

Brief introduction

These notes are based on the ArrayList source code in the JDK and on the comments in that source. They first summarize the official class-level comment, then analyze the class's inheritance structure, key fields, constructors and other core methods, and finally collect some interview questions about ArrayList. The comments referred to in this article are the English comments in the source code.

Comments for the ArrayList class

  • ArrayList is a resizable-array implementation of the List interface. It implements all optional list operations and permits all elements, including null. In addition to implementing the List interface, this class provides methods to manipulate the size of the array that is used internally to store the list. (This class is roughly equivalent to Vector, except that it is unsynchronized.)

  • The size, isEmpty, get, set, iterator and listIterator operations run in constant time. The add operation runs in amortized constant time, that is, adding n elements requires O(n) time. All other operations run in linear time (roughly speaking). The constant factor is low compared with that of the LinkedList implementation.

  • Each ArrayList instance has a capacity. The capacity is the size of the array used to store the elements of the list, and it is always at least as large as the list size. As elements are added to an ArrayList, its capacity grows automatically. The details of the growth policy are not specified beyond the fact that adding an element has constant amortized time cost.

  • Before adding a large number of elements, an application can use the ensureCapacity operation to increase the capacity of an ArrayList instance. This may reduce the amount of incremental reallocation. Note that this implementation is not synchronized. If multiple threads access an ArrayList instance concurrently, and at least one of the threads modifies the list structurally, it must be synchronized externally.

    (A structural modification is any operation that adds or deletes one or more elements, or explicitly resizes the backing array; merely setting the value of an element is not a structural modification.) This synchronization is typically accomplished by synchronizing on some object that naturally encapsulates the list. If no such object exists, the list should be wrapped using the Collections.synchronizedList method. This is best done at creation time, to prevent accidental unsynchronized access to the list. For example:

    List list = Collections.synchronizedList(new ArrayList(...));

  • The iterators returned by this class's iterator and listIterator methods are fail-fast: if the list is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove or add methods, the iterator throws a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly.
    Note that the fail-fast behavior of an iterator cannot be guaranteed. Generally speaking, it is impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. It is therefore wrong to write a program whose correctness depends on this exception: the fail-fast behavior of iterators should be used only to detect bugs.

This class is a member of the Java Collections Framework.

Inheritance system of ArrayList

The diagram of inheritance system is as follows:

It can be seen that ArrayList extends AbstractList and directly implements the List, RandomAccess, Cloneable and Serializable interfaces.

RandomAccess, Cloneable and Serializable are marker interfaces that declare no methods. A class that carries such a marker is, by convention, expected to override or provide certain methods. For example, a class that implements Cloneable generally overrides Object.clone(), and ArrayList implements writeObject(ObjectOutputStream) and readObject(ObjectInputStream) to customize its serialized form and declares a serialVersionUID field that holds its serialization version ID.

The following describes what ArrayList gains from implementing the RandomAccess and Cloneable interfaces.

RandomAccess interface:

The interface acts as a marker: a List that implements it supports fast (generally constant-time) random access. Here, random access means the List.get(index) operation. Its primary purpose is to let generic algorithms choose the approach that performs better depending on whether a list offers random or sequential access. What is the better way? The comments in the source code say:

// for typical instances of the class, this loop:
for (int i=0, n=list.size(); i < n; i++)
	list.get(i);

// runs faster than this loop:
for (Iterator i=list.iterator(); i.hasNext(); )
	i.next();

Therefore, when traversing a List that implements this interface, an index-based loop is typically faster than an iterator.
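
As a minimal sketch of how an algorithm might use the marker, the following picks the loop style with an instanceof check. The class name TraversalDemo and the method sum are made up for this illustration and are not part of the JDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

class TraversalDemo {
    // Sums a list, choosing the loop style based on the RandomAccess marker.
    static long sum(List<Integer> list) {
        long total = 0;
        if (list instanceof RandomAccess) {
            // Indexed access is cheap for lists such as ArrayList.
            for (int i = 0, n = list.size(); i < n; i++) {
                total += list.get(i);
            }
        } else {
            // Sequential lists such as LinkedList are better served by an iterator.
            for (Iterator<Integer> it = list.iterator(); it.hasNext(); ) {
                total += it.next();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> arrayList = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
        List<Integer> linkedList = new LinkedList<>(arrayList);
        System.out.println(sum(arrayList));  // prints 10
        System.out.println(sum(linkedList)); // prints 10
    }
}
```

Both branches return the same result; only the access pattern differs.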

Cloneable interface:

Its comments state that a class implements this interface to indicate to the Object.clone() method that it is legal to make a field-for-field copy of instances of that class. A class that implements this interface should, by convention, override Object.clone() with a public method.

The override in ArrayList is as follows:

//Returns a shallow copy of the ArrayList instance.
public Object clone() {
    try {
        ArrayList<?> v = (ArrayList<?>) super.clone();
        v.elementData = Arrays.copyOf(elementData, size);
        v.modCount = 0;
        return v;
    } catch (CloneNotSupportedException e) {
        // this shouldn't happen, since we are Cloneable
        throw new InternalError(e);
    }
}

Difference between shallow copy and deep copy:

If an element or field of the copied object is of a reference type, a shallow copy copies only the reference into the new object, so the original object and the new object actually share the referenced field or element. Therefore, when such a shared object is modified through the original, the change is also visible through the copy.

In a deep copy, if an element or field of the copied object is of a reference type, a new object is created for it, and that object is in turn deep-copied recursively.

In short: a shallow copy is like two account books that point to the same money, so spending through one is visible through the other. A deep copy gives each account its own money, so the two are isolated from each other.
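
The effect can be observed with ArrayList.clone() itself. In this small sketch (the class and method names are invented for illustration), the clone gets its own backing array but shares the element objects with the original.

```java
import java.util.ArrayList;

class ShallowCloneDemo {
    // Clones the list, then changes the clone structurally and mutates its first element.
    static ArrayList<StringBuilder> mutateThroughClone(ArrayList<StringBuilder> original) {
        @SuppressWarnings("unchecked")
        ArrayList<StringBuilder> copy = (ArrayList<StringBuilder>) original.clone();
        copy.add(new StringBuilder("new")); // affects only the copy's own array
        copy.get(0).append("!");            // the element object is shared with the original
        return copy;
    }

    public static void main(String[] args) {
        ArrayList<StringBuilder> original = new ArrayList<>();
        original.add(new StringBuilder("a"));
        ArrayList<StringBuilder> copy = mutateThroughClone(original);
        System.out.println(original.size()); // prints 1
        System.out.println(copy.size());     // prints 2
        System.out.println(original.get(0)); // prints a!
    }
}
```

Adding to the copy leaves the original's size unchanged, while the appended "!" is visible through both lists.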

Key properties of ArrayList

/**
 * Default initial capacity. A list created through new ArrayList() gets this capacity (10) when its first element is added.
 */
private static final int DEFAULT_CAPACITY = 10;

/**
 * Shared empty array instance used for empty instances.
 * (Note) An empty array that is used when a list is created with zero capacity, for example through new ArrayList(0).
 */
private static final Object[] EMPTY_ELEMENTDATA = {};

/**
 * Shared empty array instance used for default sized empty instances. We
 * distinguish this from EMPTY_ELEMENTDATA to know how much to inflate when
 * first element is added.
 * (Note) The empty array used when a list is created through the no-argument constructor new ArrayList(). It differs from EMPTY_ELEMENTDATA in that, when the first element is added, the backing array is expanded to DEFAULT_CAPACITY (10) elements.
 */
private static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {};

/**
 * The array buffer into which the elements of the ArrayList are stored.
 * The capacity of the ArrayList is the length of this array buffer. Any
 * empty ArrayList with elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA
 * will be expanded to DEFAULT_CAPACITY when the first element is added.
 * (Note) The array that stores the elements. It is declared transient, so default serialization skips this field; writeObject and readObject handle the elements instead.
 */
transient Object[] elementData; // non-private to simplify nested class access

//The size of the ArrayList (the number of elements it contains).
//Store the number of data elements. Note that it is the number of elements, not the length of the underlying array elementData.
private int size;

Three constructors of ArrayList

ArrayList has three constructors: one that takes an initial capacity, a no-argument one that uses the default capacity, and one that takes a collection whose elements are copied into the new list.


/**
 * Constructs an empty list with the specified initial capacity.
 *
 * @param  initialCapacity  the initial capacity of the list
 * @throws IllegalArgumentException if the specified initial capacity
 *         is negative
 */
public ArrayList(int initialCapacity) {
    if (initialCapacity > 0) {
        this.elementData = new Object[initialCapacity];
    } else if (initialCapacity == 0) {
        this.elementData = EMPTY_ELEMENTDATA;
    } else {
        throw new IllegalArgumentException("Illegal Capacity: "+
                                           initialCapacity);
    }
}

/**
 * Constructs an empty list with an initial capacity of ten.
 */
public ArrayList() {
    this.elementData = DEFAULTCAPACITY_EMPTY_ELEMENTDATA;
}

/**
 * Constructs a list containing the elements of the specified
 * collection, in the order they are returned by the collection's
 * iterator.
 *
 * @param c the collection whose elements are to be placed into this list
 * @throws NullPointerException if the specified collection is null
 */
public ArrayList(Collection<? extends E> c) {
    Object[] a = c.toArray();
    if ((size = a.length) != 0) {
        if (c.getClass() == ArrayList.class) {
            elementData = a;
        } else {
            elementData = Arrays.copyOf(a, size, Object[].class);
        }
    } else {
        // replace with empty array.
        elementData = EMPTY_ELEMENTDATA;
    }
}
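
A short usage sketch of the three constructors follows. The class name ConstructorDemo and the helper rejectsCapacity are hypothetical; only the public behavior documented in the Javadoc above is relied on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ConstructorDemo {
    // Returns true if the capacity constructor rejects the given value.
    static boolean rejectsCapacity(int capacity) {
        try {
            new ArrayList<String>(capacity);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> empty = new ArrayList<>();        // no-argument constructor
        List<String> sized = new ArrayList<>(100);     // capacity reserved, size still 0
        List<String> source = Arrays.asList("x", "y");
        List<String> copied = new ArrayList<>(source); // copies the elements of source

        copied.add("z"); // does not affect source
        System.out.println(empty.size() + " " + sized.size() + " "
                + copied.size() + " " + source.size()); // prints 0 0 3 2
        System.out.println(rejectsCapacity(-1));        // prints true
    }
}
```

Note that an initial capacity only reserves space; size() stays 0 until elements are added.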

Other methods of ArrayList

Capacity operations

trimToSize() trims the capacity of the backing array down to the list's current size to minimize storage.

public void trimToSize() {
    modCount++;//structural modification times + 1
    if (size < elementData.length) {
        elementData = (size == 0)
          ? EMPTY_ELEMENTDATA
          : Arrays.copyOf(elementData, size);
    }
}

Increase capacity size

You can call ensureCapacity(int minCapacity) explicitly to guarantee that the list can hold at least minCapacity elements.

ensureExplicitCapacity treats this as a structural modification and executes modCount++ on every call, whether or not the array actually grows.

Capacity cannot exceed

// Public method that callers can invoke explicitly
public void ensureCapacity(int minCapacity) {
    int minExpand = (elementData != DEFAULTCAPACITY_EMPTY_ELEMENTDATA)
        ? 0
        : DEFAULT_CAPACITY;
    if (minCapacity > minExpand) {
        ensureExplicitCapacity(minCapacity);
    }
}

private static int calculateCapacity(Object[] elementData, int minCapacity) {
    if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
        return Math.max(DEFAULT_CAPACITY, minCapacity);
    }
    return minCapacity;
}

// add() expands the capacity through this method
private void ensureCapacityInternal(int minCapacity) {
    ensureExplicitCapacity(calculateCapacity(elementData, minCapacity));
}

// The array grows only when the current capacity is insufficient
private void ensureExplicitCapacity(int minCapacity) {
    modCount++; //structural modification times + 1
    // overflow-conscious code
    if (minCapacity - elementData.length > 0)
        grow(minCapacity);
}

The actual expansion is done by the grow(int minCapacity) method. It increases the capacity by half of the current capacity; if the result is still smaller than minCapacity, the capacity is set directly to minCapacity.

//Core expansion code
private void grow(int minCapacity) {
    // overflow-conscious code
    int oldCapacity = elementData.length;
    int newCapacity = oldCapacity + (oldCapacity >> 1);
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    // minCapacity is usually close to size, so this is a win:
    elementData = Arrays.copyOf(elementData, newCapacity);
}
private static int hugeCapacity(int minCapacity) {
    if (minCapacity < 0) // overflow
        throw new OutOfMemoryError();
    return (minCapacity > MAX_ARRAY_SIZE) ?
        Integer.MAX_VALUE :
        MAX_ARRAY_SIZE;
}
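
To make the arithmetic concrete, here is a small simulation of the capacities a list created with new ArrayList() would pass through while elements are appended one at a time. It only mirrors the formula shown above (the JDK 8 version quoted in this article) and leaves out the overflow handling; GrowthDemo, nextCapacity and capacitiesWhileAdding are names invented for this sketch, and a capacity of 0 is used here to stand in for the DEFAULTCAPACITY_EMPTY_ELEMENTDATA state.

```java
import java.util.ArrayList;
import java.util.List;

class GrowthDemo {
    static final int DEFAULT_CAPACITY = 10;

    // Mirrors the arithmetic in grow() for small capacities (overflow handling omitted).
    static int nextCapacity(int oldCapacity, int minCapacity) {
        int newCapacity = oldCapacity + (oldCapacity >> 1);
        return Math.max(newCapacity, minCapacity);
    }

    // Capacities reached while appending count elements, one at a time,
    // to a simulated list that starts in the "default empty" state.
    static List<Integer> capacitiesWhileAdding(int count) {
        List<Integer> capacities = new ArrayList<>();
        int capacity = 0;
        for (int size = 0; size < count; size++) {
            int minCapacity = size + 1;
            if (capacity == 0) {
                // Models calculateCapacity() for the no-argument constructor.
                minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
            }
            if (minCapacity > capacity) {
                capacity = nextCapacity(capacity, minCapacity);
                capacities.add(capacity);
            }
        }
        return capacities;
    }

    public static void main(String[] args) {
        // prints [10, 15, 22, 33, 49, 73, 109]
        System.out.println(capacitiesWhileAdding(100));
    }
}
```

Because the right shift rounds down, the growth factor is only approximately 1.5 (for example 15 grows to 22, not 22.5).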

Add element:

Add one element

add(E e) first ensures there is room for one more element and then appends the element to the end of the list. As the methods it calls show, the backing array is enlarged only when its capacity is insufficient.

// Appends the specified element to the end of this list.

public boolean add(E e) {
    ensureCapacityInternal(size + 1);  // Increments modCount!!
    elementData[size++] = e;
    return true;
}

Add all elements of a collection

public boolean addAll(Collection<? extends E> c) {
    Object[] a = c.toArray();
    int numNew = a.length;
    ensureCapacityInternal(size + numNew);  // Increments modCount
    System.arraycopy(a, 0, elementData, size, numNew);
    size += numNew;
    return numNew != 0;
}

Delete an element

remove(int index), shown below, deletes the element at the specified index. remove(Object o), which deletes a specified element, first finds the index of that element and then deletes it by index in much the same way.

After the element is removed, size decreases by one but the length of the underlying array stays the same. The vacated last slot is set to null so that the garbage collector can reclaim the removed object once nothing else references it.

public E remove(int index) {
    rangeCheck(index);
    modCount++;//structural modification times + 1
    E oldValue = elementData(index);
    int numMoved = size - index - 1;
    if (numMoved > 0)
        System.arraycopy(elementData, index+1, elementData, index,
                         numMoved);
    elementData[--size] = null; // clear to let GC do its work
    return oldValue;
}
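
The two overloads are easy to confuse on a List<Integer>, because an int argument selects remove(int index) rather than remove(Object o). A brief sketch (RemoveDemo and its methods are illustrative names only):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RemoveDemo {
    // Calls remove(int index) on a copy of the source list.
    static List<Integer> removeAtIndex(List<Integer> source, int index) {
        List<Integer> copy = new ArrayList<>(source);
        copy.remove(index);
        return copy;
    }

    // Calls remove(Object o) on a copy of the source list.
    static List<Integer> removeValue(List<Integer> source, int value) {
        List<Integer> copy = new ArrayList<>(source);
        copy.remove(Integer.valueOf(value));
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(3, 2, 1);
        System.out.println(removeAtIndex(numbers, 1)); // prints [3, 1]
        System.out.println(removeValue(numbers, 1));   // prints [3, 2]
    }
}
```

Boxing the argument with Integer.valueOf is what selects the remove(Object o) overload.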

Delete all elements:

public void clear() {
    modCount++;//Structural modification times + 1
    // clear to let GC do its work
    for (int i = 0; i < size; i++)
        elementData[i] = null;
    size = 0;
}

Summary:

The other methods generally follow the same pattern and are not complex. Whenever an operation inserts an element, shifts many array elements or adds many elements at once, the work is done by the underlying method System.arraycopy(src, srcPos, dest, destPos, length).

Operations that involve a structural modification generally consist of these steps:
capacity expansion (only for operations that add elements, and only when needed), modCount++ (executed separately when no capacity check is involved), locating the position (index) to modify, checking the index range, and modifying the underlying array elementData.
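
The pattern can be sketched on a plain int array. This is a simplified stand-in written for this article's readers, not the JDK's add(int, E): ArrayInsertDemo and insert are hypothetical names, and the growth rule is a rough imitation of the one described earlier.

```java
import java.util.Arrays;

class ArrayInsertDemo {
    // Inserts value at index among the first size slots of data.
    // Returns the array that holds the result (a larger copy if data was full).
    static int[] insert(int[] data, int size, int index, int value) {
        if (index < 0 || index > size) {
            throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + size);
        }
        if (size == data.length) {
            // Grow by about half, but always by at least one slot.
            data = Arrays.copyOf(data, Math.max(size + 1, size + (size >> 1)));
        }
        // Shift the tail one slot to the right, then fill the gap.
        System.arraycopy(data, index, data, index + 1, size - index);
        data[index] = value;
        return data;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 4, 0}; // three used slots, one spare
        int[] result = insert(data, 3, 2, 3);
        System.out.println(Arrays.toString(result)); // prints [1, 2, 3, 4]
    }
}
```

System.arraycopy handles overlapping source and destination ranges correctly, which is why the shift can be done in place.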

The role of modCount

As mentioned in the class comments, modCount records the number of structural modifications. Its value is checked, for example, while an iterator is running and in the sort() method. If modCount no longer matches the expected value because of a structural modification, a java.util.ConcurrentModificationException is thrown. I think of it as the version flag of an optimistic lock. Non-structural modifications such as set() do not change modCount, so they are not reported during iteration or sorting.
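
The check can be triggered in a single thread. The sketch below (FailFastDemo and its methods are names chosen for this illustration) contrasts a structural change made behind the iterator's back with removal through the iterator and with a non-structural set().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

class FailFastDemo {
    static List<String> names() {
        return new ArrayList<>(Arrays.asList("Jack", "Amy", "Lucy"));
    }

    // Removing through the list during a for-each loop changes modCount
    // without the iterator knowing, so the next call to next() throws.
    static boolean removeInsideForEachFails() {
        List<String> list = names();
        try {
            for (String name : list) {
                if (name.equals("Jack")) {
                    list.remove(name);
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Removing through the iterator keeps its expected modCount in step.
    static List<String> removeViaIterator() {
        List<String> list = names();
        for (Iterator<String> it = list.iterator(); it.hasNext(); ) {
            if (it.next().equals("Jack")) {
                it.remove();
            }
        }
        return list;
    }

    // set() is not a structural modification, so iteration continues normally.
    static List<String> upperCaseInsideForEach() {
        List<String> list = names();
        int i = 0;
        for (String name : list) {
            list.set(i++, name.toUpperCase());
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(removeInsideForEachFails()); // prints true
        System.out.println(removeViaIterator());        // prints [Amy, Lucy]
        System.out.println(upperCaseInsideForEach());   // prints [JACK, AMY, LUCY]
    }
}
```

As the class comment warns, this detection is best effort, so it should be treated as a debugging aid rather than as program logic.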

Interview questions

How can ArrayList be expanded?

When the first element is added to a list created with new ArrayList(), the capacity is expanded to 10. After that, each expansion increases the capacity to about 1.5 times the old capacity: the new capacity is oldCapacity + (oldCapacity >> 1), where the right shift by one bit computes half of the old capacity.

The frequent capacity expansion of ArrayList leads to a sharp decline in add performance. How to deal with it?

Specify the expected capacity in advance, either in the constructor when the list is created or by calling ensureCapacity before a bulk insertion, so that little time is spent on repeated automatic expansion.
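
Both options might look like this in practice. PresizeDemo and its two helper methods are illustrative names; the sizes used are arbitrary.

```java
import java.util.ArrayList;
import java.util.List;

class PresizeDemo {
    // Reserves room for n elements up front, so adding n elements needs no further expansion.
    static List<Integer> fillPresized(int n) {
        List<Integer> list = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list;
    }

    // Reserves room on an existing list before a bulk append.
    static void appendMany(ArrayList<Integer> list, int extra) {
        list.ensureCapacity(list.size() + extra);
        for (int i = 0; i < extra; i++) {
            list.add(i);
        }
    }

    public static void main(String[] args) {
        List<Integer> presized = fillPresized(1_000_000);
        ArrayList<Integer> existing = new ArrayList<>();
        appendMany(existing, 1_000_000);
        System.out.println(presized.size() + " " + existing.size()); // prints 1000000 1000000
    }
}
```

ensureCapacity is declared on ArrayList, not on the List interface, which is why appendMany takes an ArrayList parameter.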

Does ArrayList have to be slower to insert or delete elements than LinkedList?

In terms of their underlying data structures:

ArrayList is based on a dynamic array.
LinkedList is based on a doubly linked list.

Efficiency comparison:

Head insertion: LinkedList inserts at the head very quickly, because only the prev and next references of the neighbouring nodes need to be updated. ArrayList is slow here because every existing element has to be shifted by one position.
Middle insertion: LinkedList is slow because it must first walk the node references to reach the position (it starts from whichever end is nearer; this is a linear traversal, not a binary search). ArrayList locates the position immediately and only has to shift the elements after it, which System.arraycopy does quickly, so it is usually faster.
Tail insertion: both are fast. LinkedList keeps a reference to its last node, so no traversal is needed, but every insertion allocates a new node. ArrayList locates the position immediately and shifts nothing; it only pays for an occasional expansion of the array. In practice ArrayList is usually at least as fast.

Summary:

For inserting elements: at the head, LinkedList is faster; in the middle, ArrayList is usually faster; at the tail, both are fast and ArrayList is usually at least as fast.
Deleting elements is similar: at the head, LinkedList is faster; in the middle and at the tail, ArrayList is usually at least as fast.
Therefore, LinkedList is worth considering for collections that hold a small amount of data and are mainly modified by insertion and deletion. For collections that hold a large amount of data, ArrayList can be used: it not only offers fast lookup by index but also has relatively high insertion and deletion efficiency.
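
Readers who want to try the comparison themselves could start from a rough timing sketch like the one below. It is not a rigorous benchmark (no warm-up, a single run, an arbitrary n), so the numbers only hint at the trend and will vary between machines and JVMs; a harness such as JMH would be needed for trustworthy measurements. All names in it are made up for this example.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class InsertTimingDemo {
    // Inserts n elements at the head of the list and returns the elapsed nanoseconds.
    static long timeHeadInserts(List<Integer> list, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            list.add(0, i);
        }
        return System.nanoTime() - start;
    }

    // Inserts n elements at the middle of the list and returns the elapsed nanoseconds.
    static long timeMiddleInserts(List<Integer> list, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            list.add(list.size() / 2, i);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 20_000; // arbitrary size chosen for a quick run
        System.out.println("head,   ArrayList  (ns): " + timeHeadInserts(new ArrayList<Integer>(), n));
        System.out.println("head,   LinkedList (ns): " + timeHeadInserts(new LinkedList<Integer>(), n));
        System.out.println("middle, ArrayList  (ns): " + timeMiddleInserts(new ArrayList<Integer>(), n));
        System.out.println("middle, LinkedList (ns): " + timeMiddleInserts(new LinkedList<Integer>(), n));
    }
}
```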

Is ArrayList thread safe?

As mentioned in the source comments, ArrayList is not a thread-safe collection. If thread safety is needed, one option is Vector, which is thread safe but less efficient than ArrayList. Why is Vector thread safe? Its public methods are declared synchronized, and acquiring a lock on every call has a cost. Vector can be regarded as a version of ArrayList with synchronized methods.

You can also wrap the list with List list = Collections.synchronizedList(new ArrayList(…)); to synchronize the operations of an ArrayList.
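
A minimal sketch of the wrapper in use follows; the class name, the thread count and the element count are arbitrary choices for this illustration. Individual calls such as add are synchronized by the wrapper, but iteration is not, so the Collections.synchronizedList documentation requires the caller to hold the list's lock while iterating.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SynchronizedListDemo {
    // Several threads append to one synchronized list; returns the number of elements seen afterwards.
    static int concurrentAdds(int threads, int perThread) {
        List<Integer> list = Collections.synchronizedList(new ArrayList<Integer>());
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    list.add(i); // each call is synchronized by the wrapper
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        int count = 0;
        synchronized (list) { // iteration must be guarded manually
            for (Integer ignored : list) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(concurrentAdds(10, 1_000)); // prints 10000
    }
}
```

With a plain ArrayList in place of the wrapper, the same program could lose elements or throw, because add is not atomic.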

Under what circumstances do you not need to add a synchronization lock to ArrayList?

First, no lock is needed when only a single thread is involved; leaving it out is more efficient.
Second, no lock is needed when the ArrayList is a local variable that never leaves its method, because each thread that calls the method works on its own list. In the example in this article, the ArrayList is a member variable. A member-variable collection is shared by all threads, so access to it must be locked. (The JVM's memory layout explains this further: each thread has its own stack for local variables.)

How to copy an ArrayList to another ArrayList? How many can you list?

  1. Use the clone() method: ArrayList implements the Cloneable interface and can be cloned
  2. Use the ArrayList constructor ArrayList(Collection<? extends E> c)
  3. Use the addAll(Collection<? extends E> c) method
  4. Write your own loop that calls add()
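
The four approaches listed above can be put side by side as follows. CopyDemo and copies are illustrative names; all four results are shallow copies, so the element objects themselves are shared with the source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CopyDemo {
    // Returns four independent copies of source, one per approach.
    static List<List<String>> copies(ArrayList<String> source) {
        // 1. clone()
        @SuppressWarnings("unchecked")
        List<String> viaClone = (List<String>) source.clone();

        // 2. copy constructor
        List<String> viaConstructor = new ArrayList<>(source);

        // 3. addAll()
        List<String> viaAddAll = new ArrayList<>();
        viaAddAll.addAll(source);

        // 4. manual loop
        List<String> viaLoop = new ArrayList<>(source.size());
        for (String s : source) {
            viaLoop.add(s);
        }

        return Arrays.asList(viaClone, viaConstructor, viaAddAll, viaLoop);
    }

    public static void main(String[] args) {
        ArrayList<String> source = new ArrayList<>(Arrays.asList("Jack", "Amy", "Lucy"));
        for (List<String> copy : copies(source)) {
            System.out.println(copy.equals(source) && copy != source); // prints true four times
        }
    }
}
```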

How can ArrayList be modified concurrently without concurrent modification exceptions?

Question: a member-variable collection is known to store N user names. In a multi-threaded environment, how can data be written to the collection safely while an iterator is reading it?
Create a new thread task class:

import java.util.ArrayList;

public class CollectionThread implements Runnable {
    // Shared by every thread that runs this task
    private static ArrayList<String> list = new ArrayList<>();
    static {
        list.add("Jack");
        list.add("Amy");
        list.add("Lucy");
    }

    @Override
    public void run() {
        for (String value : list) {
            System.out.println(value);
            // While reading data, it also writes data to the collection
            list.add("Coco"); // Concurrent modification exceptions will occur
        }
    }
}

Test: read the shared collection and write to it at the same time from multiple threads:

public class Test03 {
    public static void main(String[] args) {
        // Create thread task
        CollectionThread collectionThread = new CollectionThread();

        // Open 10 threads
        for (int i = 0; i < 10; i++) {
            new Thread(collectionThread).start();
        }
    }
}

When the list is traversed this way, the iterator detects that modCount has changed since it was created and reports a concurrent modification error.

Result: java.util.ConcurrentModificationException

To solve this problem, Java provides a thread-safe collection that separates reads from writes: CopyOnWriteArrayList

So the solution is:

    // private static ArrayList<String> list = new ArrayList<>();
    // Replace the original ArrayList with a read-write separated collection
    // (requires import java.util.concurrent.CopyOnWriteArrayList)
    private static CopyOnWriteArrayList<String> list = new CopyOnWriteArrayList<String>();
    static {
        list.add("Jack");
        list.add("Amy");
        list.add("Lucy");
    }

    @Override
    public void run() {
        for (String value : list) {
            System.out.println(value);
            // While reading data, it also writes data to the collection
            list.add("Coco"); // No exception: the iterator reads a snapshot of the array
        }
    }

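This solves the concurrent modification exception: a CopyOnWriteArrayList iterator traverses a snapshot of the array taken when the iterator was created, and every write copies the array, so the collection is best suited to workloads with many reads and few writes.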
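
The snapshot behavior can be seen even in a single thread. In this sketch (SnapshotDemo and iterateWhileAdding are names invented here), the loop adds an element on every step, yet it visits only the three elements that existed when iteration began.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class SnapshotDemo {
    // Returns {number of elements visited, final list size}.
    static int[] iterateWhileAdding() {
        List<String> list = new CopyOnWriteArrayList<>(Arrays.asList("Jack", "Amy", "Lucy"));
        int visited = 0;
        for (String value : list) {
            list.add("Coco"); // copies the array; the running iterator keeps its old snapshot
            visited++;
        }
        return new int[] {visited, list.size()};
    }

    public static void main(String[] args) {
        int[] result = iterateWhileAdding();
        System.out.println(result[0] + " visited, size " + result[1]); // prints 3 visited, size 6
    }
}
```

The price of this safety is that each add copies the whole array, which is why frequent writes on large lists are expensive.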

What is the difference between ArrayList and LinkedList?

ArrayList

  • Based on a dynamic array
  • For random access with get and set, it is more efficient than LinkedList
  • For add and remove at arbitrary positions, ArrayList is not necessarily slower than LinkedList (because the underlying structure is a dynamic array, a new array does not have to be created on every add or remove)

LinkedList

  • Based on a linked list
  • For sequential operations, LinkedList is not necessarily slower than ArrayList
  • For random access, LinkedList is significantly less efficient than ArrayList