Java collection source code analysis: ArrayList

Posted by magie on Thu, 10 Feb 2022 17:56:56 +0100

After all the groundwork, we can finally get started with ArrayList. ArrayList is probably the most frequently used collection class. Let's take a look at how the documentation introduces it.

According to the documentation, ArrayList is roughly equivalent to Vector, except that it is unsynchronized. ArrayList is a List implementation that can be dynamically resized. It keeps elements in the order in which they were inserted, and its other features are consistent with List.

1, ArrayList inheritance structure

As the structure diagram shows, ArrayList is a subclass of AbstractList and implements the List interface. It also implements three marker interfaces, none of which declares any method; they only indicate that the implementing class has a certain capability. RandomAccess means the class supports fast random access. Cloneable means the class supports cloning, which shows up as an overridden clone method. java.io.Serializable means serialization is supported; to customize that process, a class can define its own writeObject and readObject methods.
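As a small illustration of how marker interfaces are used, the sketch below checks them with instanceof. The class and method names are made up for this example; only the JDK types are real.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

// Hypothetical demo class; the name is not part of the JDK.
final class MarkerInterfaceDemo {
    // True when the list advertises fast positional access via the RandomAccess marker.
    static boolean supportsFastRandomAccess(List<?> list) {
        return list instanceof RandomAccess;
    }

    public static void main(String[] args) {
        List<String> arrayList = new ArrayList<>();
        List<String> linkedList = new LinkedList<>();
        System.out.println(supportsFastRandomAccess(arrayList));  // true
        System.out.println(supportsFastRandomAccess(linkedList)); // false
        System.out.println(arrayList instanceof Cloneable);       // true
        System.out.println(arrayList instanceof Serializable);    // true
    }
}
```

Generic code can use such a check to decide between an index-based loop and an iterator.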

However, the most frequently asked interview question about ArrayList is: what is its initial size? When you create an ArrayList, you probably call the no-argument constructor directly and have never paid attention to this question:

ArrayList<String> strings = new ArrayList<>();

As mentioned in earlier articles, ArrayList is backed by an array, and the length of an array is fixed. That raises a question: since the array has a fixed length, why can we insert one element, thousands, or even tens of thousands of elements into an ArrayList without ever specifying a length?

ArrayList adjusts its size dynamically, so we can keep inserting data without noticing. This means ArrayList must start with some default size, and the only way to enlarge it is to allocate a bigger array and copy the elements over. The default size and the resizing strategy therefore have a great impact on performance. An example illustrates this:

Suppose the default size is 10 and we insert 10 elements: no resizing is needed. If we then insert 20 more elements, the ArrayList has to grow to 30, which involves one array copy. If we go on to insert 50 more, the array is copied again and grows to 80. In other words, once the capacity is exhausted, every insertion that needs more room triggers a copy of the existing data. The more data there is, the more has to be copied, and performance declines rapidly.
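To put rough numbers on this, here is a minimal simulation that only counts how many elements get copied. It is not ArrayList code: the class and method names are invented, elements are inserted one at a time, and the second policy (grow by half the current capacity) is simply an assumed alternative for comparison.

```java
// Hypothetical simulation; it models only the copy count, not real ArrayList internals.
final class GrowthCostDemo {
    // Elements copied when a full array is enlarged by exactly one slot each time.
    static long copiesWithExactFit(int inserts, int initialCapacity) {
        long copies = 0;
        int capacity = initialCapacity;
        for (int size = 0; size < inserts; size++) {
            if (size == capacity) {
                copies += size;      // every existing element moves to the new array
                capacity = size + 1; // room for just one more element
            }
        }
        return copies;
    }

    // Elements copied when a full array grows by half its capacity (assumes initialCapacity >= 2).
    static long copiesWithGrowthByHalf(int inserts, int initialCapacity) {
        long copies = 0;
        int capacity = initialCapacity;
        for (int size = 0; size < inserts; size++) {
            if (size == capacity) {
                copies += size;
                capacity += capacity >> 1;
            }
        }
        return copies;
    }

    public static void main(String[] args) {
        System.out.println(copiesWithExactFit(100, 10));     // 4905
        System.out.println(copiesWithGrowthByHalf(100, 10)); // 202
    }
}
```

For 100 single inserts starting from a capacity of 10, growing one slot at a time copies 4905 elements in total, while growing by half copies only 202.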

ArrayList is just a wrapper around an array, so it must take some measures to solve the problems mentioned above. Without such measures, what would be the difference between using an ArrayList and using an array? Let's take a look at what ArrayList does and how to make use of it.

Let's start with initialization.

2, ArrayList construction method and initialization

ArrayList has three constructors, which use the following two member variables:

//The array that actually stores the data; its length is the capacity of the ArrayList.
//When the ArrayList is expanded, the new capacity is the length of the new array.
//It is empty by default. When the first element is added, it is expanded directly to DEFAULT_CAPACITY, i.e. 10
//The difference from size: the array is usually longer than the number of stored elements, because ArrayList grows by more than it strictly needs
transient Object[] elementData;

//The number of elements actually stored
private int size;

A note: fields marked with the transient keyword are skipped by default serialization.

Besides these two variables, there is one more you need to know, which ArrayList inherits from AbstractList:

protected transient int modCount = 0;

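This variable counts the structural modifications made to the list, that is, operations that change its size. Iterators and some other operations record it and check it again later, so that a change in the size of the ArrayList in the middle of the operation is detected and fails fast instead of producing unpredictable results.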
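The effect of modCount is easy to observe from the outside. The following sketch (class and method names are invented for illustration) removes an element inside a for-each loop, and the iterator notices on its next step that the list was structurally modified.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical demo class; the name is not part of the JDK.
final class FailFastDemo {
    // Returns true if modifying the list during iteration is detected.
    static boolean removalDuringIterationIsDetected() {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c"));
        try {
            for (String s : list) {
                if (s.equals("a")) {
                    list.remove(s); // structural change: modCount is incremented
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true; // the iterator saw a modCount it did not expect
        }
    }

    public static void main(String[] args) {
        System.out.println(removalDuringIterationIsDetected()); // true
    }
}
```

Note that the check is best effort: removing the second-to-last element in such a loop ends the iteration quietly, because the iterator then believes it has reached the end.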

In addition to these, there are a few constants:

private static final int DEFAULT_CAPACITY = 10;
private static final Object[] EMPTY_ELEMENTDATA = {};
private static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {};

Let's take a look at the constructor:

1. Constructor

//Default constructor. The documentation says the default capacity is 10, but as noted at the elementData definition,
//the array starts out empty and is only expanded to 10 when the first element is inserted
public ArrayList() {
    this.elementData = DEFAULTCAPACITY_EMPTY_ELEMENTDATA;
}

//Constructor with an initial capacity. Once a capacity is specified, the lazy default-capacity mechanism no longer applies.
public ArrayList(int initialCapacity) {
    if (initialCapacity > 0) {
        this.elementData = new Object[initialCapacity];
    } else if (initialCapacity == 0) {
        this.elementData = EMPTY_ELEMENTDATA;
    } else {
        throw new IllegalArgumentException("Illegal Capacity: "+
                                               initialCapacity);
    }
}

//Constructs an ArrayList initialized with the elements of another Collection.
//Here you can see that size is the number of elements stored
//This also shows the strength of the Collection abstraction: different structures can be converted into one another
public ArrayList(Collection<? extends E> c) {
    //The key conversion is toArray(), which is declared in Collection
    elementData = c.toArray();
    if ((size = elementData.length) != 0) {
        if (elementData.getClass() != Object[].class)
            elementData = Arrays.copyOf(elementData, size, Object[].class);
    } else {
        // replace with empty array.
        this.elementData = EMPTY_ELEMENTDATA;
    }
}

You can see that the ArrayList created by the default no-argument constructor is actually empty. Only when the first element is inserted is the capacity expanded to the default of 10. This is worth remembering.
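For reference, here is how the three constructors look from the caller's side. This is only a usage sketch with an invented class name; note that capacity is internal, so size() is 0 for both of the first two lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; the name is not part of the JDK.
final class ConstructorDemo {
    // A negative initial capacity is rejected with IllegalArgumentException.
    static boolean negativeCapacityIsRejected() {
        try {
            new ArrayList<String>(-1);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> lazy = new ArrayList<>();        // backing array allocated on first add
        List<String> sized = new ArrayList<>(1_000);  // backing array of 1,000 slots up front
        List<String> copy = new ArrayList<>(Arrays.asList("x", "y")); // filled from a Collection
        System.out.println(lazy.size() + " " + sized.size() + " " + copy.size()); // 0 0 2
        System.out.println(negativeCapacityIsRejected()); // true
    }
}
```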

2, Overridden methods in ArrayList

ArrayList is a concrete implementation class, so it implements all the methods defined in the List interface. Some of these methods are already implemented in AbstractList, but ArrayList overrides them. Let's see the difference.

Let's first look at some simpler methods:

//Remember the implementation in AbstractList? That one is based on an iterator.
//Here the array is available directly, so there is no need to go through an iterator
public int indexOf(Object o) {
    if (o == null) {
        for (int i = 0; i < size; i++)
            if (elementData[i]==null)
                return i;
    } else {
        for (int i = 0; i < size; i++)
            if (o.equals(elementData[i]))
                return i;
    }
    return -1;
}

//Same idea as indexOf, but scanning from the end
public int lastIndexOf(Object o) {
    //...
}
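A quick usage sketch shows why indexOf has a separate branch for null: a list may contain null elements, and they can be searched for like any other value. The class name and sample data are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; the name is not part of the JDK.
final class IndexOfDemo {
    // Sample list containing duplicates and null elements.
    static List<String> sample() {
        return new ArrayList<>(Arrays.asList("a", null, "b", null, "a"));
    }

    public static void main(String[] args) {
        List<String> list = sample();
        System.out.println(list.indexOf("a"));      // 0
        System.out.println(list.lastIndexOf("a"));  // 4
        System.out.println(list.indexOf(null));     // 1
        System.out.println(list.lastIndexOf(null)); // 3
        System.out.println(list.indexOf("z"));      // -1
    }
}
```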

//For the same reason: all elements are already at hand, so no iterator is needed to fetch them
//Note that the returned copy of elementData is truncated to size
public Object[] toArray() {
    return Arrays.copyOf(elementData, size);
}

//Typed conversion. Note a[size] = null; it is only useful if you know the list contains no null elements,
//because then the first null marks where the list's data ends.
public <T> T[] toArray(T[] a) {
    if (a.length < size)
        // The given data length is not enough. Copy a new one and return it
        return (T[]) Arrays.copyOf(elementData, size, a.getClass());
    System.arraycopy(elementData, 0, a, 0, size);
    if (a.length > size)
        a[size] = null;
    return a;
}
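The two branches of toArray(T[] a) can be seen with a short experiment. The class and method names below are invented; the behaviour follows directly from the code above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; the name is not part of the JDK.
final class ToArrayDemo {
    // Argument too short: a new array of the same runtime type is allocated and returned.
    static String[] intoShortArray(List<String> list) {
        return list.toArray(new String[0]);
    }

    // Argument longer than the list: it is filled in place and a[size] is set to null.
    static String[] intoLongArray(List<String> list) {
        String[] target = {"x", "x", "x", "x"};
        return list.toArray(target);
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b"));
        System.out.println(Arrays.toString(intoShortArray(list))); // [a, b]
        System.out.println(Arrays.toString(intoLongArray(list)));  // [a, b, null, x]
    }
}
```

Only the slot right after the last element is cleared; anything further along in the target array is left untouched.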

With the simple methods out of the way, let's look at the four basic operations: add, delete, modify and query. Modification and query never change the length of the array, while addition and deletion involve dynamic resizing, which is tied to performance. Let's first see how modification and query are implemented:

private void rangeCheck(int index) {
    if (index >= size)
        throw new IndexOutOfBoundsException(outOfBoundsMsg(index));
}

//Gets the element at the given index
E elementData(int index) {
    return (E) elementData[index];
}

//The index must lie in the range [0, size)
public E get(int index) {
    rangeCheck(index);

    return elementData(index);
}

//Change the value of the corresponding position
public E set(int index, E element) {
    rangeCheck(index);

    E oldValue = elementData(index);
    elementData[index] = element;
    return oldValue;
}

Adding and deleting are the most important parts of ArrayList. This part of the code deserves careful study. Let's see how the source code implements them:

//Add an element at the end
public boolean add(E e) {
    //Make sure the elementData array is long enough
    ensureCapacityInternal(size + 1);  // Increments modCount!!
    elementData[size++] = e;
    return true;
}

public void add(int index, E element) {
    rangeCheckForAdd(index);

    //Make sure the elementData array is long enough
    ensureCapacityInternal(size + 1);  // Increments modCount!!
    //Shift the elements from index onward back by one position to free the slot, then insert
    System.arraycopy(elementData, index, elementData, index + 1,
                         size - index);
    elementData[index] = element;
    size++;
}

As you can easily see, both methods above call the ensureCapacityInternal method. Let's see how it is implemented:

//As mentioned at the definition of elementData: inserting the first element expands the array directly to 10
private void ensureCapacityInternal(int minCapacity) {
    if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
        minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
    }
    
    //The actual work is delegated once more
    ensureExplicitCapacity(minCapacity);
}

//If the length of elementData cannot meet the requirements, it needs to be expanded
private void ensureExplicitCapacity(int minCapacity) {
    modCount++;

    // overflow-conscious code
    if (minCapacity - elementData.length > 0)
        grow(minCapacity);
}

//Expansion
private void grow(int minCapacity) {
    // overflow-conscious code
    int oldCapacity = elementData.length;
    //You can see that the capacity grows to 1.5 times the old one
    int newCapacity = oldCapacity + (oldCapacity >> 1);

    //If 1.5x is still not enough, expand directly to minCapacity
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    //Guard against exceeding the maximum array size
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    // minCapacity is usually close to size, so this is a win:
    elementData = Arrays.copyOf(elementData, newCapacity);
}

Let's analyze the whole process of add method:

add(E e), which appends an element at the end of the list:
Here, we assume there is a list stuList with a size of 7 whose backing array is full, i.e. its length is also 7.

Step 1:
Because the element is added at the end of the list, the minimum capacity required is the current size + 1, that is, 8.

Step 2:

In the body of ensureCapacityInternal, ensureExplicitCapacity is called with minCapacity as its argument. In some JDK 8 builds this value is the return value of a helper method, calculateCapacity; in the listing above, the same logic is the if block at the top of ensureCapacityInternal. Let's look at how this argument is computed first.

Step 3:

The computation takes two inputs: the array of the current list, that is, the array of length 7 in stuList, and the minCapacity (8) from the first step. If the current array is the default empty array created by the no-argument constructor, the larger of the minimum capacity and the default capacity (10) is used. Otherwise the minimum capacity of size + 1 is used. Here the array is not the default empty array, so the result is 8.

Step 4:
modCount is the number of times the collection has been structurally modified, and it is incremented here. Then, if the minimum capacity is greater than the length of the current array (here 8 > 7), the grow method is called to expand the array.

Step 5:

The length of the current array is assigned to oldCapacity (7). The value of newCapacity is oldCapacity + oldCapacity / 2 with integer division (7 + 3 = 10, while the minimum capacity is 8). Next, if the new capacity is less than the minimum capacity, the new capacity is set to the minimum capacity (not the case here, so the new capacity stays 10). The following check guards against overflow: if the new capacity is greater than the maximum array length MAX_ARRAY_SIZE (Integer.MAX_VALUE - 8), the hugeCapacity method is executed:

Its return value is Integer.MAX_VALUE if the minimum capacity is greater than MAX_ARRAY_SIZE, and MAX_ARRAY_SIZE otherwise. Back in grow, the array is then copied: the existing elements are copied into a new array whose length is newCapacity.

Last step:

Finally, the element is stored at the end of the array and size is incremented.
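The arithmetic in the steps above can be checked with a small standalone sketch. It copies only the capacity calculation from grow; the class name is invented, and hugeCapacity is reconstructed from the description in the text, so treat it as an approximation rather than the exact JDK source.

```java
// Hypothetical demo class reproducing only the capacity arithmetic of grow.
final class GrowArithmeticDemo {
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    // Capacity that grow would choose for the given old capacity and required minimum.
    static int newCapacity(int oldCapacity, int minCapacity) {
        int newCapacity = oldCapacity + (oldCapacity >> 1); // 1.5x, integer arithmetic
        if (newCapacity - minCapacity < 0) {
            newCapacity = minCapacity;                      // 1.5x is not enough
        }
        if (newCapacity - MAX_ARRAY_SIZE > 0) {
            newCapacity = hugeCapacity(minCapacity);        // clamp very large requests
        }
        return newCapacity;
    }

    // Reconstructed from the text: the largest allowed capacity for a huge request.
    static int hugeCapacity(int minCapacity) {
        if (minCapacity < 0) {
            throw new OutOfMemoryError(); // minCapacity itself overflowed
        }
        return (minCapacity > MAX_ARRAY_SIZE) ? Integer.MAX_VALUE : MAX_ARRAY_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(newCapacity(7, 8));   // 10, the stuList walkthrough
        System.out.println(newCapacity(0, 10));  // 10, first add on a default list
        System.out.println(newCapacity(10, 11)); // 15, the first ordinary expansion
    }
}
```

The subtraction-based comparisons are what the "overflow-conscious code" comments refer to: they still give the right answer when oldCapacity + (oldCapacity >> 1) wraps around to a negative number.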

Several other add methods are not explained here, and their implementation principles are similar to those above.

Having read the code, you probably understand most of it by now. The capacity expansion mechanism of ArrayList is: start with an empty elementData array; on the first insertion, expand it directly to 10; afterwards, whenever elementData is too short, expand it to 1.5 times its length; and if that is still not enough, use the required length as the new length of elementData.

This approach is clearly better than the one in our example, but it still copies data frequently when the amount of data is large. How can this be alleviated? ArrayList offers two feasible solutions:

1. Use the ArrayList(int initialCapacity) constructor to declare a large capacity at creation time, which solves the problem of frequent copying. However, we need to predict the order of magnitude of the data in advance, and the large array occupies memory the whole time.

2. Besides the automatic expansion that happens when data is added, we can also expand the capacity once ourselves before inserting. As long as the order of magnitude of the data is known in advance, the array can be expanded to the needed size in one step at the moment it is needed. Compared with ArrayList(int initialCapacity), it does not have to occupy a large amount of memory all the time, and the number of data copies is still greatly reduced. The method is ensureCapacity(int minCapacity), which feeds into the same internal capacity check shown above (ensureExplicitCapacity in the JDK 8 source).
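A minimal sketch of the second approach might look like this. The class and method names are invented, and the element count is assumed to be known before the bulk insert starts.

```java
import java.util.ArrayList;

// Hypothetical demo class; the name is not part of the JDK.
final class EnsureCapacityDemo {
    // Fills a list with count integers, reserving the capacity once up front.
    static ArrayList<Integer> fill(int count) {
        ArrayList<Integer> list = new ArrayList<>(); // starts with the empty default array
        list.ensureCapacity(count);                  // one allocation instead of repeated growth
        for (int i = 0; i < count; i++) {
            list.add(i);                             // no further expansion is needed here
        }
        return list;
    }

    public static void main(String[] args) {
        ArrayList<Integer> list = fill(100_000);
        System.out.println(list.size()); // 100000
    }
}
```

Note that ensureCapacity is declared on ArrayList, not on the List interface, so the variable has to be typed as ArrayList to call it.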

Besides add, there are related methods such as addAll whose implementation principles are similar. We won't analyze them one by one; they are listed here, and you can read the source code yourself if you want an in-depth understanding:

//Shrinks elementData to exactly size, freeing the unused memory
public void trimToSize() {
    //...
}

//Deletes the element at the specified location
public E remove(int index) {
    //...
}

//Delete according to the element itself
public boolean remove(Object o) {
    //...
}

//Add some elements at the end
public boolean addAll(Collection<? extends E> c) {
    //...
}

//Add some elements from the specified position
public boolean addAll(int index, Collection<? extends E> c){
    //...
}

Next, let's look at the remove method.


remove(int index) first checks whether the index is out of bounds and then increments modCount. It reads the element to be deleted at that index and then closes the gap with System.arraycopy. An example shows the copying process:
For an array nums {a,b,c,d,e,f,g}, we delete the element at index 3, that is, d. The source array is elementData, read starting at index + 1; the destination is the same elementData, written starting at index; and numMoved = size - index - 1 elements are copied. The call System.arraycopy(nums, 4, nums, 3, 3) turns the array into {a,b,c,e,f,g,g}.
After copying, the last slot is set to null, giving {a,b,c,e,f,g,null} with a size of 6.
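The same example can be run on a plain array. The sketch below follows the steps just described; the class and method names are invented, and the bounds check and modCount bookkeeping are left out.

```java
import java.util.Arrays;

// Hypothetical demo class showing only the element shifting done on removal.
final class RemoveCopyDemo {
    // Removes the element at index from the first size slots and returns the new size.
    static int removeAt(Object[] elementData, int size, int index) {
        int numMoved = size - index - 1;
        if (numMoved > 0) {
            // Shift everything after index one position to the left.
            System.arraycopy(elementData, index + 1, elementData, index, numMoved);
        }
        elementData[--size] = null; // clear the stale last slot so it can be garbage collected
        return size;
    }

    public static void main(String[] args) {
        Object[] nums = {"a", "b", "c", "d", "e", "f", "g"};
        int size = removeAt(nums, nums.length, 3);
        System.out.println(Arrays.toString(nums)); // [a, b, c, e, f, g, null]
        System.out.println(size);                  // 6
    }
}
```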

The deletion principle of several other methods related to remove is similar to that above.

ArrayList also provides optimized versions of the ListIterator and SubList implementations of its parent class, mainly by accessing elements directly by index. I won't go through them one by one here; interested readers can consult the source code themselves.

3, Some other implementation methods of ArrayList

ArrayList not only implements all the functions defined in List, but also implements methods such as equals, hashCode, clone, writeObject and readObject. These methods depend on the stored elements cooperating, otherwise the results may be wrong: equals and hashCode rely on the elements' own implementations, clone only makes a shallow copy, and serialization fails if the elements themselves do not support it. Keep this in mind when defining element classes. Let's mainly look at what is customized during serialization.

//This resolves the earlier puzzle: elementData is marked transient, so default serialization skips it
//Instead, the elements are written out one by one here, and size is written as well
private void writeObject(java.io.ObjectOutputStream s)
    throws java.io.IOException{
    // Write out element count, and any hidden stuff
    int expectedModCount = modCount;
    s.defaultWriteObject();

    // Write out size as capacity for behavioural compatibility with clone()
    s.writeInt(size);

    // Write out all elements in the proper order.
    for (int i=0; i<size; i++) {
        s.writeObject(elementData[i]);
    }

    //Here the role of modCount shows: if the list is modified during serialization, an exception is thrown
    if (modCount != expectedModCount) {
        throw new ConcurrentModificationException();
    }
}

readObject is the reverse process: it restores the data and rebuilds elementData. If you are interested, you can read the source code yourself.
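From the outside, the custom writeObject and readObject are invisible: a list simply survives a round trip through Java serialization. The sketch below shows this with in-memory streams; the class and method names are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.Arrays;

// Hypothetical demo class; the name is not part of the JDK.
final class SerializationRoundTripDemo {
    // Serializes the list to a byte array and reads it back as a new list.
    @SuppressWarnings("unchecked")
    static <T> ArrayList<T> roundTrip(ArrayList<T> list) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(list); // ends up in ArrayList's private writeObject
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (ArrayList<T>) in.readObject(); // ends up in ArrayList's private readObject
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ArrayList<String> original = new ArrayList<>(Arrays.asList("a", null, "b"));
        ArrayList<String> copy = roundTrip(original);
        System.out.println(copy);                  // [a, null, b]
        System.out.println(copy.equals(original)); // true
        System.out.println(copy == original);      // false
    }
}
```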

4, ArrayList is not thread-safe

import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

public class NoSafeArrayList {
    public static void main(String[] args) {

        List<String> list = new ArrayList<>();
        for (int i = 0; i < 30; i++) {
            new Thread(() -> {
                //UUID utility class: take an eight-character random string. Another common way to get non-repeating strings is the current timestamp, System.currentTimeMillis()
                list.add(UUID.randomUUID().toString().substring(0, 8));
                System.out.println(list);
            }).start();
        }
    }
}


The ArrayList class is not thread-safe. When multiple threads read and write it at the same time, as in the program above, a ConcurrentModificationException may be thrown.

ArrayList thread insecurity is mainly reflected in two aspects:

1, Adding an element is not an atomic operation

elementData[size++] = e;

The value is assigned first, and then size is incremented by 1. These are two separate steps.

There is nothing wrong when a single thread executes this code, but in a multithreaded environment it is a big problem: one thread may overwrite the value written by another.

For instance:

1. The list is empty, size = 0.
2. Thread A finishes executing elementData[size] = e and is then suspended. A has put "a" at index 0, and size is still 0.
3. Thread B executes elementData[size] = e. Because size is still 0, B puts "b" at index 0 and overwrites A's data.
4. Thread B increases the value of size to 1.
5. Thread A increases the value of size to 2.

When both threads have finished, ideally "a" would be at index 0 and "b" at index 1. In reality, index 0 holds "b" and index 1 holds nothing (null), even though size is 2.
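Real races are hard to reproduce on demand, so the sketch below replays the five steps above by hand on a single thread, using a plain array and a local size variable. It is only an illustration of the interleaving, with invented names, not a multithreaded test.

```java
// Hypothetical demo class: a single-threaded replay of the interleaving described above.
final class LostWriteReplay {
    // Returns {slot 0, slot 1, final size} after replaying the steps.
    static Object[] replay() {
        Object[] elementData = new Object[10];
        int size = 0;

        elementData[size] = "a"; // thread A writes, then is suspended before size++
        elementData[size] = "b"; // thread B writes to the same slot, overwriting "a"
        size++;                  // thread B: size becomes 1
        size++;                  // thread A resumes: size becomes 2

        return new Object[] {elementData[0], elementData[1], size};
    }

    public static void main(String[] args) {
        Object[] result = replay();
        System.out.println(result[0]); // b
        System.out.println(result[1]); // null
        System.out.println(result[2]); // 2
    }
}
```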

2, Non atomic operation during capacity expansion

The default array capacity of ArrayList is 10. Suppose 9 elements have been added, so size = 9.

1. Thread A finishes executing ensureCapacityInternal(size + 1) in the add method, finds that no expansion is needed, and is suspended.
2. Thread B starts to execute. It also checks the capacity of the array and finds that there is no need to expand it. So it puts "b" at index 9 and increases size by 1, so size = 10.
3. Thread A resumes and tries to put "a" at index 10, because size = 10. However, the array has not been expanded and its largest index is 9, so an ArrayIndexOutOfBoundsException is thrown.
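This scenario can also be replayed by hand on a single thread. As before, the sketch uses a plain array and invented names, and it only mimics the order of the steps above.

```java
// Hypothetical demo class: a single-threaded replay of the expansion race described above.
final class GrowRaceReplay {
    // Returns true if the replay ends with an out-of-bounds write.
    static boolean replay() {
        Object[] elementData = new Object[10];
        int size = 9;

        boolean aNeedsGrow = size + 1 > elementData.length; // thread A checks: false, then suspended
        boolean bNeedsGrow = size + 1 > elementData.length; // thread B checks: false

        elementData[size++] = "b"; // thread B writes index 9, size becomes 10
        try {
            elementData[size++] = "a"; // thread A resumes and writes index 10
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            // Neither thread saw a reason to grow, yet the second write no longer fits.
            return !aNeedsGrow && !bNeedsGrow;
        }
    }

    public static void main(String[] args) {
        System.out.println(replay()); // true
    }
}
```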

5, Summary

1. The bottom layer of ArrayList is actually implemented with an elementData array.

2. The difference between ArrayList and a plain array lies in ArrayList's grow method, which provides automatic capacity expansion.

3. ArrayList can store null values

4. Like an array, ArrayList is better suited to random access (query and modification) than to large numbers of insertions and deletions. LinkedList is better suited to insertion and deletion.

If anything is missing or wrong, feel free to leave a comment!
