[Interview classic] A detailed explanation of ArrayList

Posted by grimmier on Wed, 05 Jan 2022 21:54:45 +0100

1. Collection overview

Collections in Java fall into three main categories:

List: ordered; duplicate elements are allowed.
Set: generally no guaranteed order; duplicate elements are not allowed.
Map: stores key-value pairs; keys cannot be duplicated, values can.

Among the List implementations, ArrayList is the most commonly used.

2. ArrayList overview

The name ArrayList combines two words: Array + List. Array refers to an array and List to the List interface, which indicates that ArrayList is implemented on top of an array.

The length of a traditional array must be fixed when the array is created, and it cannot be changed afterwards.

// Length specified explicitly
int[] arr1 = new int[10];
// Array initializer: the length is determined by the number of initial elements
int[] arr2 = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

ArrayList is a dynamic array. Its length does not have to be specified when it is created; when ArrayList detects that its capacity is insufficient, it expands the underlying array automatically.

ArrayList<Integer> arrayList = new ArrayList<>();

3. Array properties

ArrayList is backed by an array, so it has all the characteristics of an array. Before studying ArrayList, it helps to understand how arrays are implemented in the JVM.

Array properties:

Array elements must all be of the same data type.
Array elements are stored contiguously in memory.
Random access to array elements is very efficient: any element can be reached in constant time, O(1).

From the first property, every element occupies the same amount of space. Combining this with the second property (contiguous storage) yields the third.

Question: why is looking up an element by position faster in an array than in a linked list?

When an array object is created, the JVM assigns it a base address. To access the (k+1)-th element (index k), its address can be computed directly as [base address + k * element size], so the element is reached with a single addressing operation.

To access the (k+1)-th element of a linked list, you generally start at the head node and follow the next pointers one by one, which takes k addressing operations.
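To make the difference concrete, here is a small simulation. It is only a model under stated assumptions: "addresses" are plain `long` values (Java never exposes real memory addresses), and the hypothetical `Node` class stands in for a linked list node. The array lookup is one arithmetic step for any k, while the linked lookup needs k pointer hops.

```java
// Hypothetical sketch: "addresses" are plain long values; Java never exposes real memory addresses.
public class AddressingSketch {
    // Array: the address of index k is computed in one step, whatever k is
    static long arrayElementAddress(long baseAddress, int k, int elementSize) {
        return baseAddress + (long) k * elementSize;
    }

    // Minimal singly linked node used for the comparison
    static final class Node {
        final int value;
        Node next;
        Node(int value) { this.value = value; }
    }

    // Builds a list holding 0, 1, ..., n - 1 (assumes n >= 1)
    static Node build(int n) {
        Node head = new Node(0);
        Node tail = head;
        for (int i = 1; i < n; i++) {
            tail.next = new Node(i);
            tail = tail.next;
        }
        return head;
    }

    // Number of next-pointer hops taken by the most recent linkedGet call
    static int lastHops;

    // Linked list: follow k next pointers from the head to reach index k
    static int linkedGet(Node head, int k) {
        lastHops = 0;
        Node cur = head;
        for (int i = 0; i < k; i++) {
            cur = cur.next;
            lastHops++;
        }
        return cur.value;
    }

    public static void main(String[] args) {
        // 4-byte elements (e.g. int) starting at a made-up base address of 1000
        System.out.println(arrayElementAddress(1000, 3, 4)); // 1012

        Node head = build(10);
        System.out.println(linkedGet(head, 7) + " after " + lastHops + " hops"); // 7 after 7 hops
    }
}
```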

4. ArrayList source code analysis

4.1 inheritance structure

4.2 class structure

public class ArrayList<E> extends AbstractList<E>
        implements List<E>, RandomAccess, Cloneable, java.io.Serializable {
    
    // Used during deserialization to verify that the sender's and receiver's class versions are compatible
    private static final long serialVersionUID = 8683452581122892189L;

    // Default initial capacity
    private static final int DEFAULT_CAPACITY = 10;
    
    // Number of elements actually stored (not the length of elementData)
    private int size;

    // Backing array that stores the elements
    transient Object[] elementData; 
    
    // Shared empty array for empty instances created with capacity 0 or from an empty collection
    private static final Object[] EMPTY_ELEMENTDATA = {};
   
    // Shared empty array for instances created by the no-arg constructor;
    // kept separate so that the first add() knows to expand to DEFAULT_CAPACITY
    private static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {}; 	
    
    // Collection version number, incremented on every structural change (adding or removing elements).
    // Actually declared in AbstractList and inherited; shown here for reference.
    protected transient int modCount = 0;
    
    ...
}

The ArrayList class extends the AbstractList abstract class and implements the List interface, so ArrayList instances support the basic positional operations such as add, remove, set, and get.
The ArrayList class implements the RandomAccess marker interface, which signals that its instances support fast (constant-time) random access.
The ArrayList class implements the Cloneable marker interface, so its instances can be cloned.
The ArrayList class implements the Serializable marker interface, so its instances can be serialized, for example to be transmitted over a network.
The elementData field is marked transient, so default serialization skips it. In practice elementData is usually not full, and only the part holding data needs to be serialized. ArrayList therefore defines its own private writeObject and readObject methods, which write and read only the size stored elements.
The modCount field (inherited from AbstractList) is used to trigger the fail-fast mechanism, described in detail below.
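A minimal sketch of the transient-plus-custom-serialization idea, assuming a hypothetical `MiniList` class (not the JDK's ArrayList): `writeObject` writes `size` and then only the occupied slots, and `readObject` rebuilds an array sized exactly to the data. `writeObject` and `readObject` are the standard private hooks that Java serialization looks for; everything else here is illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

public class SerializationSketch {
    // Hypothetical list class; not the JDK's ArrayList
    static class MiniList implements Serializable {
        private static final long serialVersionUID = 1L;
        private int size;
        // transient: default serialization skips the backing array entirely
        private transient Object[] elementData = new Object[10];

        void add(Object o) {
            if (size == elementData.length) {
                elementData = Arrays.copyOf(elementData, size * 2 + 1);
            }
            elementData[size++] = o;
        }

        Object get(int i) { return elementData[i]; }
        int size() { return size; }
        int capacity() { return elementData.length; }

        // Invoked by ObjectOutputStream in place of plain default serialization
        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();          // writes the non-transient field: size
            for (int i = 0; i < size; i++) {   // then only the occupied slots
                out.writeObject(elementData[i]);
            }
        }

        // Invoked by ObjectInputStream; rebuilds an array with no unused capacity
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();            // restores size
            elementData = new Object[size];
            for (int i = 0; i < size; i++) {
                elementData[i] = in.readObject();
            }
        }
    }

    // Serializes the list to bytes and reads it back
    static MiniList roundTrip(MiniList list) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(list);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (MiniList) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        MiniList list = new MiniList();
        list.add("a");
        list.add("b");
        MiniList copy = roundTrip(list);
        System.out.println(copy.size() + " " + copy.get(1) + " " + copy.capacity()); // 2 b 2
    }
}
```

In this sketch the deserialized copy ends up with capacity 2 rather than 10, because the unused slots were never written.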

4.3 initialization

4.3.1 no-argument constructor

public ArrayList() {
    // Assign the shared DEFAULTCAPACITY_EMPTY_ELEMENTDATA empty array
    this.elementData = DEFAULTCAPACITY_EMPTY_ELEMENTDATA;
}

The shared DEFAULTCAPACITY_EMPTY_ELEMENTDATA empty array is assigned to elementData.

4.3.2 int-parameter constructor

public ArrayList(int initialCapacity) {
    // Check whether the initial capacity is positive, zero, or negative
    if (initialCapacity > 0) {
        // Creates an array of the specified capacity
        this.elementData = new Object[initialCapacity];
    } else if (initialCapacity == 0) {
        // Assign EMPTY_ELEMENTDATA empty array
        this.elementData = EMPTY_ELEMENTDATA;
    } else {
        throw new IllegalArgumentException("Illegal Capacity: "+ initialCapacity);
    }
}

When the initial capacity is greater than 0, an array with exactly that capacity is created and assigned to elementData.

When the initial capacity is 0, the shared EMPTY_ELEMENTDATA empty array is assigned to elementData. A negative initial capacity throws IllegalArgumentException.
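A quick check of this constructor against the real `java.util.ArrayList`. `ConstructorDemo` and its helper methods are illustrative names. It shows that a presized list still has size 0 (capacity and size are different things) and that a negative capacity is rejected.

```java
import java.util.ArrayList;
import java.util.List;

public class ConstructorDemo {
    // True if new ArrayList<>(initialCapacity) is rejected with IllegalArgumentException
    static boolean rejects(int initialCapacity) {
        try {
            new ArrayList<String>(initialCapacity);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    // True if get(index) on the list throws IndexOutOfBoundsException
    static boolean getThrows(List<?> list, int index) {
        try {
            list.get(index);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> presized = new ArrayList<>(100); // room for 100 elements, none stored yet
        System.out.println(presized.size());          // 0: size counts elements, not slots
        System.out.println(getThrows(presized, 0));   // true: index 0 is not below size
        System.out.println(rejects(-1));              // true
        System.out.println(rejects(0));               // false
    }
}
```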

4.3.3 Collection-parameter constructor

// Other collections can be converted to ArrayList
public ArrayList(Collection<? extends E> c) {
    // The collection is converted into an array through the toArray method and assigned to elementData
    elementData = c.toArray();
    if ((size = elementData.length) != 0) {
        // c.toArray() might (incorrectly) not return Object[] (JDK bug 6260652, fixed in JDK 9)
        if (elementData.getClass() != Object[].class)
            // Create an Object [], and copy the contents of the original elementData
            elementData = Arrays.copyOf(elementData, size, Object[].class);
    } else {
        // Assign EMPTY_ELEMENTDATA empty array
        this.elementData = EMPTY_ELEMENTDATA;
    }
}

When the Collection is not empty, it is converted into an array and assigned to elementData (re-copied into a genuine Object[] if necessary).

When the Collection is empty, the shared EMPTY_ELEMENTDATA empty array is assigned to elementData.
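Why the constructor re-copies into a genuine `Object[]`: an array whose runtime type is `String[]` rejects other element types at store time with `ArrayStoreException`, which would break later writes into elementData. The sketch reproduces this with a plain `String[]` standing in for what a buggy `toArray()` might return. `ArrayTypeSketch`, `storeFails`, and `asObjectArray` are hypothetical names; `Arrays.copyOf(original, newLength, newType)` is the real JDK method the constructor uses.

```java
import java.util.Arrays;

public class ArrayTypeSketch {
    // Tries to store an Integer in slot 0; returns true if the JVM rejects the store
    static boolean storeFails(Object[] array) {
        try {
            array[0] = Integer.valueOf(42);
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    // Mirrors the constructor's check: re-copy only when the runtime type is not Object[]
    static Object[] asObjectArray(Object[] elementData) {
        if (elementData.getClass() != Object[].class) {
            elementData = Arrays.copyOf(elementData, elementData.length, Object[].class);
        }
        return elementData;
    }

    public static void main(String[] args) {
        Object[] fromToArray = new String[] {"a", "b"};        // static type Object[], runtime type String[]
        System.out.println(storeFails(fromToArray));           // true

        Object[] safe = asObjectArray(fromToArray);            // same contents, runtime type Object[]
        System.out.println(safe.getClass() == Object[].class); // true
        System.out.println(storeFails(safe));                  // false
    }
}
```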

4.4 adding elements

public boolean add(E e) {
    // Verify whether elementData needs to be expanded. After adding elements, the minimum capacity is size+1
    ensureCapacityInternal(size + 1); 
    // Add element action
    elementData[size++] = e;
    return true;
}

private void ensureCapacityInternal(int minCapacity) {
    // Compute the minimum required capacity, then expand if it is greater than the current capacity
    ensureExplicitCapacity(calculateCapacity(elementData, minCapacity));
}

// Calculate the minimum capacity of elementData
private static int calculateCapacity(Object[] elementData, int minCapacity) {
    // Created by the no-arg constructor: return the larger of DEFAULT_CAPACITY and minCapacity
    if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
        // DEFAULT_CAPACITY = 10
        return Math.max(DEFAULT_CAPACITY, minCapacity);
    }
    // Otherwise no default capacity applies: return minCapacity directly
    return minCapacity;
}

// Confirm the capacity of elementData
private void ensureExplicitCapacity(int minCapacity) {
    // Increment the collection version number (modCount)
    modCount++;
    if (minCapacity - elementData.length > 0)
        // Capacity expansion
        grow(minCapacity);
}

private void grow(int minCapacity) {
    int oldCapacity = elementData.length;
    // Normal growth rule: newCapacity = oldCapacity + oldCapacity / 2, i.e. about 1.5 * oldCapacity
    int newCapacity = oldCapacity + (oldCapacity >> 1);
    // If the normally grown capacity is still smaller than minCapacity, use minCapacity instead
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    // Check whether the grown capacity exceeds the maximum array size
    // MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        // hugeCapacity returns Integer.MAX_VALUE if minCapacity > MAX_ARRAY_SIZE, otherwise MAX_ARRAY_SIZE
        newCapacity = hugeCapacity(minCapacity);
    // Perform the expansion: create an array of length newCapacity and copy all elements of elementData into it
    elementData = Arrays.copyOf(elementData, newCapacity);
}

private static int hugeCapacity(int minCapacity) {
    if (minCapacity < 0) // int overflow
        throw new OutOfMemoryError();
    return (minCapacity > MAX_ARRAY_SIZE) ? Integer.MAX_VALUE : MAX_ARRAY_SIZE;
}

Main process:

Before each add, the minimum required capacity is computed case by case.
- If the list was created with the no-arg constructor and elementData is still the default empty array, the minimum capacity is the larger of 10 and size + 1.
- Otherwise (including lists created with a constructor argument), the minimum capacity is size + 1.
Then the minimum capacity decides whether to expand.
- If it is greater than the current capacity, elementData is expanded first, and then the element is added.
- If it is less than or equal to the current capacity, the element is added directly.

Note:

Capacity means elementData.length; size is the number of elements actually stored.
The elementData of an ArrayList created by the no-arg constructor is an empty array with capacity 0 until the first element is added; the first add expands it to capacity 10.
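The growth rules above can be simulated without touching ArrayList's private fields. This sketch copies the JDK 8 arithmetic quoted above (newer JDKs restructured this code, so nothing is claimed about them); `GrowthSketch` and `capacities` are hypothetical names, and only numbers are tracked, no real array.

```java
import java.util.ArrayList;
import java.util.List;

public class GrowthSketch {
    static final int DEFAULT_CAPACITY = 10;
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    // Same arithmetic as grow() plus hugeCapacity() (the overflow check is omitted)
    static int grow(int oldCapacity, int minCapacity) {
        int newCapacity = oldCapacity + (oldCapacity >> 1);
        if (newCapacity - minCapacity < 0)
            newCapacity = minCapacity;
        if (newCapacity - MAX_ARRAY_SIZE > 0)
            newCapacity = (minCapacity > MAX_ARRAY_SIZE) ? Integer.MAX_VALUE : MAX_ARRAY_SIZE;
        return newCapacity;
    }

    // Capacities observed while adding `adds` elements one at a time.
    // noArg = true models new ArrayList<>(); otherwise new ArrayList<>(initialCapacity).
    static List<Integer> capacities(boolean noArg, int initialCapacity, int adds) {
        int capacity = noArg ? 0 : initialCapacity;
        // Stands in for the identity check elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA
        boolean defaultEmpty = noArg;
        List<Integer> seen = new ArrayList<>();
        seen.add(capacity);
        for (int size = 0; size < adds; size++) {
            int minCapacity = size + 1;                      // add() asks for size + 1
            if (defaultEmpty)
                minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
            if (minCapacity - capacity > 0) {                // ensureExplicitCapacity
                capacity = grow(capacity, minCapacity);
                defaultEmpty = false;                        // elementData is now a real array
                seen.add(capacity);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(capacities(true, 0, 49));  // [0, 10, 15, 22, 33, 49]
        System.out.println(capacities(false, 0, 9)); // [0, 1, 2, 3, 4, 6, 9]
    }
}
```

The second line shows why the "minimum capacity" fallback matters: growing 0 or 1 by 1.5x does not make room for one more element.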

4.5 removing elements

The ArrayList class provides three ways to remove elements: by index, by element, and through an iterator.

Removing by index and removing by element work the same way: every element after the target is shifted one position to the left, overwriting the target, and the now-unused last slot is set to null.

Removal through an iterator is explained separately below.

4.5.1 remove by index

// Remove element by subscript
public E remove(int index) {
    // Throw IndexOutOfBoundsException if index >= size
    rangeCheck(index);
    // Increment the collection version number (modCount)
    modCount++;
    // Find elementData[index] according to index
    E oldValue = elementData(index);
    // Number of elements to be moved
    int numMoved = size - index - 1;
    if (numMoved > 0)
        // Shift every element after index one position to the left
        System.arraycopy(elementData, index+1, elementData, index, numMoved);
    // Null out the now-unused last slot so the GC can reclaim it, and decrement size
    elementData[--size] = null; 

    return oldValue;
}

// @SuppressWarnings("unchecked") means to have the compiler ignore unchecked warnings
@SuppressWarnings("unchecked")
E elementData(int index) {
    return (E) elementData[index];
}

Main process:

Check whether the index is out of bounds; if it is, throw IndexOutOfBoundsException.
Compute the number of elements to move.
Shift all elements after the target one position to the left, then null out the last slot.
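The steps above can be sketched on a bare `Object[]`. This is a simplified, hypothetical helper rather than the JDK method: `size` is passed in instead of stored (the caller decrements it), and the sketch also rejects negative indexes explicitly.

```java
import java.util.Arrays;

public class RemoveAtSketch {
    // Removes elementData[index] from the first `size` slots and returns it.
    // The caller is responsible for decrementing its own size afterwards.
    static Object removeAt(Object[] elementData, int size, int index) {
        if (index < 0 || index >= size)
            throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + size);
        Object oldValue = elementData[index];
        int numMoved = size - index - 1;               // elements to the right of index
        if (numMoved > 0)
            System.arraycopy(elementData, index + 1, elementData, index, numMoved);
        elementData[size - 1] = null;                  // clear the stale last slot
        return oldValue;
    }

    public static void main(String[] args) {
        Object[] data = {"a", "b", "c", "d", null};    // size 4, capacity 5
        Object removed = removeAt(data, 4, 1);
        System.out.println(removed + " " + Arrays.toString(data)); // b [a, c, d, null, null]
    }
}
```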

4.5.2 remove by element

public boolean remove(Object o) {
    // Traverse the array comparing element by element; remove only the first match
    if (o == null) {
        for (int index = 0; index < size; index++)
            if (elementData[index] == null) {
                fastRemove(index);
                return true;
            }
    } else {
        for (int index = 0; index < size; index++)
            if (o.equals(elementData[index])) {
                fastRemove(index);
                return true;
            }
    }
    return false;
}

private void fastRemove(int index) {
    // Increment the collection version number (modCount)
    modCount++;
    // Calculate the number of elements that need to be moved
    int numMoved = size - index - 1;
    if (numMoved > 0)
        // Shift every element after index one position to the left
        System.arraycopy(elementData, index+1, elementData, index, numMoved);
    // Null out the now-unused last slot so the GC can reclaim it, and decrement size
    elementData[--size] = null; 
}

Main process:

Traverse the array by index, comparing each element with the target (== for null, equals otherwise).
On the first match, record the index of that element.
Compute the number of elements to move.
Shift all elements after the target one position to the left.
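Two details of removal by element can be checked against the real `ArrayList`: the `o == null` branch is what makes `remove(null)` work (it removes only the first null), and with a `List<Integer>` the overloads `remove(int index)` and `remove(Object o)` are easy to confuse. The class and helper names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RemoveByElementDemo {
    // remove(null) takes the o == null branch and removes only the first null
    static List<String> removeFirstNull(List<String> input) {
        List<String> list = new ArrayList<>(input);
        list.remove(null);
        return list;
    }

    // With List<Integer>, an int argument selects remove(int index) ...
    static List<Integer> removeIndexOne(List<Integer> input) {
        List<Integer> list = new ArrayList<>(input);
        list.remove(1);
        return list;
    }

    // ... while a boxed Integer selects remove(Object o)
    static List<Integer> removeValueOne(List<Integer> input) {
        List<Integer> list = new ArrayList<>(input);
        list.remove(Integer.valueOf(1));
        return list;
    }

    public static void main(String[] args) {
        System.out.println(removeFirstNull(Arrays.asList("x", null, "y", null))); // [x, y, null]
        System.out.println(removeIndexOne(Arrays.asList(1, 2, 3)));               // [1, 3]
        System.out.println(removeValueOne(Arrays.asList(1, 2, 3)));               // [2, 3]
    }
}
```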

4.6 traversing elements

ArrayList supports three traversal styles: iterator traversal, indexed for loop traversal, and enhanced for loop traversal.

Underneath, there are only two real traversal schemes: iterator traversal and array traversal.

Iterator traversal: explicit iterator traversal and enhanced for loop traversal.
Array traversal: indexed for loop traversal.

Iterator traversal relies on ArrayList's own iterator implementation, described below.

Array traversal relies on the fact that ArrayList is backed by an array: thanks to the array's base address and contiguous storage, elements can be accessed by index in a loop.

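// Sample list used by the snippets below (assumed; needs java.util.ArrayList, Arrays, Iterator)
ArrayList<Integer> arr = new ArrayList<>(Arrays.asList(1, 2, 3));

// Iterator traversal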
Iterator<Integer> iterator = arr.iterator();
while (iterator.hasNext()) {
    System.out.println(iterator.next());
}

// for loop traversal (not recommended)
for (int i = 0; i < arr.size(); i ++) {
    System.out.println(arr.get(i));
}

// Enhanced for loop traversal
for (Integer cur : arr) {
    System.out.println(cur);
}

// What the compiler generates for the enhanced for loop above (decompiled)
Iterator var = arr.iterator();
while(var.hasNext()) {
    Integer cur = (Integer)var.next();
    System.out.println(cur);
}

Traversing an ArrayList with an indexed for loop is not recommended. We only know this loop works because we know ArrayList is backed by an array, so the client code is tightly coupled to the collection's internal structure and the access logic cannot be separated from the collection class. Each collection would need its own traversal style, client code could not be reused across collections, and switching from one collection type to another would mean rewriting every traversal. Iterators were introduced to solve this problem.
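As an illustration of the decoupling, the client method below depends only on `Iterable<Integer>`, so the same loop works for an array-backed `ArrayList`, a node-based `LinkedList`, and a hash-based `HashSet`. `IteratorDecouplingDemo` and `sum` are illustrative names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

public class IteratorDecouplingDemo {
    // Client code depends only on Iterable, not on how the collection stores its elements
    static int sum(Iterable<Integer> items) {
        int total = 0;
        for (Iterator<Integer> it = items.iterator(); it.hasNext(); ) {
            total += it.next();
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3, 4);
        System.out.println(sum(new ArrayList<>(values)));  // array-backed: 10
        System.out.println(sum(new LinkedList<>(values))); // linked nodes: 10
        System.out.println(sum(new HashSet<>(values)));    // hash table: 10
    }
}
```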

4.7 iterators

4.7.1 iterator pattern

Java provides many kinds of collections, each with a different internal structure: ArrayList maintains an array, LinkedList a linked list, and HashSet a hash table. Because of these differences, client code cannot know in general how to traverse a given collection, so Java extracts the access logic from the individual collection types and abstracts it into the iterator pattern.

Iterator pattern: provides a way to access individual elements in a container object without exposing the internal details of the object container.

4.7.2 class structure

ArrayList implements its iterator as a private inner class, Itr, and exposes a public method, iterator(), that creates one.

public Iterator<E> iterator() {
    return new Itr();
}

private class Itr implements Iterator<E> {
    // Index of the next element
    int cursor;  
    // Index of the last element returned by next(); -1 if there is none
    int lastRet = -1; 
    // Iterator version number, which is initialized to the collection version number when the iterator is instantiated
    int expectedModCount = modCount;
    
    ...
}

The Itr class implements the Iterator interface, which gives it the basic operations for iterating over a collection.
The difference between the Iterator interface and the Iterable interface:
- Iterator is the object that actually performs the traversal: it holds the traversal state (such as cursor) and provides hasNext, next, and remove.
- Iterable is implemented by the collection itself and wraps the creation of an Iterator in its iterator() method. Each call can return a fresh Iterator, so several traversals can run independently, and a collection can offer several different iterators; for example, LinkedList also provides listIterator and descendingIterator.
Only objects whose class implements the Iterable interface (and arrays) can be the target of an enhanced for loop.

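Source code of the two interfaces (simplified, Java 8+):

public interface Iterable<T> {
    Iterator<T> iterator();
    // default methods forEach and spliterator omitted
}

public interface Iterator<E> {
    boolean hasNext();
    E next();
    // Default method: throws UnsupportedOperationException unless overridden
    default void remove() {
        throw new UnsupportedOperationException("remove");
    }
    // default method forEachRemaining omitted
}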
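Because the enhanced for loop only requires `Iterable`, any class can opt in by returning an `Iterator`. The `Range` class below is a hypothetical example, a half-open integer range [from, to). Each `iterator()` call returns a fresh iterator with its own cursor, and `remove()` is left to the interface's default, which throws `UnsupportedOperationException` (Java 8+).

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

public class RangeDemo {
    // Hypothetical Iterable: the integers from (inclusive) to `to` (exclusive)
    static final class Range implements Iterable<Integer> {
        private final int from;
        private final int to;

        Range(int from, int to) {
            this.from = from;
            this.to = to;
        }

        @Override
        public Iterator<Integer> iterator() {
            // Each call returns a new Iterator with its own cursor
            return new Iterator<Integer>() {
                private int cursor = from;

                @Override
                public boolean hasNext() {
                    return cursor < to;
                }

                @Override
                public Integer next() {
                    if (!hasNext()) throw new NoSuchElementException();
                    return cursor++;
                }
                // remove() is not overridden: the default throws UnsupportedOperationException
            };
        }
    }

    static int sum(Range r) {
        int total = 0;
        for (int x : r) {   // compiles to r.iterator(), hasNext(), next()
            total += x;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(new Range(1, 5))); // 1 + 2 + 3 + 4 = 10
    }
}
```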

4.7.3 iterator traversal

The ArrayList iterator uses the hasNext method and the next method together with the while loop to traverse the elements.

// Determines whether the next object element exists
public boolean hasNext() {
    return cursor != size;
}

// Get the next element
@SuppressWarnings("unchecked")
public E next() {
    // Check whether the iterator version number is equal to the collection version number. If not, throw an exception
    checkForComodification();
    // The subscript of the next element is assigned to i
    int i = cursor;
    // Judge whether i exceeds the data area. If i exceeds the data area, throw an exception
    if (i >= size)
        throw new NoSuchElementException();
    Object[] elementData = ArrayList.this.elementData;
    // If i is beyond the array length, the list was modified concurrently: throw an exception
    if (i >= elementData.length)
        throw new ConcurrentModificationException();
    // Advance the cursor to the next element
    cursor = i + 1;
    // Record i in lastRet and return the element at index i
    return (E) elementData[lastRet = i];
}

// Check that the iterator version number and the collection version number are equal
final void checkForComodification() {
    if (modCount != expectedModCount)
        throw new ConcurrentModificationException();
}

Main process:

Check whether the version numbers match; if not, throw ConcurrentModificationException.
Check whether the index held by cursor is beyond the data area (i >= size); if so, throw NoSuchElementException.
Check whether that index is beyond the array length; if so, throw ConcurrentModificationException.
Advance cursor to the next element.
Record the current index in lastRet.
Return the element at lastRet.
Traversal ends when hasNext() finds that cursor equals size.
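The end-of-traversal behavior can be observed with the real ArrayList iterator: after `hasNext()` returns false (cursor equals size), one more `next()` throws `NoSuchElementException`. The class and helper names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class IteratorStepDemo {
    // Drains the iterator, then calls next() once more; reports what was read and what happened
    static String drainAndOverrun(List<String> list) {
        Iterator<String> it = list.iterator();
        StringBuilder seen = new StringBuilder();
        while (it.hasNext()) {      // hasNext(): cursor != size
            seen.append(it.next()); // next(): returns the element at cursor, then advances cursor
        }
        try {
            it.next();              // cursor == size: no element left
            seen.append("|no exception");
        } catch (NoSuchElementException e) {
            seen.append("|NoSuchElementException");
        }
        return seen.toString();
    }

    public static void main(String[] args) {
        System.out.println(drainAndOverrun(new ArrayList<>(Arrays.asList("a", "b", "c"))));
        // abc|NoSuchElementException
    }
}
```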

4.7.4 iterator removal

It is recommended to use the remove method provided by the iterator to remove elements while traversing the ArrayList with the iterator.

public void remove() {
    // lastRet < 0 means next() has not been called yet, or remove() was already called after the last next()
    if (lastRet < 0)
        throw new IllegalStateException();
    // Check whether the iterator version number is equal to the collection version number. If not, throw an exception
    checkForComodification();
	
    try {
        // Use the remove method in the ArrayList class to remove elements
        ArrayList.this.remove(lastRet);
        // Elements after lastRet shifted left, so move cursor back to the removed position
        cursor = lastRet;
        // Reset lastRet
        lastRet = -1;
        // Sync iterator version number
        expectedModCount = modCount;
    } catch (IndexOutOfBoundsException ex) {
        throw new ConcurrentModificationException();
    }
}

Main process:

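Check whether lastRet points to a valid element; if not (next() not called yet, or remove() already called since the last next()), throw IllegalStateException.
Check whether the version numbers match; if not, throw ConcurrentModificationException.
Remove the element with the remove method defined by the ArrayList class.
Move cursor back to lastRet and reset lastRet to -1.
Synchronize the iterator version number expectedModCount with modCount.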
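Two consequences of this logic can be checked with the real ArrayList iterator: after `remove()`, the cursor moves back, so the following `next()` returns the element that shifted into the removed slot; and calling `remove()` when lastRet is -1 throws `IllegalStateException`. The class and method names are illustrative; the comments about `cursor` and `lastRet` follow the JDK 8 `Itr` code quoted above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IteratorRemoveDemo {
    // After remove(), next() returns the element that shifted left into the removed slot
    static String nextAfterRemove() {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c"));
        Iterator<String> it = list.iterator();
        it.next();               // returns "a"; lastRet = 0, cursor = 1
        it.remove();             // list becomes [b, c]; cursor = 0, lastRet = -1
        String next = it.next(); // "b"
        return next + " " + list;
    }

    // remove() before any next(): lastRet is still -1
    static boolean removeBeforeNextThrows() {
        Iterator<String> it = new ArrayList<>(Arrays.asList("a")).iterator();
        try {
            it.remove();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // remove() twice after one next(): the first remove() reset lastRet to -1
    static boolean secondRemoveThrows() {
        Iterator<String> it = new ArrayList<>(Arrays.asList("a", "b")).iterator();
        it.next();
        it.remove();
        try {
            it.remove();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(nextAfterRemove());        // b [b, c]
        System.out.println(removeBeforeNextThrows()); // true
        System.out.println(secondRemoveThrows());     // true
    }
}
```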

4.7.5 fail-fast mechanism

The fail-fast mechanism is an error-detection mechanism in Java collections. If the structure of a collection changes while it is being traversed with an iterator (other than through the iterator's own remove method), fail-fast may be triggered and a ConcurrentModificationException thrown. Fail-fast is not guaranteed under unsynchronized concurrent modification; it is only a best-effort check, so it should be used only to detect bugs, not relied on for correctness.

Note that in the traversal scenarios discussed here, fail-fast can only be triggered when the collection is traversed with an iterator (which includes the enhanced for loop, since it compiles to iterator calls).

ArrayList implements fail-fast as follows. It uses modCount as the collection version number, incremented every time the list is structurally modified (elements added or removed; set does not count). Its inner iterator class keeps expectedModCount as the iterator version number, initialized to the current modCount when the iterator is created. Each call to next() compares modCount with expectedModCount; if they differ, a ConcurrentModificationException is thrown immediately and traversal stops.
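A stripped-down model of this mechanism, assuming a hypothetical `MiniList` (not the JDK source) that keeps only `add`, a `modCount` field, and an iterator whose `next()` compares `modCount` against an `expectedModCount` snapshot taken at creation.

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.NoSuchElementException;

public class FailFastSketch {
    // Hypothetical, stripped-down list; not the JDK source
    static final class MiniList<E> implements Iterable<E> {
        private Object[] elementData = new Object[4];
        private int size;
        private int modCount; // collection version number

        void add(E e) {
            modCount++;       // structural change: bump the version
            if (size == elementData.length) {
                elementData = Arrays.copyOf(elementData, size * 2);
            }
            elementData[size++] = e;
        }

        @Override
        public Iterator<E> iterator() {
            return new Iterator<E>() {
                private int cursor;
                // Iterator version number: a snapshot of modCount taken at creation
                private final int expectedModCount = modCount;

                @Override
                public boolean hasNext() {
                    return cursor != size;
                }

                @Override
                @SuppressWarnings("unchecked")
                public E next() {
                    if (modCount != expectedModCount) {   // checkForComodification
                        throw new ConcurrentModificationException();
                    }
                    if (cursor >= size) {
                        throw new NoSuchElementException();
                    }
                    return (E) elementData[cursor++];
                }
            };
        }
    }

    // Adds an element after the first next() and reports whether the second next() fails fast
    static boolean addDuringIterationThrows() {
        MiniList<String> list = new MiniList<>();
        list.add("a");
        list.add("b");
        Iterator<String> it = list.iterator();
        it.next();
        list.add("c");            // modCount changes; the iterator's snapshot does not
        try {
            it.next();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // A traversal without modification completes normally
    static String joinAll() {
        MiniList<String> list = new MiniList<>();
        list.add("a");
        list.add("b");
        list.add("c");
        StringBuilder sb = new StringBuilder();
        for (String s : list) {
            sb.append(s);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinAll());                  // abc
        System.out.println(addDuringIterationThrows()); // true
    }
}
```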

For example, execute the following code to trigger the fail fast mechanism:

import java.util.ArrayList;
import java.util.Iterator;

public class FailFastDemo {
    public static void main(String[] args) {
        ArrayList<String> arr = new ArrayList<>();
        arr.add("1");
        arr.add("2");
        arr.add("2");
        arr.add("3");

        Iterator<String> iterator = arr.iterator();
        while (iterator.hasNext()) {
            String cur = iterator.next();
            if ("2".equals(cur)) {
                // Wrong: this should be iterator.remove();
                // the next call to iterator.next() throws ConcurrentModificationException
                arr.remove(cur);
            }
        }
    }
}

Therefore, when traversing the ArrayList with the iterator, if you want to delete elements, you must use the remove method provided by the iterator instead of the remove method provided by the ArrayList class.

If the remove method of the ArrayList class is called directly, the collection version number modCount is incremented but the iterator version number expectedModCount is not. The next call to next() then runs checkForComodification, finds that the two version numbers differ, and throws an exception.

The iterator's remove method also calls the remove method of the ArrayList class under the hood, but afterwards it updates expectedModCount to match modCount. The two version numbers therefore stay consistent and traversal can continue normally.
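A corrected version of the example, as a sketch with illustrative names: `removeWithIterator` removes through the iterator, and `removeWithRemoveIf` uses `Collection.removeIf` (available since Java 8), which performs the removal and its modCount bookkeeping internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class SafeRemovalDemo {
    // The fail-fast example, corrected: remove through the iterator
    static List<String> removeWithIterator(List<String> input, String target) {
        List<String> arr = new ArrayList<>(input);
        Iterator<String> iterator = arr.iterator();
        while (iterator.hasNext()) {
            if (target.equals(iterator.next())) {
                iterator.remove(); // also sets expectedModCount = modCount
            }
        }
        return arr;
    }

    // Java 8+ alternative: Collection.removeIf
    static List<String> removeWithRemoveIf(List<String> input, String target) {
        List<String> arr = new ArrayList<>(input);
        arr.removeIf(target::equals);
        return arr;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("1", "2", "2", "3");
        System.out.println(removeWithIterator(input, "2")); // [1, 3]
        System.out.println(removeWithRemoveIf(input, "2")); // [1, 3]
    }
}
```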

5. Interview questions

5.1 topic 1

Question: DEFAULTCAPACITY_EMPTY_ELEMENTDATA and EMPTY_ELEMENTDATA are both empty arrays. What is the difference between them?

A: Both are shared empty arrays; having two of them lets ArrayList distinguish how an empty instance was created.

An empty ArrayList created by the no-arg constructor has DEFAULTCAPACITY_EMPTY_ELEMENTDATA assigned to elementData, while an empty one created by a parameterized constructor (capacity 0 or an empty Collection) has EMPTY_ELEMENTDATA assigned.

ArrayLists created by different constructors expand slightly differently. During expansion, ArrayList checks whether elementData is DEFAULTCAPACITY_EMPTY_ELEMENTDATA, that is, whether the list came from the no-arg constructor, and selects the corresponding strategy: expand to at least the default capacity of 10 on the first add, or grow only as much as needed.

5.2 topic 2

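Question: how does ArrayList expand?

A: ArrayList's expansion strategy is:

Created with the no-arg constructor: the capacity is 0 until the first add, which expands it to 10; after that, each expansion grows the capacity to 1.5 times the original (the general expansion strategy).
Created with a constructor argument: each expansion grows the capacity to 1.5 times the original. In both cases, if 1.5 times the original is still smaller than the required minimum capacity, the minimum capacity is used instead (for example, a capacity of 0 grows to 1, and 1 grows to 2).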
