Application of Copy-on-Write in CopyOnWriteArrayList

Posted by exhaler on Wed, 24 Nov 2021 18:24:12 +0100

Foreword: The code in this article is based on JDK 1.8.

Non-thread-safe List

  • java.util.LinkedList
  • java.util.ArrayList

Thread-safe List

  • java.util.Vector
  • java.util.Stack (a subclass of Vector that wraps Vector's methods and provides only LIFO stack operations)
  • java.util.Collections.SynchronizedList (a static nested class in Collections, obtained through Collections.synchronizedList)
  • java.util.concurrent.CopyOnWriteArrayList

How does Vector achieve thread safety?

Let's first look at Vector's key code:

public class Vector<E>
    extends AbstractList<E>
    implements List<E>, RandomAccess, Cloneable, java.io.Serializable
{
    protected Object[] elementData;

    public synchronized E get(int index) {
        if (index >= elementCount)
            throw new ArrayIndexOutOfBoundsException(index);
        return elementData(index);
    }

    public synchronized E set(int index, E element) {
        if (index >= elementCount)
            throw new ArrayIndexOutOfBoundsException(index);

        E oldValue = elementData(index);
        elementData[index] = element;
        return oldValue;
    }

    public synchronized boolean add(E e) {
        modCount++;
        ensureCapacityHelper(elementCount + 1);
        elementData[elementCount++] = e;
        return true;
    }

    public synchronized E remove(int index) {
        modCount++;
        if (index >= elementCount)
            throw new ArrayIndexOutOfBoundsException(index);
        E oldValue = elementData(index);

        int numMoved = elementCount - index - 1;
        if (numMoved > 0)
            System.arraycopy(elementData, index+1, elementData, index,
                             numMoved);
        elementData[--elementCount] = null; // Let gc do its work

        return oldValue;
    }

}


You can see that Vector's implementation is simple and blunt: every method shown is declared synchronized, so all operations on one instance compete for the same lock and run serially.
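To illustrate the effect of that method-level lock, here is a minimal sketch in which several threads append to one shared Vector. The class name VectorAddDemo and the thread and iteration counts are made up for this illustration; the only point is that no add is lost.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;

// Hypothetical demo class, not part of the JDK or of the original article.
final class VectorAddDemo {

    // Starts `threads` threads that each append `perThread` integers to one
    // shared Vector, waits for them, and returns the final size.
    static int concurrentAdds(int threads, int perThread) {
        List<Integer> shared = new Vector<>();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    shared.add(i); // Vector.add is synchronized on the Vector instance
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return shared.size();
    }

    public static void main(String[] args) {
        System.out.println(concurrentAdds(4, 10_000)); // prints 40000
    }
}
```

Replacing the Vector with a plain ArrayList in this sketch would make the result unpredictable, because ArrayList.add is not synchronized.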

How does Collections.SynchronizedList achieve thread safety?

static class SynchronizedList<E>
    extends SynchronizedCollection<E>
    implements List<E> {
    final List<E> list;

    SynchronizedList(List<E> list) {
        super(list);
        this.list = list;
    }

    public E get(int index) {
        synchronized (mutex) {return list.get(index);}
    }
    public E set(int index, E element) {
        synchronized (mutex) {return list.set(index, element);}
    }
    public void add(int index, E element) {
        synchronized (mutex) {list.add(index, element);}
    }
    public E remove(int index) {
        synchronized (mutex) {return list.remove(index);}
    }

}

As you can see, the List to be protected is passed to the SynchronizedList constructor. SynchronizedList wraps that List: each of its methods acquires the mutex and then delegates to the same method of the wrapped List, so the List should be accessed only through the wrapper.
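A minimal usage sketch of the wrapper follows. The class and method names are made up for this illustration. It also shows one detail documented in the Collections.synchronizedList Javadoc: iteration is not locked by the wrapper, so the caller has to synchronize on the returned list while iterating.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class, not part of the JDK or of the original article.
final class SynchronizedListDemo {

    // Fills a synchronized wrapper with 1..5 and returns the sum of its elements.
    static int sumOfWrapped() {
        List<Integer> list = Collections.synchronizedList(new ArrayList<>());
        for (int i = 1; i <= 5; i++) {
            list.add(i); // each call acquires the wrapper's mutex, then delegates
        }
        int sum = 0;
        // The wrapper does not lock during iteration; the Javadoc requires
        // the caller to hold the lock on the returned list manually.
        synchronized (list) {
            for (int value : list) {
                sum += value;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumOfWrapped()); // prints 15
    }
}
```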

Why do the get methods of Vector and Collections.SynchronizedList take a lock?

My understanding is that the get methods are synchronized to guarantee ordering and up-to-date reads. A thread that acquires the lock is guaranteed to see everything other threads wrote before they released the same lock. The backing arrays of Vector and Collections.SynchronizedList are not declared volatile, so without the lock, visibility would not be guaranteed either.

CopyOnWriteArrayList

Brief introduction

CopyOnWriteArrayList is a concurrent replacement for a synchronized List. In some cases it provides better concurrency, and it does not require locking or copying the container during iteration.
The thread safety of a copy-on-write container rests on the fact that an effectively immutable object, once properly published, can be accessed without further synchronization. Mutability is implemented by creating and republishing a new copy of the container on every modification.
Obviously, copying the underlying array on every modification has a cost, especially when the container is large.
A copy-on-write container should therefore be used only when iteration is far more common than modification.

In the CopyOnWriteArrayList source, the elements are stored in a single volatile Object array that is read and replaced only through getArray and setArray. The no-argument constructor initializes it with length 0.

Main methods

New Elements

public boolean add(E e) {
    final ReentrantLock lock = this.lock;
    lock.lock(); // Locking
    try {
        Object[] elements = getArray(); // Get old array
        int len = elements.length;
        Object[] newElements = Arrays.copyOf(elements, len + 1); // Expanded Array
        newElements[len] = e;
        setArray(newElements); // Replace the old array with the new one
        return true;
    } finally {
        lock.unlock();
    }
}

As you can see, add takes the lock, copies the old array into a new array that is one slot longer, stores the element in the new slot, and then writes the new array back to the array field, replacing the old one.
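The copy-then-publish steps can be condensed into a small standalone sketch. MiniCowList and its snapshot method are hypothetical names used only for illustration; this is a simplified model of the idea, not the JDK class.

```java
import java.util.Arrays;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, simplified model of a copy-on-write list (not the JDK implementation).
final class MiniCowList<E> {
    private final ReentrantLock lock = new ReentrantLock();
    // volatile: a reader that loads this reference sees the fully built array
    private volatile Object[] array = new Object[0];

    boolean add(E e) {
        lock.lock(); // writers are serialized
        try {
            Object[] old = array;
            Object[] copy = Arrays.copyOf(old, old.length + 1); // copy the old contents
            copy[old.length] = e;                               // modify only the copy
            array = copy;                                       // publish the new array
            return true;
        } finally {
            lock.unlock();
        }
    }

    @SuppressWarnings("unchecked")
    E get(int index) {
        return (E) array[index]; // no lock on the read path
    }

    int size() {
        return array.length;
    }

    // Exposes the current array reference so the demo can show it being replaced.
    Object[] snapshot() {
        return array;
    }

    public static void main(String[] args) {
        MiniCowList<String> list = new MiniCowList<>();
        list.add("a");
        Object[] before = list.snapshot();
        list.add("b");
        // The array captured earlier is untouched; the list now points at a new one.
        System.out.println(before.length + " " + list.size()); // prints 1 2
    }
}
```

Because a published array is never modified afterwards, a reader holding an older reference keeps a consistent view of the list as it was at that moment.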

Delete element

public E remove(int index) {
    final ReentrantLock lock = this.lock;
    lock.lock(); // Locking
    try {
        Object[] elements = getArray();
        int len = elements.length;
        E oldValue = get(elements, index);
        int numMoved = len - index - 1;
        // If the last element is removed, copy only the first len - 1 elements;
        // otherwise copy the parts before and after index separately
        if (numMoved == 0)
            setArray(Arrays.copyOf(elements, len - 1));
        else {
            Object[] newElements = new Object[len - 1];
            System.arraycopy(elements, 0, newElements, 0, index);
            System.arraycopy(elements, index + 1, newElements, index,
                             numMoved);
            setArray(newElements);
        }
        return oldValue; // Return the removed element
    } finally {
        lock.unlock();
    }
}

Modify Element

public E set(int index, E element) {
    final ReentrantLock lock = this.lock;
    lock.lock();
    try {
        Object[] elements = getArray();
        E oldValue = get(elements, index);

        if (oldValue != element) {
            int len = elements.length;
            Object[] newElements = Arrays.copyOf(elements, len);
            newElements[index] = element;
            setArray(newElements);
        } else {
            // Not quite a no-op; ensures volatile write semantics
            setArray(elements);
        }
        return oldValue;
    } finally {
        lock.unlock();
    }
}

Get Elements

public E get(int index) {
    return get(getArray(), index);
}

You can see that the read operation takes no lock.

Iteration

public Iterator<E> iterator() {
    return new COWIterator<E>(getArray(), 0);
}

static final class COWIterator<E> implements ListIterator<E> {
    private final Object[] snapshot;
    private int cursor;

    private COWIterator(Object[] elements, int initialCursor) {
        cursor = initialCursor;
        snapshot = elements;
    }

    public boolean hasNext() {
        return cursor < snapshot.length;
    }

    public boolean hasPrevious() {
        return cursor > 0;
    }

    @SuppressWarnings("unchecked")
    public E next() {
        if (! hasNext())
            throw new NoSuchElementException();
        return (E) snapshot[cursor++];
    }

    @SuppressWarnings("unchecked")
    public E previous() {
        if (! hasPrevious())
            throw new NoSuchElementException();
        return (E) snapshot[--cursor];
    }

    public int nextIndex() {
        return cursor;
    }

    public int previousIndex() {
        return cursor-1;
    }

    public void remove() {
        throw new UnsupportedOperationException();
    }

    public void set(E e) {
        throw new UnsupportedOperationException();
    }

    public void add(E e) {
        throw new UnsupportedOperationException();
    }

    @Override
    public void forEachRemaining(Consumer<? super E> action) {
        Objects.requireNonNull(action);
        Object[] elements = snapshot;
        final int size = elements.length;
        for (int i = cursor; i < size; i++) {
            @SuppressWarnings("unchecked") E e = (E) elements[i];
            action.accept(e);
        }
        cursor = size;
    }
}

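As you can see, the iterator of a CopyOnWriteArrayList holds a reference to a snapshot of the array taken when the iterator was created. The iterator itself cannot modify the list: its remove, set and add methods throw UnsupportedOperationException. Changes made to the list by other threads replace the array instead of modifying the snapshot, so they do not affect the iteration and are invisible to it. For the same reason the iterator is not fail-fast and never throws ConcurrentModificationException.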
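The snapshot behaviour is easy to observe in a single thread. The sketch below uses made-up class and method names; it relies only on the public CopyOnWriteArrayList and Iterator APIs.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class, not part of the JDK or of the original article.
final class SnapshotIteratorDemo {

    // Counts the elements seen by an iterator that was created BEFORE a later add.
    static int elementsSeenByOldIterator() {
        List<String> list = new CopyOnWriteArrayList<>();
        list.add("a");
        list.add("b");
        Iterator<String> it = list.iterator(); // snapshot of [a, b]
        list.add("c"); // with an ArrayList, the next it.next() would throw ConcurrentModificationException
        int seen = 0;
        while (it.hasNext()) {
            it.next();
            seen++;
        }
        return seen;
    }

    // Returns true if Iterator.remove is rejected, as the COWIterator source shows.
    static boolean iteratorRemoveIsUnsupported() {
        List<String> list = new CopyOnWriteArrayList<>();
        list.add("a");
        Iterator<String> it = list.iterator();
        it.next();
        try {
            it.remove();
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(elementsSeenByOldIterator());   // prints 2
        System.out.println(iteratorRemoveIsUnsupported()); // prints true
    }
}
```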

Performance testing

Let's benchmark Vector, Collections.SynchronizedList, and CopyOnWriteArrayList with JMH.

Write operation

@Fork(1)
@Threads(1)
@State(value = Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Measurement(iterations = 5, time = 1)
public class ListWriteTest {

    private static final int SIZE = 100_000;
    private static final int THREAD_SIZE = 4;

    private void testListWrite(List<Integer> list) {
        Runnable runnable = () -> {
            for (int i = 0; i < SIZE; ++i) {
                list.add(i);
            }
        };

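        // Note: IntStream.of(THREAD_SIZE) is a one-element stream (the value 4), so exactly one
        // thread is created here; IntStream.range(0, THREAD_SIZE) would create THREAD_SIZE threads.
        List<Thread> threadList = IntStream.of(THREAD_SIZE).mapToObj(num -> new Thread(runnable)).collect(Collectors.toList());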

        threadList.forEach(Thread::start);
        for (Thread thread : threadList) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

    @Benchmark
    public void testVectorWrite() {
        testListWrite(new Vector<>());
    }

    @Benchmark
    public void testSynchronizedWrite() {
        testListWrite(Collections.synchronizedList(new ArrayList<>()));
    }

    @Benchmark
    public void testCopyOnWriteArrayListWrite() {
        testListWrite(new CopyOnWriteArrayList<>());
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(ListWriteTest.class.getSimpleName())
                .result("result.json")
                .resultFormat(ResultFormatType.JSON).build();
        new Runner(opt).run();
    }
}

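The results are as follows. CopyOnWriteArrayList is more than 2000 times slower than Vector and more than 1600 times slower than Collections.SynchronizedList. Note that IntStream.of(THREAD_SIZE) is a stream containing the single value 4, so the helper starts one writer thread rather than four, and the figures measure uncontended writes.

Write speed: Vector > Collections.SynchronizedList > CopyOnWriteArrayList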

Benchmark                                    Mode  Cnt          Score          Error  Units
ListWriteTest.testCopyOnWriteArrayListWrite  avgt    5  717519212.500 ± 41981491.296  ns/op
ListWriteTest.testSynchronizedWrite          avgt    5     437740.739 ±    29130.674  ns/op
ListWriteTest.testVectorWrite                avgt    5     353874.172 ±     6534.240  ns/op
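A rough count of the copying work helps explain the size of this gap. Assuming the list starts empty, each add copies all elements already present, so n appends copy 0 + 1 + ... + (n - 1) = n(n - 1)/2 elements in total. The class and method names below are illustrative only.

```java
// Hypothetical helper for a back-of-the-envelope estimate; not a benchmark.
final class CowCopyCost {

    // Total number of element copies performed by n successive add(e) calls
    // on an initially empty copy-on-write list: 0 + 1 + ... + (n - 1).
    static long elementsCopied(long n) {
        return n * (n - 1) / 2;
    }

    // The same quantity computed by brute force, as a cross-check of the formula.
    static long elementsCopiedByLoop(long n) {
        long total = 0;
        for (long len = 0; len < n; len++) {
            total += len; // an add on a list of length len copies len elements
        }
        return total;
    }

    public static void main(String[] args) {
        // SIZE = 100_000 appends, as in the write benchmark
        System.out.println(elementsCopied(100_000)); // prints 4999950000
    }
}
```

An array-backed list that grows geometrically copies only a small multiple of n elements over the same n appends, which is consistent with a gap of several orders of magnitude.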

Read operation

@Fork(1)
@Threads(1)
@State(value = Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Measurement(iterations = 5, time = 1)
public class ListReadTest {

    private static final int SIZE = 100_000;
    private static final int THREAD_SIZE = 4;

    private static final List<Integer> vector = new Vector<>();
    private static final List<Integer> synchronizedList = Collections.synchronizedList(new ArrayList<>());
    private static final List<Integer> cowList = new CopyOnWriteArrayList<>();

    public ListReadTest() {
        for (int i = 0; i < 100; ++i) {
            vector.add(i);
            synchronizedList.add(i);
            cowList.add(i);
        }
    }

    private void testListRead(List<Integer> list) {
        Runnable runnable = () -> {
            ThreadLocalRandom current = ThreadLocalRandom.current();
            for (int i = 0; i < SIZE; ++i) {
                list.get(current.nextInt(0, 100));
            }
        };

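        // Note: IntStream.of(THREAD_SIZE) is a one-element stream (the value 4), so exactly one
        // thread is created here; IntStream.range(0, THREAD_SIZE) would create THREAD_SIZE threads.
        List<Thread> threadList = IntStream.of(THREAD_SIZE).mapToObj(num -> new Thread(runnable)).collect(Collectors.toList());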

        threadList.forEach(Thread::start);
        for (Thread thread : threadList) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }


    @Benchmark
    public void testVectorRead() {
        testListRead(vector);
    }

    @Benchmark
    public void testSynchronizedRead() {
        testListRead(synchronizedList);
    }

    @Benchmark
    public void testCopyOnWriteArrayListRead() {
        testListRead(cowList);
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(ListReadTest.class.getSimpleName())
                .result("result.json")
                .resultFormat(ResultFormatType.JSON).build();
        new Runner(opt).run();
    }
}

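You can see that CopyOnWriteArrayList is about 3.8 times faster than Collections.SynchronizedList and about 3.4 times faster than Vector. As in the write test, IntStream.of(THREAD_SIZE) starts a single reader thread, so the difference reflects the cost of acquiring an uncontended lock on every get.

Read speed: CopyOnWriteArrayList > Vector > Collections.SynchronizedList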

Benchmark                                  Mode  Cnt        Score       Error  Units
ListReadTest.testCopyOnWriteArrayListRead  avgt    5   285867.076 ± 21779.424  ns/op
ListReadTest.testSynchronizedRead          avgt    5  1086997.915 ± 19161.630  ns/op
ListReadTest.testVectorRead                avgt    5   967604.177 ± 11735.727  ns/op

Summary:

CopyOnWriteArrayList is based on copy-on-write. Reads take no lock and writes take a lock, which reflects the idea of separating reads from writes, but it cannot provide real-time consistency. For read-heavy, write-light workloads, consider using CopyOnWriteArrayList instead of a synchronized List.

Advantages

Copy-on-write suits data that is read often and changed rarely, such as configuration information or blacklists and whitelists. Reads are lock-free (writes still take a lock), which can help our programs achieve higher read concurrency.
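A read-mostly blacklist could be sketched as follows. The IpBlacklist class and its method names are hypothetical; the sketch uses only CopyOnWriteArrayList methods that exist in JDK 1.8 (addIfAbsent, remove and contains).

```java
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical example class, assuming lookups vastly outnumber updates.
final class IpBlacklist {
    private final CopyOnWriteArrayList<String> blocked = new CopyOnWriteArrayList<>();

    // Rare write path: copies the backing array under the list's lock.
    // Returns false if the address was already present.
    boolean block(String ip) {
        return blocked.addIfAbsent(ip);
    }

    // Rare write path: also copies the array. Returns false if the address was absent.
    boolean unblock(String ip) {
        return blocked.remove(ip);
    }

    // Hot read path: a lock-free linear scan over the current array.
    boolean isBlocked(String ip) {
        return blocked.contains(ip);
    }

    public static void main(String[] args) {
        IpBlacklist blacklist = new IpBlacklist();
        blacklist.block("10.0.0.1");
        System.out.println(blacklist.isBlocked("10.0.0.1")); // prints true
        System.out.println(blacklist.isBlocked("10.0.0.2")); // prints false
    }
}
```

Since contains scans the array linearly, this design fits small lists; for large sets a different read-optimized structure would be worth considering.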

Disadvantages

Data consistency: CopyOnWriteArrayList provides only eventual consistency, not real-time consistency. A reader may still be working on the array as it was before a concurrent write.
Write performance: in the benchmark above, CopyOnWriteArrayList writes were thousands of times slower than writes to the locked lists, because each write copies the whole array, and the constant allocation of new arrays makes the Java GC run frequently.
