LinkedList is 1000 times slower than ArrayList? (animated diagrams + performance evaluation)

Posted by nomis on Mon, 15 Jun 2020 07:07:42 +0200

Arrays and linked lists are two of the most commonly used data structures in programs, and they come up often in interviews. Many people, however, remember the difference between them only vaguely, or remember it wrongly, and have to memorize the concepts again before every interview, which is tedious. This article uses diagrams of each operation together with a performance evaluation to help you understand the difference more deeply and remember it for good.

Array

Before we start the performance evaluation, let's review: what is an array?

An array is defined as follows:

An array is a data structure composed of a collection of elements of the same type, stored in one contiguous block of memory. The storage address of each element can be calculated from its index.

The simplest data structure of this kind is the one-dimensional array. For example, an array of 32-bit integers with indexes 0 to 9 can store 10 variables at memory addresses 2000, 2004, 2008, ..., 2036, so the element with index i is located at address 2000 + 4 × i. The memory address of the first element of an array is called the first address or the base address.
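The address arithmetic in this example can be sketched in a few lines of Java. This is only an illustration of the formula base + element size × index: Java does not expose raw memory addresses, and the class and method names here are hypothetical.

```java
// Illustrative sketch only: ArrayAddressSketch and addressOf are hypothetical names,
// and the "addresses" are plain numbers, not real JVM memory addresses.
class ArrayAddressSketch {

    // Computes the address of the element at `index`: base + elementSize * index.
    static long addressOf(long baseAddress, int elementSizeBytes, int index) {
        return baseAddress + (long) elementSizeBytes * index;
    }

    public static void main(String[] args) {
        long base = 2000; // base address taken from the example in the text
        int intSize = 4;  // a 32-bit integer occupies 4 bytes
        for (int i = 0; i < 10; i++) {
            System.out.println("index " + i + " -> address " + addressOf(base, intSize, i));
        }
    }
}
```

Because the address is obtained with one multiplication and one addition, the cost of reaching an element does not depend on how many elements the array holds.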

In short, an array is a data structure backed by one contiguous block of memory. The key word in this definition is "contiguous": it captures one of the defining characteristics of an array, namely that its elements must occupy a single unbroken region of memory.

The data structure of the array is shown in the following figure:

The process of adding an element to an array is shown in the following figure:

Advantages of arrays

Because an array is stored contiguously, the location of every element is fixed and can be computed directly, so access is very fast. As an analogy, imagine 10 houses in a row, ordered by the age of their occupants in one-year steps. Once we know that the first house is occupied by a 20-year-old, we also know that the second house is occupied by a 21-year-old, the fifth by a 24-year-old, and so on.

Disadvantages of arrays

Fortune and misfortune go hand in hand: the contiguity of an array is both an advantage and a disadvantage. The advantage was described above; the disadvantage is that an array places higher demands on memory, because one contiguous block large enough for the whole array has to be found.

Another disadvantage of arrays is that insertion and deletion are relatively slow. If we insert or delete an element anywhere other than the tail of the array, every element after it has to be moved, which adds performance overhead. The deletion process is shown in the following figure:

A further disadvantage is that an array has a fixed size and cannot be expanded dynamically.
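Both costs described above, shifting elements on insertion or deletion and copying everything when the fixed capacity runs out, can be seen in a minimal sketch of a growable int array. The class below is hypothetical and is not the source of java.util.ArrayList; the initial capacity of 4 and the 1.5x growth step are assumptions chosen for illustration.

```java
import java.util.Arrays;

// Hypothetical minimal growable array of ints, for illustration only.
class GrowableIntArray {
    private int[] data = new int[4]; // fixed-size backing array (assumed initial capacity)
    private int size = 0;

    // Inserts value at index, shifting every later element one slot to the right.
    void add(int index, int value) {
        if (index < 0 || index > size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        if (size == data.length) {
            // An array cannot grow in place: allocate a larger one and copy all elements.
            data = Arrays.copyOf(data, data.length + (data.length >> 1));
        }
        System.arraycopy(data, index, data, index + 1, size - index);
        data[index] = value;
        size++;
    }

    // Removes the element at index, shifting every later element one slot to the left.
    int remove(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        int removed = data[index];
        System.arraycopy(data, index + 1, data, index, size - index - 1);
        size--;
        return removed;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return data[index];
    }

    int size() { return size; }

    int capacity() { return data.length; }

    public static void main(String[] args) {
        GrowableIntArray list = new GrowableIntArray();
        for (int i = 0; i < 5; i++) {
            list.add(list.size(), i); // appending at the tail moves nothing
        }
        list.add(0, 99);              // inserting at the head moves all 5 elements
        System.out.println("size=" + list.size() + ", capacity=" + list.capacity());
    }
}
```

Appending at the tail shifts zero elements, while inserting at index 0 shifts all of them, which is why the position of the insertion matters so much for an array.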

Linked list

A linked list is a data structure complementary to the array. It is defined as follows:

A linked list is a common basic data structure and a kind of linear list. It does not store its data in sequential memory order; instead, each node stores a pointer to the next node. Because the nodes do not have to be stored in order, a linked list can insert in O(1) time once the insertion position is known, which is much faster than the other kind of linear list, the sequential list (array). However, finding a node or accessing the node at a given position takes O(n) time, whereas the corresponding time complexities for a sequential list are O(log n) (for a sorted list, using binary search) and O(1), respectively.

In other words, a linked list is a data structure that does not need contiguous memory. Each element of a linked list has two attributes: the value of the element, and a pointer that holds the address of the next element.
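The two attributes of a node, and the O(1) insertion they make possible, can be sketched as follows. This is a minimal illustration with hypothetical class and method names, not the implementation of java.util.LinkedList.

```java
// Hypothetical minimal singly linked list, for illustration only.
class SinglyLinkedSketch {

    // A node holds a value and a reference (pointer) to the next node.
    static final class Node {
        final int value;
        Node next;

        Node(int value) {
            this.value = value;
        }
    }

    // O(1): links a new node directly after a node we already hold.
    // No other element is moved; only two references change.
    static Node insertAfter(Node node, int value) {
        Node added = new Node(value);
        added.next = node.next;
        node.next = added;
        return added;
    }

    // Walks the list from head to tail and renders it as "a -> b -> c".
    static String describe(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next) {
            sb.append(n.value).append(n.next != null ? " -> " : "");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node head = new Node(1);
        insertAfter(head, 3);
        insertAfter(head, 2); // lands between 1 and 3
        System.out.println(describe(head)); // prints "1 -> 2 -> 3"
    }
}
```

Note that the O(1) cost applies only when the neighbouring node is already at hand; locating that node in the first place is a separate traversal.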

The data structure of the linked list is shown in the following figure:

The process of adding an element to a linked list is shown in the following figure:

The process of deleting an element from a linked list is shown in the following figure:

Linked list classification

Linked lists are mainly divided into the following categories:

Singly linked list
Doubly linked list
Circular linked list

Singly linked list

Each node of a singly linked list (also called a one-way linked list) contains two fields: an information field and a pointer field. The pointer links to the next node in the list, and the pointer of the last node is null. The linked list shown above is a singly linked list.

Doubly linked list

A doubly linked list is also called a two-way linked list. Each node holds not only a pointer to the next node but also a pointer to the previous node. From any node you can therefore reach the previous node as well as the next one, and from there the entire list.

The structure of the doubly linked list is shown in the following figure:

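Circular linked list

In a circular linked list, the first and last nodes are linked together: the node after the last node is the first node and, if the list is also doubly linked, the node before the first node is the last node. Because such a list has no boundary, algorithms over it are often easier to design.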
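A minimal sketch of a singly linked circular list follows, assuming hypothetical names throughout. It shows the one structural difference from an ordinary list (the tail points back to the head) and the stop condition a traversal then needs.

```java
// Hypothetical minimal circular singly linked list, for illustration only.
class CircularListSketch {

    static final class Node {
        final int value;
        Node next;

        Node(int value) {
            this.value = value;
        }
    }

    // Builds a circular list from the given values and returns its head
    // (or null when no values are given).
    static Node build(int... values) {
        if (values.length == 0) {
            return null;
        }
        Node head = new Node(values[0]);
        Node tail = head;
        for (int i = 1; i < values.length; i++) {
            tail.next = new Node(values[i]);
            tail = tail.next;
        }
        tail.next = head; // close the ring: the last node points back to the first
        return head;
    }

    // Counts the nodes by walking until we arrive back at the head.
    // There is no null terminator, so "back at head" is the stop condition.
    static int count(Node head) {
        if (head == null) {
            return 0;
        }
        int n = 0;
        Node current = head;
        do {
            n++;
            current = current.next;
        } while (current != head);
        return n;
    }

    public static void main(String[] args) {
        Node head = build(1, 2, 3);
        System.out.println("nodes in ring: " + count(head));
    }
}
```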

The structure of the circular list is shown in the following figure:

Why are there both singly and doubly linked lists?

Some people may ask: since the singly linked list already exists, why do we need a doubly linked list? What are its advantages?

This comes down to deletion. To delete an element from a singly linked list, we need not only the node to delete but also the node before it (usually called the predecessor), because the predecessor's next pointer has to be changed. But a node in a singly linked list holds no information about its predecessor, so we have to traverse the list again to find it, which costs performance. The doubly linked list exists to solve this problem.
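The difference can be sketched side by side. In the hypothetical code below, removeSingly has to walk from the head to find the predecessor, while removeDoubly only rewires the neighbours of the node it was given. All names are assumptions made for illustration.

```java
// Hypothetical sketch comparing node removal in singly and doubly linked lists.
class LinkedRemovalSketch {

    static final class SNode {
        final int value;
        SNode next;
        SNode(int value) { this.value = value; }
    }

    static final class DNode {
        final int value;
        DNode prev;
        DNode next;
        DNode(int value) { this.value = value; }
    }

    // Singly linked: O(n), because the predecessor must be found by traversal.
    // Returns the (possibly new) head of the list.
    static SNode removeSingly(SNode head, SNode target) {
        if (head == null || target == null) {
            return head;
        }
        if (head == target) {
            return head.next;
        }
        SNode prev = head;
        while (prev.next != null && prev.next != target) {
            prev = prev.next; // search for the predecessor of target
        }
        if (prev.next == target) {
            prev.next = target.next;
        }
        return head;
    }

    // Doubly linked: O(1), because target already knows its predecessor.
    // Returns the (possibly new) head of the list.
    static DNode removeDoubly(DNode head, DNode target) {
        if (target == null) {
            return head;
        }
        DNode newHead = (head == target) ? target.next : head;
        if (target.prev != null) {
            target.prev.next = target.next;
        }
        if (target.next != null) {
            target.next.prev = target.prev;
        }
        target.prev = null;
        target.next = null;
        return newHead;
    }

    // Helper used to build a doubly linked list: links a new node after `node`.
    static DNode linkAfter(DNode node, int value) {
        DNode added = new DNode(value);
        added.prev = node;
        added.next = node.next;
        if (node.next != null) {
            node.next.prev = added;
        }
        node.next = added;
        return added;
    }

    public static void main(String[] args) {
        DNode first = new DNode(1);
        DNode second = linkAfter(first, 2);
        linkAfter(second, 3);
        DNode head = removeDoubly(first, second); // no traversal needed
        System.out.println(head.value + " -> " + head.next.value);
    }
}
```

The price of the constant-time removal is one extra reference per node and one extra pointer update on every insertion.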

Advantages of linked lists

The advantages of linked lists fall roughly into the following three:

Linked lists make good use of memory: they do not need contiguous memory space, so even fragmented memory does not prevent a linked list from being created;
Insertion and deletion are very fast, because there is no need to move a large number of elements as with an array;
The size of a linked list is not fixed, so it can easily be expanded dynamically.

Disadvantages of linked lists

The main disadvantage of a linked list is that it does not support random access: a lookup has to traverse the list from the first node, so search efficiency is relatively low. The time complexity of a linked list lookup is O(n).
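The O(n) lookup can be made concrete with a small sketch. The hypothetical nodeAt method below has to follow index references one by one before it reaches the requested node; nothing about the node's location can be computed up front.

```java
// Hypothetical sketch of index-based lookup in a singly linked list.
class LinkedLookupSketch {

    static final class Node {
        final int value;
        Node next;
        Node(int value) { this.value = value; }
    }

    // Builds the list 0 -> 1 -> ... -> (n - 1) and returns its head (null when n <= 0).
    static Node range(int n) {
        Node head = null;
        Node tail = null;
        for (int i = 0; i < n; i++) {
            Node node = new Node(i);
            if (head == null) {
                head = node;
            } else {
                tail.next = node;
            }
            tail = node;
        }
        return head;
    }

    // O(n): reaches the node at `index` by following `index` next-references from the head.
    static Node nodeAt(Node head, int index) {
        if (index < 0) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        Node current = head;
        for (int i = 0; i < index && current != null; i++) {
            current = current.next;
        }
        if (current == null) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return current;
    }

    public static void main(String[] args) {
        Node head = range(1000);
        // Reaching index 999 requires 999 hops; an array would compute the position directly.
        System.out.println(nodeAt(head, 999).value);
    }
}
```

Java's own LinkedList is doubly linked and starts its walk from whichever end is closer to the index, which halves the worst case but leaves the lookup O(n).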

Performance evaluation

With the basics of arrays and linked lists covered, we can now move on to the performance evaluation.

Before we begin, let's clarify the test objectives. There are only six things to test:

Performance of add operations at the head / middle / tail;
Performance of query operations at the head / middle / tail.

Addition and deletion behave essentially the same in terms of execution time. For an array, both adding and deleting require moving the elements that follow; for a linked list, both change only the node itself and the links of its neighbouring nodes. We therefore combine the addition and deletion tests into one and use the add operation for testing.

Test notes:

In Java, the array is represented by ArrayList and the linked list by LinkedList, so we test with these two classes;
We use JMH (Java Microbenchmark Harness), the framework officially recommended by Oracle, for testing;
The test environment for this article is JDK 1.8, a Mac mini and IntelliJ IDEA 2020.1.

1. Head adding performance test

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.ArrayList;
import java.util.LinkedList;
import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime) // Measure the average time per invocation
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 2, time = 1, timeUnit = TimeUnit.SECONDS) // Warm-up: 2 iterations of 1 s each
@Measurement(iterations = 5, time = 5, timeUnit = TimeUnit.SECONDS) // Measurement: 5 iterations of 5 s each
@Fork(1) // Run the benchmark in 1 forked JVM process
@State(Scope.Thread)
public class ArrayOptimizeTest {

    private static final int maxSize = 1000; // Initial number of elements in each list
    private static final int operationSize = 100; // Number of insertions per benchmark invocation


    private static ArrayList<Integer> arrayList;
    private static LinkedList<Integer> linkedList;

    public static void main(String[] args) throws RunnerException {
        // Configure and start the benchmark
        Options opt = new OptionsBuilder()
                .include(ArrayOptimizeTest.class.getSimpleName()) // Benchmark class to run
                .build();
        new Runner(opt).run(); // Run the benchmarks
    }

    @Setup
    public void init() {
        // Runs once per trial (the @Setup default): fill both lists with maxSize elements
        arrayList = new ArrayList<>();
        linkedList = new LinkedList<>();
        for (int i = 0; i < maxSize; i++) {
            arrayList.add(i);
            linkedList.add(i);
        }
    }

    @Benchmark
    public void addArrayByFirst(Blackhole blackhole) {
        // Insert operationSize elements at indexes 0..operationSize-1, at the head of the list
        for (int i = 0; i < operationSize; i++) {
            arrayList.add(i, i);
        }
        // Consume the list so the JIT cannot eliminate the work as dead code
        blackhole.consume(arrayList);
    }

    @Benchmark
    public void addLinkedByFirst(Blackhole blackhole) {
        // Insert operationSize elements at indexes 0..operationSize-1, at the head of the list
        for (int i = 0; i < operationSize; i++) {
            linkedList.add(i, i);
        }
        // Consume the list so the JIT cannot eliminate the work as dead code
        blackhole.consume(linkedList);
    }
}

As the code above shows, before the test we first fill the ArrayList and the LinkedList with data, and then insert 100 elements at the head. The execution results are as follows:

From the above results, we can see that, on average, LinkedList is about 216 times faster than ArrayList for insertion at the head.

2. Middle adding performance test

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.ArrayList;
import java.util.LinkedList;
import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime) // Measure the average time per invocation
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 2, time = 1, timeUnit = TimeUnit.SECONDS) // Warm-up: 2 iterations of 1 s each
@Measurement(iterations = 5, time = 5, timeUnit = TimeUnit.SECONDS) // Measurement: 5 iterations of 5 s each
@Fork(1) // Run the benchmark in 1 forked JVM process
@State(Scope.Thread)
public class ArrayOptimizeTest {
    private static final int maxSize = 1000; // Initial number of elements in each list
    private static final int operationSize = 100; // Number of insertions per benchmark invocation

    private static ArrayList<Integer> arrayList;
    private static LinkedList<Integer> linkedList;

    public static void main(String[] args) throws RunnerException {
        // Configure and start the benchmark
        Options opt = new OptionsBuilder()
                .include(ArrayOptimizeTest.class.getSimpleName()) // Benchmark class to run
                .build();
        new Runner(opt).run(); // Run the benchmarks
    }

    @Setup
    public void init() {
        // Runs once per trial (the @Setup default): fill both lists with maxSize elements
        arrayList = new ArrayList<>();
        linkedList = new LinkedList<>();
        for (int i = 0; i < maxSize; i++) {
            arrayList.add(i);
            linkedList.add(i);
        }
    }
    
    @Benchmark
    public void addArrayByMiddle(Blackhole blackhole) {
        int startCount = maxSize / 2; // Index of the middle position
        // Insert operationSize elements starting at the middle position
        for (int i = startCount; i < (startCount + operationSize); i++) {
            arrayList.add(i, i);
        }
        // Consume the list so the JIT cannot eliminate the work as dead code
        blackhole.consume(arrayList);
    }

    @Benchmark
    public void addLinkedByMiddle(Blackhole blackhole) {
        int startCount = maxSize / 2; // Index of the middle position
        // Insert operationSize elements starting at the middle position
        for (int i = startCount; i < (startCount + operationSize); i++) {
            linkedList.add(i, i);
        }
        // Consume the list so the JIT cannot eliminate the work as dead code
        blackhole.consume(linkedList);
    }
}

As the code above shows, before the test we first fill the ArrayList and the LinkedList with data, and then insert 100 elements in the middle. The execution results are as follows:

From the above results, we can see that, on average, LinkedList is about 54 times faster than ArrayList for insertion in the middle.