Analysis: which "algorithms" does Java's "sort" function use?

Posted by ScotDiddle on Wed, 23 Feb 2022 09:40:28 +0100

Hi, Ho here.

A few days ago, while sorting some data,

I casually clicked into the source code of Arrays.sort().

With a "since I'm here anyway" attitude,

I just had a look,

and, unexpectedly, I learned something new.


Arrays.sort() uses different sorting algorithms depending on the situation.

Below, I'll walk you through each case in detail.

About Arrays.sort()

First, a bit of background for anyone unfamiliar with it.

The JDK provides two main utility classes for sorting:

java.util.Arrays
java.util.Collections

The one you probably use more often is

Collections.sort() with an overridden compare method,

to sort objects by one of their attributes.
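For anyone who hasn't used it, here is a minimal sketch of that pattern. The Person class and the names SortByAttributeDemo and sortByAge are hypothetical, made up for this example; only Collections.sort() and Comparator are real JDK APIs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class SortByAttributeDemo {
    // Returns a copy of the list sorted by age (ascending) using Collections.sort + a Comparator.
    static List<Person> sortByAge(List<Person> people) {
        List<Person> copy = new ArrayList<>(people);
        Collections.sort(copy, new Comparator<Person>() {
            @Override
            public int compare(Person p1, Person p2) {
                return Integer.compare(p1.age, p2.age); // compare by the "age" attribute
            }
        });
        return copy;
    }

    public static void main(String[] args) {
        List<Person> people = new ArrayList<>();
        people.add(new Person("Alice", 31));
        people.add(new Person("Bob", 25));
        people.add(new Person("Carol", 28));
        System.out.println(sortByAge(people)); // [Bob(25), Carol(28), Alice(31)]
    }
}

// Hypothetical value class used only for this example.
class Person {
    final String name;
    final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    @Override
    public String toString() {
        return name + "(" + age + ")";
    }
}
```

On Java 8 and later, the anonymous class can be shortened to Comparator.comparingInt(p -> p.age).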

But if you look at the source of Collections.sort(), you will find

that it delegates to List.sort(), whose default implementation ends up calling Arrays.sort():

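// java.util.List (JDK 8): the default sort() implementation
@SuppressWarnings({"unchecked", "rawtypes"})
default void sort(Comparator<? super E> c) {
    Object[] a = this.toArray();               // copy the elements into an array
    Arrays.sort(a, (Comparator) c);            // sort the array
    ListIterator<E> i = this.listIterator();
    for (Object e : a) {                       // write the sorted elements back in order
        i.next();
        i.set((E) e);
    }
}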

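So Arrays.sort() is where sorting is finally handled, and it is worth understanding! (ArrayList overrides List.sort(), but its override also calls Arrays.sort() directly on its internal array.)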
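To see those three steps in isolation, here is a small sketch that repeats them outside the JDK on a LinkedList. ListSortSketch and sortLikeDefaultListSort are hypothetical names; the body simply mirrors the default List.sort() quoted above.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

class ListSortSketch {
    // Repeats the steps of the default List.sort(): copy to an array,
    // sort the array with Arrays.sort(), then write the elements back.
    static <E> void sortLikeDefaultListSort(List<E> list, Comparator<? super E> c) {
        @SuppressWarnings("unchecked")
        E[] a = (E[]) list.toArray();              // step 1: copy into an array
        Arrays.sort(a, c);                         // step 2: the actual sorting
        ListIterator<E> it = list.listIterator();
        for (E e : a) {                            // step 3: overwrite each position in order
            it.next();
            it.set(e);
        }
    }

    public static void main(String[] args) {
        List<String> words = new LinkedList<>(Arrays.asList("pear", "fig", "apple"));
        sortLikeDefaultListSort(words, Comparator.<String>naturalOrder());
        System.out.println(words); // [apple, fig, pear]
    }
}
```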

Common sorting algorithms

Here are the English names of some common sorting algorithms, for reference when reading the source code below:

  • Bubble Sort
  • Insertion Sort
  • Merge Sort
  • Quick sort

Source code analysis

Overview of all the overloaded Arrays.sort() methods

We can discuss the specific sorting algorithms according to the data type of the elements being sorted.

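1

Primitive data types

Take int as an example

(long, float and double follow essentially the same logic; byte, short and char differ in that larger arrays are sorted with counting sort instead)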

01 -> Arrays.sort()

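There is only one line of code in the method. Comparing its name with the algorithms listed above, it is not hard to guess

that this is roughly a quicksort. (The source quoted in this article is from JDK 8; DualPivotQuicksort was reworked in later releases, so the details may differ on newer JDKs.) Let's click in and have a look: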

public static void sort(int[] a) {
    //Quick sort
    DualPivotQuicksort.sort(a, 0, a.length - 1, null, 0, 0);
}
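From the caller's side this is just one call. A minimal sketch of the whole-array form and the range overload Arrays.sort(int[], fromIndex, toIndex), which sorts only the given slice; PrimitiveSortDemo, sortAll and sortRange are hypothetical helper names.

```java
import java.util.Arrays;

class PrimitiveSortDemo {
    // Sorts a copy of the whole array in ascending order.
    static int[] sortAll(int[] input) {
        int[] a = input.clone();
        Arrays.sort(a);
        return a;
    }

    // Sorts only a[from..to) of a copy; toIndex is exclusive.
    static int[] sortRange(int[] input, int from, int to) {
        int[] a = input.clone();
        Arrays.sort(a, from, to);
        return a;
    }

    public static void main(String[] args) {
        int[] data = {5, 3, 9, 1, 7};
        System.out.println(Arrays.toString(sortAll(data)));         // [1, 3, 5, 7, 9]
        System.out.println(Arrays.toString(sortRange(data, 1, 4))); // [5, 1, 3, 9, 7]
    }
}
```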

02 -> DualPivotQuicksort.sort()

This method has two branches, and which one runs depends on the length of the array

Based on the two comments

and the algorithms listed above, we can draw a tentative conclusion:

If the array length is greater than 286, merge sort is used

If it is less than 286, quicksort is used

static void sort(int[] a, int left, int right,
                 int[] work, int workBase, int workLen) {
    // Use Quicksort on small arrays
    if (right - left < QUICKSORT_THRESHOLD) {
        sort(a, left, right, true);
        return;
    }
    ...
    // Merging
    ...
}
//QUICKSORT_THRESHOLD
private static final int QUICKSORT_THRESHOLD = 286;
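The code above elides what happens between the size check and the merging. In the JDK 8 source, that part scans the array for ascending and descending runs and only continues toward merging if it finds fewer than MAX_RUN_COUNT (67) of them; otherwise it falls back to quicksort. Below is a simplified sketch of that check. RunCheckSketch, countRuns and wouldTryMerge are hypothetical names, and the real code also reverses descending runs in place and gives up on very long runs of equal elements, which this sketch leaves out.

```java
class RunCheckSketch {
    static final int QUICKSORT_THRESHOLD = 286; // value from the JDK 8 source quoted above
    static final int MAX_RUN_COUNT = 67;        // JDK 8 run limit (may differ in other JDKs)

    // Counts maximal non-descending or strictly descending runs in a.
    static int countRuns(int[] a) {
        int n = a.length;
        int runs = 0;
        int k = 0;
        while (k < n) {
            runs++;
            if (k == n - 1) {
                break;                                     // a lone trailing element is its own run
            }
            if (a[k] <= a[k + 1]) {
                while (k + 1 < n && a[k] <= a[k + 1]) k++; // extend a non-descending run
            } else {
                while (k + 1 < n && a[k] > a[k + 1]) k++;  // extend a strictly descending run
            }
            k++;                                           // first index of the next run
        }
        return runs;
    }

    // Simplified JDK 8 decision: large arrays made of few runs go on to merging.
    static boolean wouldTryMerge(int[] a) {
        int left = 0, right = a.length - 1;
        if (right - left < QUICKSORT_THRESHOLD) {
            return false;                                  // small array: quicksort branch
        }
        return countRuns(a) < MAX_RUN_COUNT;               // few runs: "nearly sorted"
    }

    public static void main(String[] args) {
        int[] sorted = new int[1000];
        int[] zigzag = new int[1000];
        for (int i = 0; i < 1000; i++) {
            sorted[i] = i;
            zigzag[i] = i % 2;                             // 0, 1, 0, 1, ... gives 500 runs
        }
        System.out.println(wouldTryMerge(sorted)); // true
        System.out.println(wouldTryMerge(zigzag)); // false
    }
}
```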

But is that really the case?

Let's step into the quicksort's sort method

03 -> sort(a, left, right, true)

This method again has two branches

private static void sort(int[] a, int left, int right, boolean leftmost) {
    int length = right - left + 1;
    // Use insertion sort on tiny arrays
    if (length < INSERTION_SORT_THRESHOLD) {
        ...
    }
    ...
}
//INSERTION_SORT_THRESHOLD
private static final int INSERTION_SORT_THRESHOLD = 47;
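The branch elided under the length < INSERTION_SORT_THRESHOLD check is an insertion sort. Here is a minimal sketch of a plain insertion sort over an inclusive range; InsertionSortSketch and insertionSort are hypothetical names. As far as I can tell from the JDK 8 source, this traditional form is used only for the leftmost part of the array, while the other parts use a "pair insertion sort" variant.

```java
import java.util.Arrays;

class InsertionSortSketch {
    // Traditional insertion sort on a[left..right] (inclusive bounds, like the JDK method above).
    static void insertionSort(int[] a, int left, int right) {
        for (int i = left + 1; i <= right; i++) {
            int ai = a[i];                 // element to insert
            int j = i - 1;
            while (j >= left && ai < a[j]) {
                a[j + 1] = a[j];           // shift larger elements one slot to the right
                j--;
            }
            a[j + 1] = ai;                 // drop the element into the gap
        }
    }

    public static void main(String[] args) {
        int[] a = {9, 4, 7, 1, 8, 2};
        insertionSort(a, 0, a.length - 1);
        System.out.println(Arrays.toString(a)); // [1, 2, 4, 7, 8, 9]
    }
}
```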

Based on the comment, and again matching it against the algorithm names listed above,

let's revise the conclusion we just drew:

When the array length is less than 47, insertion sort is used

Only when it is between 47 and 286 is quicksort actually used

So the "quicksort" method is not just quicksort after all
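To make the 47-to-286 range more concrete, here is a simplified dual-pivot quicksort with the same insertion-sort cutoff. It is a sketch, not the JDK code: it takes the two pivots from the ends of the range, whereas JDK 8 chooses them from five sampled elements, and it skips the JDK's extra handling of elements equal to the pivots. DualPivotSketch is a hypothetical name.

```java
import java.util.Arrays;
import java.util.Random;

class DualPivotSketch {
    static final int INSERTION_SORT_THRESHOLD = 47; // same cutoff as the JDK 8 source

    static void sort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    // Sorts a[left..right] (inclusive).
    static void sort(int[] a, int left, int right) {
        if (right - left + 1 < INSERTION_SORT_THRESHOLD) {
            insertionSort(a, left, right);              // tiny range: insertion sort
            return;
        }
        // Two pivots p <= q, taken from the ends of the range.
        if (a[left] > a[right]) swap(a, left, right);
        int p = a[left], q = a[right];

        int lt = left + 1;   // a[left+1 .. lt-1]  < p
        int gt = right - 1;  // a[gt+1 .. right-1] > q
        int k = lt;          // a[lt .. k-1] lies in [p, q]; a[k .. gt] is not yet scanned
        while (k <= gt) {
            if (a[k] < p) {
                swap(a, k, lt++);
            } else if (a[k] > q) {
                while (a[gt] > q && k < gt) gt--;
                swap(a, k, gt--);
                if (a[k] < p) swap(a, k, lt++);
            }
            k++;
        }
        // Move the pivots into their final positions.
        swap(a, left, --lt);
        swap(a, right, ++gt);

        // Recurse into the three parts: < p, between the pivots, > q.
        sort(a, left, lt - 1);
        sort(a, lt + 1, gt - 1);
        sort(a, gt + 1, right);
    }

    static void insertionSort(int[] a, int left, int right) {
        for (int i = left + 1; i <= right; i++) {
            int x = a[i];
            int j = i - 1;
            while (j >= left && a[j] > x) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = x;
        }
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        Random rnd = new Random(1);
        int[] data = new int[200];
        for (int i = 0; i < data.length; i++) data[i] = rnd.nextInt(1000);
        int[] expected = data.clone();
        Arrays.sort(expected);
        sort(data);
        System.out.println(Arrays.equals(expected, data)); // true
    }
}
```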

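Conclusion summary

Sorting primitive types (JDK 8)

The specific sorting algorithm depends on the number of elements:

< 47: insertion sort (quicksort's recursive calls also fall back to it for small sub-ranges)

47 to 286: dual-pivot quicksort

> 286: merge sort, but only when the array is already nearly sorted (fewer than MAX_RUN_COUNT = 67 ascending or descending runs); otherwise dual-pivot quicksort is still used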
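The table above can be written as a small decision function. This only restates the JDK 8 thresholds quoted in this article, for a whole int[] at the top level; the nearlySorted flag stands in for the run check and is a simplification, and Jdk8IntSortDispatch and choose are hypothetical names.

```java
class Jdk8IntSortDispatch {
    static final int INSERTION_SORT_THRESHOLD = 47; // JDK 8 value quoted above
    static final int QUICKSORT_THRESHOLD = 286;     // JDK 8 value quoted above

    // Which algorithm JDK 8 starts with for a whole int[] of the given length.
    // 'nearlySorted' stands in for the run check (fewer than 67 runs); a simplification.
    static String choose(int length, boolean nearlySorted) {
        int right = length - 1;                     // left is 0, as in Arrays.sort(int[])
        if (right < QUICKSORT_THRESHOLD) {          // mirrors: right - left < QUICKSORT_THRESHOLD
            return length < INSERTION_SORT_THRESHOLD ? "insertion sort" : "dual-pivot quicksort";
        }
        return nearlySorted ? "merge sort" : "dual-pivot quicksort";
    }

    public static void main(String[] args) {
        System.out.println(choose(30, false));   // insertion sort
        System.out.println(choose(200, false));  // dual-pivot quicksort
        System.out.println(choose(5000, true));  // merge sort
        System.out.println(choose(5000, false)); // dual-pivot quicksort
    }
}
```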

2

Generic (object) types

This is probably what we use most often,

that is, sorting through Collections.sort().

Let's analyze it step by step:

01 -> Collections.sort()

It has two overloads:
// Parameter: List<T> list
public static <T extends Comparable<? super T>> void sort(List<T> list) {
    list.sort(null);
}

/**
 * Parameters:
 * List<T> list
 * Comparator<? super T> c
 */
public static <T> void sort(List<T> list, Comparator<? super T> c) {
    list.sort(c);
}
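Here is what the two overloads look like from the caller's side. CollectionsSortOverloads, natural and byLength are hypothetical names; the first call relies on String implementing Comparable, and the second passes an explicit Comparator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class CollectionsSortOverloads {
    // Overload 1: natural ordering; String implements Comparable, so no comparator is needed.
    static List<String> natural(List<String> in) {
        List<String> copy = new ArrayList<>(in);
        Collections.sort(copy);
        return copy;
    }

    // Overload 2: explicit comparator; here, shortest string first.
    static List<String> byLength(List<String> in) {
        List<String> copy = new ArrayList<>(in);
        Collections.sort(copy, Comparator.comparingInt(String::length));
        return copy;
    }

    public static void main(String[] args) {
        List<String> fruit = Arrays.asList("banana", "kiwi", "apple");
        System.out.println(natural(fruit));  // [apple, banana, kiwi]
        System.out.println(byLength(fruit)); // [kiwi, apple, banana]
    }
}
```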

02 -> List.sort()

First, the list is copied into an array

Then, on the second line of the method body, Arrays.sort() is called

default void sort(Comparator<? super E> c) {
    Object[] a = this.toArray();
    Arrays.sort(a, (Comparator) c);
    ListIterator<E> i = this.listIterator();
    for (Object e : a) {
        i.next();
        i.set((E) e);
    }
}

03 -> Arrays.sort(a, (Comparator) c)

The generic sort method has two main branches, corresponding to the two overloads of Collections.sort():

public static <T> void sort(T[] a, Comparator<? super T> c) {
    // Collections.sort(List<T> list) passes c == null, so it takes the if branch
    if (c == null) {
        sort(a);
    }
    // Collections.sort(List<T> list, Comparator<? super T> c) takes the else branch
    else {
        // LegacyMergeSort.userRequested defaults to false
        if (LegacyMergeSort.userRequested)
            legacyMergeSort(a, c);
        else
            // Default here
            TimSort.sort(a, 0, a.length, c, null, 0, 0);
    }
}
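One practical consequence of ending up in TimSort is stability: the Arrays.sort(T[], Comparator) documentation guarantees that equal elements are not reordered. The sketch below (StableSortDemo, sortByDept and the "name:dept" strings are made up for illustration) sorts by department only, and rows with the same department keep their input order.

```java
import java.util.Arrays;
import java.util.Comparator;

class StableSortDemo {
    // Sorts "name:dept" strings by the dept part only; the comparator path is stable,
    // so rows with equal dept keep their original relative order.
    static String[] sortByDept(String[] rows) {
        String[] copy = rows.clone();
        Arrays.sort(copy, Comparator.comparing((String r) -> r.substring(r.indexOf(':') + 1)));
        return copy;
    }

    public static void main(String[] args) {
        String[] rows = {"dave:ops", "amy:dev", "carl:ops", "bea:dev"};
        System.out.println(Arrays.toString(sortByDept(rows)));
        // [amy:dev, bea:dev, dave:ops, carl:ops]
    }
}
```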

From my comments,

we can again draw a tentative conclusion:

When Collections.sort() is called with a comparator,
one of two algorithms may run: legacy merge sort or TimSort.

But LegacyMergeSort.userRequested

is false by default (it only becomes true if the JVM is started with -Djava.util.Arrays.useLegacyMergeSort=true),

so TimSort is what runs in the end.

I took a brief look at the TimSort source code.

Since you probably don't want to read a lot of code, I'll list only the key parts here:

static <T> void sort(T[] a, int lo, int hi, Comparator<? super T> c,
                     T[] work, int workBase, int workLen) {
    .....
    // If array is small, do a "mini-TimSort" with no merges
    if (nRemaining < MIN_MERGE) {
        int initRunLen = countRunAndMakeAscending(a, lo, hi, c);
        binarySort(a, lo, hi, lo + initRunLen, c);
        return;
    }
    .....
    // Merge all remaining runs to complete sort
    assert lo == hi;
    ts.mergeForceCollapse();
    assert ts.stackSize == 1;
}

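Again there are two branches, and the comments give a rough idea:

1. For small arrays (fewer than MIN_MERGE = 32 elements), a "mini-TimSort" without merges is run. Stepping into binarySort() shows that it is actually an insertion sort (a binary insertion sort, which uses binary search to find each insertion point).

2. Otherwise, the array is split into runs (short runs are extended with the same binary insertion sort), and the runs are merged, i.e., merge sort.

So you can think of it this way: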

TimSort is a hybrid of merge sort and insertion sort!
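A simplified merge + insertion hybrid in the spirit of TimSort is sketched below. It reuses two ideas from java.util.TimSort, the MIN_MERGE constant of 32 and the minRunLength() formula, but it is deliberately cut down: it sorts fixed-size chunks with binary insertion sort instead of detecting natural runs, and it merges bottom-up without TimSort's run stack or galloping mode. SimpleTimSortSketch is a hypothetical name, and it works on int[] only for brevity.

```java
import java.util.Arrays;

class SimpleTimSortSketch {
    static final int MIN_MERGE = 32; // same value as java.util.TimSort.MIN_MERGE

    // Same formula as TimSort.minRunLength(): for n >= MIN_MERGE the result lies in
    // [MIN_MERGE / 2, MIN_MERGE]; smaller n is returned unchanged.
    static int minRunLength(int n) {
        int r = 0;                 // becomes 1 if any 1 bits are shifted off
        while (n >= MIN_MERGE) {
            r |= (n & 1);
            n >>= 1;
        }
        return n + r;
    }

    static void sort(int[] a) {
        int n = a.length;
        int minRun = minRunLength(n);
        // Step 1: binary-insertion-sort fixed chunks of minRun elements.
        for (int lo = 0; lo < n; lo += minRun) {
            binaryInsertionSort(a, lo, Math.min(lo + minRun, n));
        }
        // Step 2: merge neighbouring sorted chunks bottom-up, doubling the width each pass.
        int[] tmp = new int[n];
        for (int width = minRun; width < n; width *= 2) {
            for (int lo = 0; lo + width < n; lo += 2 * width) {
                merge(a, tmp, lo, lo + width, Math.min(lo + 2 * width, n));
            }
        }
    }

    // Binary insertion sort of a[lo..hi), the same idea as TimSort.binarySort().
    static void binaryInsertionSort(int[] a, int lo, int hi) {
        for (int start = lo + 1; start < hi; start++) {
            int pivot = a[start];
            int left = lo, right = start;
            // Find the first slot whose element is greater than pivot (keeps equal elements in order).
            while (left < right) {
                int mid = (left + right) >>> 1;
                if (pivot < a[mid]) {
                    right = mid;
                } else {
                    left = mid + 1;
                }
            }
            System.arraycopy(a, left, a, left + 1, start - left); // shift to make room
            a[left] = pivot;
        }
    }

    // Merges sorted a[lo..mid) and a[mid..hi); taking from the left on ties keeps it stable.
    static void merge(int[] a, int[] tmp, int lo, int mid, int hi) {
        System.arraycopy(a, lo, tmp, lo, hi - lo);
        int i = lo, j = mid, k = lo;
        while (i < mid && j < hi) {
            a[k++] = (tmp[i] <= tmp[j]) ? tmp[i++] : tmp[j++];
        }
        while (i < mid) {
            a[k++] = tmp[i++];
        }
        while (j < hi) {
            a[k++] = tmp[j++];
        }
    }

    public static void main(String[] args) {
        int[] data = {5, 3, 9, 1, 7, 3, 8, 2};
        sort(data);
        System.out.println(Arrays.toString(data)); // [1, 2, 3, 3, 5, 7, 8, 9]
    }
}
```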

That is the conclusion for the else branch. Next, let's look at the logic of the if branch:

c == null -> sort(a)
The logic here is also clear. There are two cases, depending on LegacyMergeSort.userRequested:
true: legacy merge sort
false (default): ComparableTimSort
public static void sort(Object[] a) {
    if (LegacyMergeSort.userRequested)
        legacyMergeSort(a);
    else
        //Default here
        ComparableTimSort.sort(a, 0, a.length, null, 0, 0);
}
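When no comparator is given, the elements themselves must be Comparable, because ComparableTimSort casts them to Comparable in order to compare them. A small sketch (NaturalOrderDemo, sortNatural and rejectsNonComparable are hypothetical names) shows both sides: Strings sort by their natural order, and an array of plain Objects is rejected with a ClassCastException, which the Arrays.sort(Object[]) documentation lists for elements that are not mutually comparable.

```java
import java.util.Arrays;

class NaturalOrderDemo {
    // No comparator: Arrays.sort(Object[]) compares elements via Comparable.compareTo().
    static Object[] sortNatural(Object[] in) {
        Object[] copy = in.clone();
        Arrays.sort(copy);
        return copy;
    }

    // Plain Objects are not Comparable, so sorting two of them fails with ClassCastException.
    static boolean rejectsNonComparable() {
        Object[] plain = {new Object(), new Object()};
        try {
            Arrays.sort(plain);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sortNatural(new Object[]{"pear", "apple", "fig"}))); // [apple, fig, pear]
        System.out.println(rejectsNonComparable()); // true
    }
}
```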

The only difference between ComparableTimSort and TimSort here

is that the former takes no comparator: it compares elements by their natural ordering (Comparable.compareTo()).

Generic sorting has quite a few branches, so let's lay the logic out again:
01. Collections.sort(List<T> list)
legacyMergeSort: merge sort
ComparableTimSort (default)
02. With a comparator: Collections.sort(List<T> list, Comparator<? super T> c)
legacyMergeSort: merge sort
TimSort (default)

Did you notice?

The legacy merge sort branch is practically never used;

TimSort (or ComparableTimSort) is used by default.

And indeed, legacyMergeSort() carries the comment "To be removed in a future release.", so TimSort is presumably the better choice compared with the old merge sort!

That gives us a general picture of the sorting algorithms used by Arrays.sort().

Arrays.sort() follows two sorting logics, depending on the type of data being sorted:

Primitive data types

The specific sorting algorithm depends on the number of elements:

< 47: insertion sort

47 to 286: dual-pivot quicksort

> 286: merge sort if the array is already nearly sorted, otherwise dual-pivot quicksort

Generic (object) types

01. Collections.sort(List<T> list)
legacyMergeSort: merge sort
ComparableTimSort (default): merge + insertion
02. With a comparator: Collections.sort(List<T> list, Comparator<? super T> c)
legacyMergeSort: merge sort
TimSort (default): merge + insertion