Analysis: which algorithms does Java's "sort" function use?

Posted by developerdave on Wed, 12 Jan 2022 13:04:21 +0100

Hello everyone,

Hao here.

While sorting some data a few days ago,

I accidentally clicked into the source code of the Arrays.sort() method.

In the spirit of "since I'm already here",

I decided to read through it.

I didn't expect to learn something new.

It turns out that

Arrays.sort() uses different sorting algorithms depending on the situation.

Here is a detailed walkthrough.

About Arrays.sort()

First, some background for readers who are not familiar with it.

The JDK provides two main utility classes for sorting:

java.util.Arrays
java.util.Collections

You probably use this pattern more often:

Collections.sort() plus a Comparator that overrides compare(),

to sort by some attribute.
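
For readers who have not written this pattern before, here is a minimal sketch of sorting by an attribute. The Person class and the sortedByAge helper are made up for illustration; only Collections.sort and Comparator come from the JDK.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class SortByAttribute {
    // Hypothetical value class, used only for this illustration.
    static final class Person {
        final String name;
        final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Returns a copy of the list sorted by age, leaving the input untouched.
    static List<Person> sortedByAge(List<Person> people) {
        List<Person> copy = new ArrayList<>(people);
        Collections.sort(copy, new Comparator<Person>() {
            @Override
            public int compare(Person a, Person b) {
                // Integer.compare avoids the overflow risk of "a.age - b.age".
                return Integer.compare(a.age, b.age);
            }
        });
        return copy;
    }
}
```

On Java 8 and later the anonymous class could be shortened to a lambda, but the explicit compare() override matches the pattern described above.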

But if you look at the Collections.sort() source code, you will find that

it ends up calling the Arrays.sort() method, by way of List.sort():

default void sort(Comparator<? super E> c) {
    Object[] a = this.toArray();
    Arrays.sort(a, (Comparator) c);
    ListIterator<E> i = this.listIterator();
    for (Object e : a) {
        i.next();
        i.set((E) e);
    }
}

So Arrays.sort() is the final entry point for sorting, which makes it well worth understanding!

Common sorting algorithms

Here are the English names of some common sorting algorithms, for reference when reading the source code below:

  • Bubble Sort
  • Insertion Sort
  • Merge Sort
  • Quick Sort

Source code analysis

Overview of all Arrays.sort() overloaded methods

We can discuss the specific sorting algorithms by the data type of the array being sorted.

 
1 | primitive data types

Take int as an example

(the other primitive types follow much the same logic)

01 -> Arrays.sort()

The method contains only one line of code. Comparing the class name with the algorithms listed above, it is not hard to guess

that this is some form of quicksort. Let's click in and see:

public static void sort(int[] a) {
    //Quick sort
    DualPivotQuicksort.sort(a, 0, a.length - 1, null, 0, 0);
}

02 -> DualPivotQuicksort.sort()

The method has two branches, and the array length decides which one is taken.

Going by the two comments

and the algorithm names listed above, we can draw a provisional conclusion:

Array length greater than 286: merge sort

Array length of 286 or less: quicksort

static void sort(int[] a, int left, int right,
                 int[] work, int workBase, int workLen) {
    // Use Quicksort on small arrays
    if (right - left < QUICKSORT_THRESHOLD) {
        sort(a, left, right, true);
        return;
    }
    ...
    // Merging
    ...
}
//QUICKSORT_THRESHOLD
private static final int QUICKSORT_THRESHOLD = 286;

But is that really the case?

Let's step into the quicksort branch's sort method.

03 -> sort(a, left, right, true)

There are two more branches in this method

private static void sort(int[] a, int left, int right, boolean leftmost) {
    int length = right - left + 1;
    // Use insertion sort on tiny arrays
    if (length < INSERTION_SORT_THRESHOLD) {
        ...
    }
    ...
}
//INSERTION_SORT_THRESHOLD
private static final int INSERTION_SORT_THRESHOLD = 47;
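
To make this branch concrete, here is a simplified sketch of the same idea: a quicksort that hands tiny ranges over to insertion sort. It assumes a plain single-pivot partition, so it is not the JDK's dual-pivot code. The class name and helper methods are hypothetical, and only the threshold value 47 is taken from the source above.

```java
class HybridQuicksortSketch {
    // Value copied from the JDK source quoted above.
    static final int INSERTION_SORT_THRESHOLD = 47;

    static void sort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    // Sorts a[left..right], both bounds inclusive.
    private static void sort(int[] a, int left, int right) {
        int length = right - left + 1;
        if (length < INSERTION_SORT_THRESHOLD) {
            insertionSort(a, left, right);
            return;
        }
        int p = partition(a, left, right);
        sort(a, left, p - 1);
        sort(a, p + 1, right);
    }

    // Classic insertion sort on a[left..right].
    private static void insertionSort(int[] a, int left, int right) {
        for (int i = left + 1; i <= right; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= left && a[j] > key) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
    }

    // Single-pivot (Lomuto) partition using the middle element as pivot.
    // Returns the final index of the pivot.
    private static int partition(int[] a, int left, int right) {
        int mid = left + (right - left) / 2;
        swap(a, mid, right);
        int pivot = a[right];
        int store = left;
        for (int i = left; i < right; i++) {
            if (a[i] < pivot) {
                swap(a, i, store);
                store++;
            }
        }
        swap(a, store, right);
        return store;
    }

    private static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```

Insertion sort has low overhead on short ranges, which is the usual reason for a cutoff like this.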

Based on the comment, and again on the algorithm names listed above,

let's revise the conclusion from before:

When the array length is less than 47, insertion sort is used.

Only when the length is between 47 and 286 is quicksort really used.

So the quicksort method is in fact not purely quicksort.
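
The revised conclusion can be written down as a tiny lookup. This sketch only models the two thresholds quoted above and ignores whatever happens in the elided parts of the source; the class and method names are made up for illustration.

```java
class PrimitiveSortChoice {
    // Threshold values copied from the JDK source quoted above.
    static final int INSERTION_SORT_THRESHOLD = 47;
    static final int QUICKSORT_THRESHOLD = 286;

    // Names the branch that an int[] of the given length would reach.
    static String describe(int length) {
        if (length < INSERTION_SORT_THRESHOLD) {
            return "insertion sort";
        }
        // The JDK check is (right - left < QUICKSORT_THRESHOLD),
        // and for a whole array right - left equals length - 1.
        if (length - 1 < QUICKSORT_THRESHOLD) {
            return "quicksort";
        }
        return "merge sort";
    }
}
```

Note the off-by-one: because the JDK compares right - left rather than the length, an array of exactly 286 elements still lands in the quicksort branch.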

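Conclusion summary

Sorting for primitive data types

The specific sorting algorithm depends on the number of elements:

length < 47: insertion sort

47 <= length <= 286: quicksort

length > 286: merge sort (the full JDK 8 source takes this path only when the array already consists of a small number of sorted runs; otherwise it falls back to quicksort)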

 
2 | generic types

This is probably the case we use most often,

that is, sorting through a Collections.sort() call.

Let's analyze it step by step:

01 -> Collections.sort()

Two overloaded methods:

// Parameter: List<T> list
public static <T extends Comparable<? super T>> void sort(List<T> list) {
    list.sort(null);
}

/**
 * Parameters:
 * List<T> list
 * Comparator<? super T> c, the comparator to sort with
 */
public static <T> void sort(List<T> list, Comparator<? super T> c) {
    list.sort(c);
}

02 -> List.sort()

First the list is converted to an array.

Then, on the third line, Arrays.sort() is called.

default void sort(Comparator<? super E> c) {
    Object[] a = this.toArray();
    Arrays.sort(a, (Comparator) c);
    ListIterator<E> i = this.listIterator();
    for (Object e : a) {
        i.next();
        i.set((E) e);
    }
}

03 -> Arrays.sort(a, (Comparator) c)

The sort method for generic arrays has two large branches, corresponding to the two overloads of Collections.sort():

public static <T> void sort(T[] a, Comparator<? super T> c) {
    // Collections.sort(List<T> list), the overload without a comparator, takes the if branch
    if (c == null) {
        sort(a);
    }
    // Collections.sort(List<T> list, Comparator<? super T> c), the overload with a comparator, takes the else branch
    else {
        // LegacyMergeSort.userRequested defaults to false
        if (LegacyMergeSort.userRequested)
            legacyMergeSort(a, c);
        else
            // Default here
            TimSort.sort(a, 0, a.length, c, null, 0, 0);
    }
}

From the comments I added,

we can draw another provisional conclusion:

When we call Collections.sort() with a comparator,

one of two algorithms may run: merge sort or TimSort.

But because

LegacyMergeSort.userRequested

defaults to false,

TimSort is the algorithm that actually runs.
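
Where does that flag come from? As far as I know, in JDK 7 and 8 it is read once from the system property java.util.Arrays.useLegacyMergeSort, which is normally unset. The sketch below only inspects that property from user code. The class and method names are made up, and it does not touch the JDK's private LegacyMergeSort class.

```java
class LegacyMergeSortFlag {
    // Property name as used by java.util.Arrays in JDK 7/8 (stated from memory; verify against your JDK).
    static final String PROPERTY = "java.util.Arrays.useLegacyMergeSort";

    // True only if the JVM was started with -Djava.util.Arrays.useLegacyMergeSort=true.
    static boolean legacyRequested() {
        return Boolean.getBoolean(PROPERTY);
    }

    public static void main(String[] args) {
        System.out.println(PROPERTY + " = " + legacyRequested());
    }
}
```

Setting the property on the command line is the way to opt back into the old merge sort; without it, the default branch is taken.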

 

As for TimSort, I took a brief look at its source code.

I know readers may not want to read a lot of code, so I only list the key parts here:

static <T> void sort(T[] a, int lo, int hi, Comparator<? super T> c,
                     T[] work, int workBase, int workLen) {
    .....
    // If array is small, do a "mini-TimSort" with no merges
    if (nRemaining < MIN_MERGE) {
        int initRunLen = countRunAndMakeAscending(a, lo, hi, c);
        binarySort(a, lo, hi, lo + initRunLen, c);
        return;
    }
    .....
    // Merge all remaining runs to complete sort
    assert lo == hi;
    ts.mergeForceCollapse();
    assert ts.stackSize == 1;
}

There are again two branches, and the comments give a rough idea of each:
1. For small arrays, a "mini-TimSort" is performed. Stepping into the binarySort() method shows that it is actually a binary insertion sort.
2. Otherwise, it is a merge sort.
So you can think of it this way:

TimSort is an optimized algorithm that combines merge sort and insertion sort!
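
To show what a binary insertion sort looks like, here is a minimal sketch. It is not the JDK's binarySort(): the class name, method name, and half-open [lo, hi) convention are my own choices, and it skips the step where TimSort first detects an already-sorted prefix.

```java
import java.util.Comparator;

class BinaryInsertionSketch {
    // Sorts a[lo, hi) (hi exclusive). Equal elements keep their original order.
    static <T> void binaryInsertionSort(T[] a, int lo, int hi, Comparator<? super T> c) {
        for (int i = lo + 1; i < hi; i++) {
            T pivot = a[i];
            // Binary search for the insertion point inside the sorted prefix a[lo, i).
            int left = lo;
            int right = i;
            while (left < right) {
                int mid = (left + right) >>> 1;
                if (c.compare(pivot, a[mid]) < 0) {
                    right = mid;
                } else {
                    left = mid + 1; // pivot goes after equal elements, which keeps the sort stable
                }
            }
            // Shift a[left, i) one slot to the right and drop the pivot into the gap.
            System.arraycopy(a, left, a, left + 1, i - left);
            a[left] = pivot;
        }
    }
}
```

The binary search cuts the number of comparisons per element to about log2(n), which matters when the comparator is expensive; the number of element moves stays the same as in a plain insertion sort.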

The above is the conclusion for the else branch. Next, let's look at the logic of the if branch:

c == null -> sort(a)

The logic here is also very clear. There are two cases, depending on LegacyMergeSort.userRequested:
true: merge sort
false (default): ComparableTimSort

public static void sort(Object[] a) {
    if (LegacyMergeSort.userRequested)
        legacyMergeSort(a);
    else
        // Default here
        ComparableTimSort.sort(a, 0, a.length, null, 0, 0);
}

The only difference between ComparableTimSort and TimSort

is that the former does not require a custom comparator: it relies on the elements' own Comparable ordering.
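
From the caller's side, the two paths look like this. A minimal sketch with made-up class and method names: Arrays.sort(Object[]) is the no-comparator entry point and Arrays.sort(T[], Comparator) is the comparator one, as in the source above.

```java
import java.util.Arrays;
import java.util.Comparator;

class NaturalVersusComparator {
    // No comparator: the elements must implement Comparable (String does).
    static String[] sortNatural(String[] in) {
        String[] copy = in.clone();
        Arrays.sort(copy);
        return copy;
    }

    // Explicit comparator: the ordering comes from cmp.
    // Passing null falls back to natural ordering, matching the c == null branch.
    static String[] sortWith(String[] in, Comparator<? super String> cmp) {
        String[] copy = in.clone();
        Arrays.sort(copy, cmp);
        return copy;
    }
}
```

If the elements do not implement Comparable and no comparator is given, the natural-ordering path fails with a ClassCastException at run time, so the comparator overload is the safer choice for classes you do not control.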
 

Generic sorting has many branches. Let's lay out the logic again:

01. Collections.sort(List<T> list)

legacyMergeSort: merge sort

ComparableTimSort (default)

02. With a comparator: Collections.sort(List<T> list, Comparator<? super T> c)

legacyMergeSort: merge sort

TimSort (default)

Did you notice?

The merge sort branch seems to go unused:

a TimSort variant is used by default.

That is indeed the case. A comment on the legacyMergeSort method says, roughly, "it may be removed in a later version", so TimSort is presumably considered a better solution than the plain merge sort!

We now have a rough picture of the sorting algorithms used by Arrays.sort().

Here is an overall summary. Remember to like and share!

Arrays.sort() follows one of two sorting paths, depending on the data type of the array being sorted:

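01 | primitive data types

The specific sorting algorithm depends on the number of elements:

length < 47: insertion sort

47 <= length <= 286: quicksort

length > 286: merge sort (the full JDK 8 source takes this path only when the array already consists of a small number of sorted runs; otherwise it falls back to quicksort)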

02 | generic types

01. Collections.sort(List<T> list)

legacyMergeSort: merge sort

ComparableTimSort (default): merge + insertion

02. With a comparator: Collections.sort(List<T> list, Comparator<? super T> c)

legacyMergeSort: merge sort

TimSort (default): merge + insertion

Topics: Java Algorithm