05 Sorting operations: bucket sort, counting sort and radix sort

Posted by Misticx on Fri, 18 Feb 2022 14:48:23 +0100

Bucket sort, counting sort and radix sort all run in linear time, O(n), under the conditions described below, so these three methods are called linear sorts. Their principles are not difficult, and their time and space complexity are relatively easy to analyze, so this article focuses on the scenarios each of the three algorithms is suited to.

1. Bucket sort

The idea of bucket sort is to distribute the data into several buckets that cover ordered value ranges, sort the data inside each bucket separately, and then take the data out of the buckets in bucket order. The resulting sequence is sorted.

Time complexity: suppose there are n elements to sort and we divide them evenly into m buckets, so each bucket holds k = n/m elements. The overall cost depends on the algorithm used to sort the elements inside each bucket:
(1) Merge sort or quicksort inside each bucket: one bucket costs O(k*log k), so m buckets cost O(m*k*log k). Since k = n/m, the whole bucket sort costs O(n*log(n/m)). When the number of buckets m is close to n, log(n/m) is a very small constant, and bucket sort runs in close to O(n).
(2) Insertion sort (or bubble sort or selection sort) inside each bucket: one bucket costs O(k^2), so m buckets cost O(m*k^2). Since k = n/m, the whole bucket sort costs O(n^2/m). When m is close to n, n/m is a constant, so O(n^2/m) = O(n*(n/m)) is also close to O(n).
Space complexity: bucket sort needs extra storage for the buckets, so its space complexity is O(n) (O(n + m) when the m buckets themselves are counted).
Stability: bucket sort is stable only if the algorithm used inside each bucket is stable.
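A worked example with illustrative numbers (chosen here, not taken from the original) makes the effect of the bucket count m concrete, using the same n, m and k as above:

```latex
\begin{aligned}
&n = 10^6,\quad m = 10^3,\quad k = n/m = 10^3 \\
&\text{merge/quick sort per bucket: } m \cdot k \log_2 k = 10^3 \cdot 10^3 \cdot \log_2 10^3 \approx 10^6 \cdot 10 = 10^7 \\
&\text{one comparison sort over all data: } n \log_2 n = 10^6 \cdot \log_2 10^6 \approx 10^6 \cdot 20 = 2 \times 10^7 \\
&\text{insertion sort per bucket: } m \cdot k^2 = n^2/m = 10^{12}/10^3 = 10^9
\end{aligned}
```

With only 10^3 buckets the in-bucket insertion sort is still expensive. Pushing m toward n, for example m = n/2 so that k = 2, gives n^2/m = 2n = 2 × 10^6, which is linear in n.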

Bucket sort only performs well when the data meet the following requirements:

The data can easily be divided into m buckets, and the buckets have a natural order among themselves, so once the data inside each bucket are sorted, the buckets do not need to be sorted against each other.
The data are distributed evenly among the buckets. If some buckets receive many elements while others receive very few, performance degrades noticeably. In the extreme case where all the data fall into a single bucket, the time complexity degrades to that of the in-bucket sort, O(n log n) or O(n^2).
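To see why uneven distribution hurts, the sketch below only counts how many elements land in each bucket, using the same index formula, (value - min) / bucketSize, as the bucket sort code in this section. The class name BucketLoads and the sample data are illustrative assumptions, not part of the original post.

```java
import java.util.Arrays;

// Hypothetical helper: reports how many elements fall into each bucket when every
// bucket covers bucketWidth consecutive values starting from the minimum.
public class BucketLoads {
    public static int[] loads(int[] a, int bucketWidth) {
        int min = Arrays.stream(a).min().getAsInt();
        int max = Arrays.stream(a).max().getAsInt();
        int[] counts = new int[(max - min) / bucketWidth + 1]; // +1 so max gets a bucket
        for (int v : a) {
            counts[(v - min) / bucketWidth]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] even = {3, 14, 25, 36, 47, 58, 69, 70};
        int[] skewed = {1, 2, 3, 4, 5, 6, 7, 79};
        System.out.println(Arrays.toString(loads(even, 10)));   // [1, 1, 1, 1, 1, 1, 1, 2]? see note
        System.out.println(Arrays.toString(loads(skewed, 10)));
    }
}
```

For the evenly spread input each of the 7 buckets receives one or two elements, while for the skewed input seven of the eight values fall into bucket 0, so the sort inside that one bucket does almost all of the work.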

import java.util.Arrays;

class Solution {

    public int[] sort(int[] a) {
        return bucketSort(a, 5);
    }

    // bucketSize is the width of the value range covered by each bucket
    public int[] bucketSort(int[] a, int bucketSize) {
        if (a.length == 0) {
            return a;
        }
        int minValue = a[0];
        int maxValue = a[0];
        // Find the minimum and maximum values in the array
        for (int value : a) {
            if (value < minValue) {
                minValue = value;
            } else if (value > maxValue) {
                maxValue = value;
            }
        }
        // Generate bucketNum buckets; the +1 gives maxValue a bucket of its own range
        int bucketNum = (maxValue - minValue) / bucketSize + 1;
        int[][] buckets = new int[bucketNum][0];

        // Add data to the buckets
        for (int value : a) {
            int index = (value - minValue) / bucketSize;
            // buckets[index] is a one-dimensional array; arrAppend grows it by one slot
            buckets[index] = arrAppend(buckets[index], value);
        }
        // Sort each bucket, then copy the buckets back in bucket order
        int arrCount = 0;
        for (int[] bucket : buckets) {
            if (bucket.length == 0) {
                continue;
            }
            insertionSort(bucket); // Sort the elements inside this bucket
            for (int value : bucket) {
                a[arrCount++] = value;
            }
        }
        return a;
    }

    // Insertion sort used inside each bucket (stable, sorts in place)
    private static void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int value = a[i];
            int j = i - 1;
            while (j >= 0 && a[j] > value) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = value;
        }
    }

    // Realize the dynamic expansion of the array
    public int[] arrAppend(int[] a, int value) {
        int[] arr = Arrays.copyOf(a, a.length + 1);
        arr[arr.length - 1] = value;
        return arr;
    }
}
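The code above fixes the value width of each bucket, while the complexity analysis speaks of the number of buckets m. A common variant, sketched here under the assumption that the input is floating-point data spread uniformly over [0, 1), sets m = n so that each bucket holds about one element on average. UniformBucketSort is a hypothetical name, and it sorts each bucket with Collections.sort (which is stable) instead of insertion sort.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch: bucket sort for doubles assumed to lie in [0, 1), using n buckets.
public class UniformBucketSort {
    public static void sort(double[] a) {
        int n = a.length;
        if (n <= 1) return;
        List<List<Double>> buckets = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            buckets.add(new ArrayList<>());
        }
        // Value v goes to bucket floor(v * n); bucket i covers [i/n, (i+1)/n)
        for (double v : a) {
            int index = Math.min((int) (v * n), n - 1); // guard against rounding at the top end
            buckets.get(index).add(v);
        }
        int pos = 0;
        for (List<Double> bucket : buckets) {
            Collections.sort(bucket); // stable sort inside each bucket
            for (double v : bucket) {
                a[pos++] = v;
            }
        }
    }

    public static void main(String[] args) {
        double[] data = {0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68};
        sort(data);
        System.out.println(java.util.Arrays.toString(data));
    }
}
```

If the data are not close to uniform, this variant degrades exactly as the requirements above describe, because many values share a bucket.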

2. Counting sort

Counting sort is actually a special case of bucket sort. When the range of values to be sorted is small, for example when the maximum value is k, the data can be divided into k + 1 buckets, one for each value from 0 to k. Every element in a bucket has the same value, so no sorting inside the buckets is needed.

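Time complexity: counting sort runs in O(n + k), which is O(n) when k is not much larger than n.
Space complexity: counting sort stores its output in an extra temp array of size n and uses a count array of size k + 1, so it is not an in-place sort; its space complexity is O(n + k).
Stability: counting sort is stable, because the final pass scans the input from back to front.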
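Stability matters when the values being sorted are keys of larger records, and radix sort relies on it. The sketch below runs the same back-to-front counting sort on a hypothetical Student record keyed by a small integer score; Ann and Cid share score 2 and keep their input order. The class and field names are illustrative assumptions.

```java
public class StableCountingSort {
    // Hypothetical record type used only for this illustration
    public static final class Student {
        final String name;
        final int score;
        public Student(String name, int score) { this.name = name; this.score = score; }
        @Override public String toString() { return name + ":" + score; }
    }

    // Stable counting sort of students by score; scores assumed to lie in [0, maxScore]
    public static Student[] sortByScore(Student[] a, int maxScore) {
        int[] c = new int[maxScore + 1];
        for (Student s : a) c[s.score]++;                      // occurrences of each score
        for (int i = 1; i <= maxScore; i++) c[i] += c[i - 1]; // c[v] = number of scores <= v
        Student[] out = new Student[a.length];
        for (int i = a.length - 1; i >= 0; i--) {             // back to front keeps ties in input order
            out[--c[a[i].score]] = a[i];
        }
        return out;
    }

    public static void main(String[] args) {
        Student[] in = {
            new Student("Ann", 2), new Student("Bob", 1),
            new Student("Cid", 2), new Student("Dan", 0)
        };
        System.out.println(java.util.Arrays.toString(sortByScore(in, 2))); // [Dan:0, Bob:1, Ann:2, Cid:2]
    }
}
```

Scanning from front to back instead would place Cid before Ann, which is still sorted by score but no longer stable.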

Counting sort can only be used when the following requirements are met:

Counting sort is suitable only when the value range k is small. If k is much larger than the number of elements n, counting sort is a poor choice.
Counting sort can only sort non-negative integers directly. If the array holds other kinds of data, convert them to non-negative integers without changing their relative order; for example, if the values lie between -1000 and 1000, add 1000 to each of them first.
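The conversion can be as simple as shifting every value by the minimum. This sketch (OffsetCountingSort is a hypothetical name) handles negative ints by mapping min to index 0. It assumes max - min is small enough for a count array, and it writes values straight back from the counts, which is fine for plain integers but, unlike the prefix-sum version, cannot carry attached records.

```java
// Sketch: counting sort for ints that may be negative, using an offset of -min.
public class OffsetCountingSort {
    public static void sort(int[] a) {
        if (a.length <= 1) return;
        int min = a[0], max = a[0];
        for (int v : a) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        int[] c = new int[max - min + 1];  // one counter per value in [min, max]
        for (int v : a) c[v - min]++;      // shifting by -min keeps the relative order
        int pos = 0;
        for (int i = 0; i < c.length; i++) {
            while (c[i]-- > 0) a[pos++] = i + min; // write value i + min back c[i] times
        }
    }

    public static void main(String[] args) {
        int[] data = {3, -2, 0, -2, 5, 1};
        sort(data);
        System.out.println(java.util.Arrays.toString(data)); // [-2, -2, 0, 1, 3, 5]
    }
}
```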

class Solution {
    // Sorts the first n elements of a; every value must be a non-negative integer
    public void countSort(int[] a, int n) {
        if (n <= 0) return;

        int maxValue = a[0];
        // Traverse to find the maximum value of the array
        for (int i = 1; i < n; i++) {
            if (a[i] > maxValue) {
                maxValue = a[i];
            }
        }
        int[] c = new int[maxValue + 1];   // Count array; Java initializes every entry to 0
        for (int i = 0; i < n; i++) {
            c[a[i]]++;                     // c[v] = number of elements equal to v
        }
        for (int i = 1; i <= maxValue; i++) {
            c[i] = c[i] + c[i - 1];        // Prefix sums: c[v] = number of elements <= v
        }

        int[] temp = new int[n];  // Same size as the input; holds the sorted values
        // Scan from back to front so equal values keep their original order (stable)
        for (int i = n - 1; i >= 0; i--) {
            temp[c[a[i]] - 1] = a[i];
            c[a[i]]--;
        }
        // Copy the ordered values back to the original array, and the whole process ends
        for (int i = 0; i < n; i++) {
            a[i] = temp[i];
        }
    }
}

3. Radix sort

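Radix sort sorts the data digit by digit. It usually processes the digits from back to front: first by the lowest digit, then by the next one, up to the highest. Each per-digit pass must use a stable sort; otherwise a later pass would destroy the order established by the lower digits.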
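A minimal sketch of this digit-by-digit process, assuming non-negative int keys and base 10 (both choices made here for illustration). Each pass is a stable counting sort on one decimal digit, starting from the lowest digit; LsdRadixSort is a hypothetical name.

```java
// Sketch: least-significant-digit (LSD) radix sort for non-negative ints, base 10.
public class LsdRadixSort {
    public static void sort(int[] a) {
        if (a.length <= 1) return;
        int max = a[0];
        for (int v : a) max = Math.max(max, v);
        int[] temp = new int[a.length];
        // exp = 1, 10, 100, ... selects the current digit; long avoids overflow near Integer.MAX_VALUE
        for (long exp = 1; max / exp > 0; exp *= 10) {
            int[] c = new int[10];
            for (int v : a) c[(int) (v / exp % 10)]++;
            for (int digit = 1; digit < 10; digit++) c[digit] += c[digit - 1];
            // Back to front keeps the order produced by the lower digits (stability)
            for (int i = a.length - 1; i >= 0; i--) {
                int digit = (int) (a[i] / exp % 10);
                temp[--c[digit]] = a[i];
            }
            System.arraycopy(temp, 0, a, 0, a.length);
        }
    }

    public static void main(String[] args) {
        int[] data = {170, 45, 75, 90, 802, 24, 2, 66};
        sort(data);
        System.out.println(java.util.Arrays.toString(data)); // [2, 24, 45, 66, 75, 90, 170, 802]
    }
}
```

The number of passes equals the number of decimal digits in the largest value, which is the d in the complexity estimate for radix sort.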

The time complexity of radix sort depends mainly on the linear sorting algorithm used for each digit. If the data have d digits and each pass takes O(n), radix sort takes O(d*n); when d is a small constant, this is approximately O(n).
The linear sort used for each digit (such as counting sort) is not in-place, so radix sort is not in-place either; its space complexity is O(n).
Radix sort is stable.

Radix sort can be used when:

The data can be split into independent "digits" that are compared one after another, and the digits form a hierarchy: if a higher digit of a is greater than the same digit of b, the remaining lower digits do not need to be compared.
The range of each digit must be small enough to be sorted with a linear sorting algorithm such as counting sort; otherwise radix sort cannot reach O(n).
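Fixed-length digit strings such as phone numbers meet both requirements: each position is an independent digit with only 10 possible values, and a higher position decides the order before any lower one. The sketch below assumes every string has exactly len characters from '0' to '9'; PhoneNumberRadixSort and the sample numbers are illustrative, not from the original post.

```java
// Sketch: LSD radix sort for equal-length strings made of the characters '0'-'9'.
public class PhoneNumberRadixSort {
    public static void sort(String[] a, int len) {
        String[] temp = new String[a.length];
        for (int pos = len - 1; pos >= 0; pos--) {       // lowest (rightmost) position first
            int[] c = new int[10];
            for (String s : a) c[s.charAt(pos) - '0']++;
            for (int d = 1; d < 10; d++) c[d] += c[d - 1];
            for (int i = a.length - 1; i >= 0; i--) {     // back to front: stable
                temp[--c[a[i].charAt(pos) - '0']] = a[i];
            }
            System.arraycopy(temp, 0, a, 0, a.length);
        }
    }

    public static void main(String[] args) {
        String[] phones = {"13800001111", "13712345678", "13800000000", "13599998888"};
        sort(phones, 11);
        System.out.println(String.join(" ", phones));
    }
}
```

With n strings of length len the sort makes len counting-sort passes over a range of 10, so it runs in O(len * n).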


Programmer Think
