jdk默认排序算法分析

注意,后续分析都是基于JDK1.8.

Arrays.sort:

public static <T> void sort(T[] a, Comparator<? super T> c) {
        if (c == null) {
            sort(a);
        } else {
            if (LegacyMergeSort.userRequested)
                legacyMergeSort(a, c);
            else
                TimSort.sort(a, 0, a.length, c, null, 0, 0);
        }
    }

可以看到,具体排序算法会根据userRequested的取值区分,如果用户选择需要要,则使用传统归并排序,否则会使用优化的Tim Peters's list sort算法(一种优化的归并排序),这个算法是从JDK1.7加入的,这里要注意一下:

Old merge sort implementation can be selected (for compatibility with broken comparators) using a system property.

什么叫broken comparators? 新算法需要提供一个完整的比较器,即大于、小于、等于这三种情况都要给出判断,有时我们的比较器没有提供等于的判断,那么使用这个算法就会抛异常。

同时,如果是基础类型,排序算法是不一样的:

/**
 * Sorts the specified array into ascending numerical order.
 *
 * <p>Implementation note: The sorting algorithm is a Dual-Pivot Quicksort
 * by Vladimir Yaroslavskiy, Jon Bentley, and Joshua Bloch. This algorithm
 * offers O(n log(n)) performance on many data sets that cause other
 * quicksorts to degrade to quadratic performance, and is typically
 * faster than traditional (one-pivot) Quicksort implementations.
 *
 * @param a the array to be sorted
 */
public static void sort(long[] a) {
        DualPivotQuicksort.sort(a, 0, a.length - 1, null, 0, 0);
    }

两点要注意,1,使用的不是传统的快速排序,二是改进的双轴快速排序,性能更好;2.对于某些输入,算法时间复杂度会退化,我们知道,快速排序最坏时间复杂度O(n2)。

为什么对于对象类型的排序,不使用快速排序呢?
JDK作者Josh Bloch说明过这个问题:

I did write these methods, so I suppose I’m qualified to answer. It is true that there is no single best sorting algorithm. QuickSort has two major deficiencies when compared to mergesort:
It’s not stable (as parsifal noted).
It doesn’t guarantee n log n performance; it can degrade to quadratic performance on pathological inputs.
Stability is a non-issue for primitive types, as there is no notion of identity as distinct from (value) equality. And the possibility of quadratic behavior was deemed not to be a problem in practice for Bentely and McIlroy’s implementation (or subsequently for Dual Pivot Quicksort), which is why these QuickSort variants were used for the primitive sorts.

Stability is a big deal when sorting arbitrary objects. For example, suppose you have objects representing email messages, and you sort them first by date, then by sender. You expect them to be sorted by date within each sender, but that will only be true if the sort is stable. That’s why we elected to provide a stable sort (Merge Sort) to sort object references. (Techincally speaking, multiple sequential stable sorts result in a lexicographic ordering on the keys in the reverse order of the sorts: the final sort determines the most significant subkey.)
It’s a nice side benefit that Merge Sort guarantees n log n (time) performance no matter what the input. Of course there is a down side: quick sort is an “in place” sort: it requies only log n external space (to maintain the call stack). Merge, sort, on the other hand, requires O(n) external space. The TimSort variant (introduced in Java SE 6) requires substantially less space (O(k)) if the input array is nearly sorted.

原因有两个,一是稳定性,快速排序是不稳定的,对于基础数值类型,没有问题,如果是对象,需要通过多个属性多级排序,那么不稳定的算法就有可能出问题。二是从复杂度看,虽然平均时间复杂度都是O(nlogn),但快速排序可能退化,虽然快速排序空间复杂度更低,但从通用性考虑,还是选择归并排序。

如果觉得我的文章对您有用,请在支付宝公益平台找个项目捐点钱。 @sxzhou Jan 3, 2018

奉献爱心