为什么 Collections.sort 使用归并排序而不是快速排序? [英] Why Collections.sort uses merge sort instead of quicksort?

查看:18
本文介绍了为什么 Collections.sort 使用归并排序而不是快速排序?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我们知道快速排序是最快的排序算法.

We know that quick sort is the fastest sorting algorithm.

JDK6 collections.sort 使用归并排序算法而不是快速排序.但是 Arrays.sort 使用快速排序算法.

The JDK6 collections.sort uses the merge sort algorithm instead of quick sort. But Arrays.sort uses quick sort algorithm.

Collections.sort 使用归并排序而不是快速排序的原因是什么?

What is the reason Collections.sort uses merge sort instead of quick sort?

推荐答案

极有可能来自 Josh Bloch §:

Highly likely from Josh Bloch §:

我确实写了这些方法,所以我想我有资格回答.它是确实没有单一的最佳排序算法.快速排序有与归并排序相比,有两个主要缺陷:

I did write these methods, so I suppose I'm qualified to answer. It is true that there is no single best sorting algorithm. QuickSort has two major deficiencies when compared to mergesort:

  1. 它不稳定(正如 parsifal 指出的那样).

  1. It's not stable (as parsifal noted).

它不保证 n log n 性能;它可以在病理输入上降级为二次性能.

It doesn't guarantee n log n performance; it can degrade to quadratic performance on pathological inputs.

稳定性对于原始类型来说不是问题,因为没有身份与(价值)平等不同.并且有可能二次行为在实践中被认为不是问题Bentely 和 McIlroy 的实现(或随后用于 Dual PivotQuicksort),这就是为什么这些 QuickSort 变体用于原始排序.

Stability is a non-issue for primitive types, as there is no notion of identity as distinct from (value) equality. And the possibility of quadratic behavior was deemed not to be a problem in practice for Bentely and McIlroy's implementation (or subsequently for Dual Pivot Quicksort), which is why these QuickSort variants were used for the primitive sorts.

在对任意对象进行排序时,稳定性很重要.例如,假设您有代表电子邮件消息的对象,并且您对他们首先按日期,然后按发件人.您希望它们按以下顺序排序每个发件人中的日期,但只有在排序为稳定的.这就是我们选择提供稳定排序(Merge Sort)的原因对对象引用进行排序.(从技术上讲,多个顺序稳定的排序导致在键的字典顺序排序的倒序:最后的排序决定了最多的重要的子键.)

Stability is a big deal when sorting arbitrary objects. For example, suppose you have objects representing email messages, and you sort them first by date, then by sender. You expect them to be sorted by date within each sender, but that will only be true if the sort is stable. That's why we elected to provide a stable sort (Merge Sort) to sort object references. (Techincally speaking, multiple sequential stable sorts result in a lexicographic ordering on the keys in the reverse order of the sorts: the final sort determines the most significant subkey.)

Merge Sort 保证 n log n(时间)是一个很好的附带好处无论输入如何,性能.当然也有缺点:快速排序是一种就地"排序:它只需要记录 n 个外部空间(维护调用堆栈).另一方面,合并,排序,需要 O(n) 外部空间.TimSort 变体(在 Java 中引入)SE 6) 如果输入数组是几乎排序.

It's a nice side benefit that Merge Sort guarantees n log n (time) performance no matter what the input. Of course there is a down side: quick sort is an "in place" sort: it requies only log n external space (to maintain the call stack). Merge, sort, on the other hand, requires O(n) external space. The TimSort variant (introduced in Java SE 6) requires substantially less space (O(k)) if the input array is nearly sorted.

此外,以下是相关的:

java.util.Arrays.sort 和(间接)使用的算法java.util.Collections.sort 对对象引用进行排序是一个修改归并排序(其中如果最高元素在低子列表小于高子列表中的最低元素)."它是一个相当快的稳定排序,保证 O(n log n)性能并需要 O(n) 额外空间.在它的日子(它被写成1997 年由 Joshua Bloch 撰写),这是一个不错的选择,但今天但我们可以做得更好.

The algorithm used by java.util.Arrays.sort and (indirectly) by java.util.Collections.sort to sort object references is a "modified mergesort (in which the merge is omitted if the highest element in the low sublist is less than the lowest element in the high sublist)." It is a reasonably fast stable sort that guarantees O(n log n) performance and requires O(n) extra space. In its day (it was written in 1997 by Joshua Bloch), it was a fine choice, but today but we can do much better.

自 2003 年以来,Python 的列表排序使用了一种称为 timsort 的算法(以编写它的蒂姆·彼得斯(Tim Peters)的名字命名).它是一个稳定的、自适应的、迭代的需要远少于 n 个 log(n) 比较的归并排序,当在部分排序的数组上运行,同时提供性能在随机数组上运行时与传统的归并排序相当.喜欢所有适当的归并排序 timsort 都是稳定的,并且在 O(n log n) 时间内运行(最差的情况).在最坏的情况下,timsort 需要临时存储n/2 个对象引用的空间;在最好的情况下,它只需要一个小的恒定空间.对比一下现在的实现,它总是需要 n 个对象的额外空间引用,并且仅在几乎排序的列表中击败 n log n.

Since 2003, Python's list sort has used an algorithm known as timsort (after Tim Peters, who wrote it). It is a stable, adaptive, iterative mergesort that requires far fewer than n log(n) comparisons when running on partially sorted arrays, while offering performance comparable to a traditional mergesort when run on random arrays. Like all proper mergesorts timsort is stable and runs in O(n log n) time (worst case). In the worst case, timsort requires temporary storage space for n/2 object references; in the best case, it requires only a small constant amount of space. Contrast this with the current implementation, which always requires extra space for n object references, and beats n log n only on nearly sorted lists.

Timsort 在这里有详细描述:http://svn.python.org/projects/python/trunk/Objects/listsort.txt.

Timsort is described in detail here: http://svn.python.org/projects/python/trunk/Objects/listsort.txt.

Tim Peters 的原始实现是用 C. Joshua Bloch 编写的将它从 C 移植到 Java 并最终测试、基准测试和调整结果代码广泛.生成的代码是一个drop-in替换 java.util.Arrays.sort.在高度有序的数据上,这代码的运行速度最高可达当前实现的 25 倍(在HotSpot 服务器虚拟机).在随机数据上,新旧数据的速度实现具有可比性.对于非常短的列表,新的即使在随机情况下,实现也比旧的快得多数据(因为它避免了不必要的数据复制).

Tim Peters's original implementation is written in C. Joshua Bloch ported it from C to Java and end tested, benchmarked, and tuned the resulting code extensively. The resulting code is a drop-in replacement for java.util.Arrays.sort. On highly ordered data, this code can run up to 25 times as fast as the current implementation (on the HotSpot server VM). On random data, the speeds of the old and new implementations are comparable. For very short lists, the new implementation is substantially faster that the old even on random data (because it avoids unnecessary data copying).

另外,请参见 Java 7 是否使用Tim Sort for the Method Arrays.Sort?.

没有单一的最佳"选择.与许多其他事情一样,这也是权衡.

There isn't a single "best" choice. As with many other things, it's about tradeoffs.

这篇关于为什么 Collections.sort 使用归并排序而不是快速排序?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆