哪个更有效:排序流或排序列表? [英] What is more efficient: sorted stream or sorting a list?

查看:26
本文介绍了哪个更有效:排序流或排序列表?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

假设我们在一个集合中有一些项目,我们想使用某个比较器对它们进行排序,期望结果是一个列表:

Assume we have some items in a collection and we want to sort them using certain comparator, expecting result in a list:

Collection<Item> items = ...;
Comparator<Item> itemComparator = ...;

其中一种方法是对列表中的项目进行排序,例如:

One of the approaches is to sort items in a list, something like:

List<Item> sortedItems = new ArrayList<>(items);
Collections.sort(sortedItems, itemComparator);

另一种方法是使用排序流:

Anothe approach is using a sorted stream:

List<Item> sortedItems = items
    .stream()
    .sorted(itemComparator)
    .collect(Collectors.toList());

我想知道哪种方法更有效?排序流是否有任何优势(例如多核上的快速排序)?

I wonder, which approach is more efficient? Are there any advantages of a sorted stream (like faste sorting on multiple cores)?

在运行时复杂性/最快的意义上高效.

Efficient in a sense of runtime complexity/fastest.

我不相信自己会实现完美的benchmark 和学习 SortedOps 并没有真正启发我.

I don't trust myself to implement a perfect benchmark and studying SortedOps did not really enlighten me.

推荐答案

可以肯定地说,两种形式的排序将具有相同的复杂性……即使不看代码.(如果他们不这样做,那么一种形式将被严重破坏!)

It is safe to say that two forms of sort will have the same complexity ... even without looking at the code. (If they didn't then one form would be severely broken!)

查看流的 Java 8 源代码(特别是内部类 java.util.stream.SortedOps),sorted() 方法将组件添加到流将所有流元素捕获到数组或 ArrayList 中的管道.

Looking at Java 8 source code for streams (specifically the internal class java.util.stream.SortedOps), the sorted() method adds a component to a stream pipeline that captures all of the stream elements into either an array or an ArrayList.

  • 当且仅当流水线汇编代码可以提前推断出流中的元素数量时,才使用数组.

  • An array is used if and only if the pipeline assembly code can deduce the number of elements in the stream ahead of time.

否则,使用 ArrayList 来收集要排序的元素.

Otherwise, an ArrayList is used to gather the elements to be sorted.

如果使用了 ArrayList,则会产生构建/增长列表的额外开销.

If an ArrayList is used, you incur the extra overhead of building / growing the list.

然后我们回到代码的两个版本:

Then we return to two versions of the code:

List<Item> sortedItems = new ArrayList<>(items);
Collections.sort(sortedItems, itemComparator);

在这个版本中,ArrayList 构造函数将元素 items 复制到一个适当大小的数组中,然后 Collections.sort 执行一个输入 -放置那种数组.(这发生在幕后).

In this version, the ArrayList constructor copies the elements items to an appropriately sized array, and then Collections.sort does an in-place sort of that array. (This happens under the covers).

List<Item> sortedItems = items
    .stream()
    .sorted(itemComparator)
    .collect(Collectors.toList());

在这个版本中,正如我们在上面看到的,与 sorted() 关联的代码要么构建和排序数组(相当于上面发生的事情)要么构建 ArrayList缓慢的方式.但除此之外,还有将数据从 items 流式传输到收集器的开销.

In this version, as we have seen above, the code associated with sorted() either builds and sorts an array (equivalent to what happens above) or it builds the ArrayList the slow way. But on top of that, there are the overheads of stream the data from items and to the collector.

总体(至少是 Java 8 实现)代码检查告诉我,代码的第一个版本不能比第二个版本慢,并且在大多数(如果不是全部)情况下它会更快.但是随着列表变大,O(NlogN) 排序将倾向于支配复制的 O(N) 开销.这将意味着两个版本之间的相对差异会越来越小.

Overall (with the Java 8 implementation at least) code examination tells me that first version of the code cannot be slower than the second version, and in most (if not all) cases it will be faster. But as the list gets larger, the O(NlogN) sorting will tend to dominate the O(N) overheads of copying. That will mean that the relative difference between the two versions will get smaller.

如果您真的很在意,您应该编写一个基准测试来测试与特定 Java 实现和特定输入数据集的实际差异.(或调整@Eugene 的基准!)

If you really care, you should write a benchmark to test the actual difference with a specific implementation of Java, and a specific input dataset. (Or adapt @Eugene's benchmark!)

这篇关于哪个更有效:排序流或排序列表?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆