Stream.skip behavior with unordered terminal operation

Question

I've already read these two related questions, but I still doubt whether the observed behavior of Stream.skip was intended by the JDK authors.

Let's take a simple input of the numbers 1..20:

List<Integer> input = IntStream.rangeClosed(1, 20).boxed().collect(Collectors.toList());

Now let's create a parallel stream, combine unordered() with skip() in different ways, and collect the result:

System.out.println("skip-skip-unordered-toList: "
        + input.parallelStream().filter(x -> x > 0)
            .skip(1)
            .skip(1)
            .unordered()
            .collect(Collectors.toList()));
System.out.println("skip-unordered-skip-toList: "
        + input.parallelStream().filter(x -> x > 0)
            .skip(1)
            .unordered()
            .skip(1)
            .collect(Collectors.toList()));
System.out.println("unordered-skip-skip-toList: "
        + input.parallelStream().filter(x -> x > 0)
            .unordered()
            .skip(1)
            .skip(1)
            .collect(Collectors.toList()));

The filtering step does essentially nothing here, but it makes things harder for the stream engine: it no longer knows the exact size of the output, so some optimizations are turned off. I got the following results:

skip-skip-unordered-toList: [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
// absent values: 1, 2
skip-unordered-skip-toList: [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 18, 19, 20]
// absent values: 1, 15
unordered-skip-skip-toList: [1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19, 20]
// absent values: 7, 18

The results are completely fine; everything works as expected. In the first case I asked to skip the first two elements, then collect to a list in no particular order. In the second case I asked to skip the first element, then switch to unordered mode and skip one more element (I don't care which one). In the third case I switched to unordered mode first, then skipped two arbitrary elements.
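To make the expectation above concrete, here is a small self-contained sketch (the class and method names are ours, not from the question). It asserts only what the skip() contract pins down: on an ordered pipeline with an ordered collector, skip(1).skip(1) drops exactly 1 and 2; once unordered() comes first, only the number of dropped elements is fixed, not which ones.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class SkipOrderDemo {
    // Same input as in the question: the numbers 1..20.
    static List<Integer> input() {
        return IntStream.rangeClosed(1, 20).boxed().collect(Collectors.toList());
    }

    // Ordered all the way (filter keeps ORDERED, toList() is an ordered
    // collector), so skip(1).skip(1) must drop exactly 1 and 2.
    static List<Integer> orderedSkip() {
        return input().parallelStream().filter(x -> x > 0)
                .skip(1)
                .skip(1)
                .collect(Collectors.toList());
    }

    // unordered() comes first, so each skip(1) may drop any element:
    // only the count of dropped elements is fixed.
    static List<Integer> unorderedSkip() {
        return input().parallelStream().filter(x -> x > 0)
                .unordered()
                .skip(1)
                .skip(1)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> u = unorderedSkip();
        System.out.println("ordered:   " + orderedSkip());
        System.out.println("unordered: " + u.size() + " elements, all from input: "
                + input().containsAll(u));
    }
}
```

The unordered variant is checked only by size and membership, because which two values go missing can change from run to run.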

Now let's skip one element and collect into a custom collection in unordered mode. Our custom collection will be a HashSet:

System.out.println("skip-toCollection: "
        + input.parallelStream().filter(x -> x > 0)
            .skip(1)
            .unordered()
            .collect(Collectors.toCollection(HashSet::new)));

The output is as expected:

skip-toCollection: [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
// 1 is skipped

So in general I expect that as long as the stream is ordered, skip() skips the first elements; otherwise it skips arbitrary ones.

However, let's use the equivalent unordered terminal operation collect(Collectors.toSet()):

System.out.println("skip-toSet: "
        + input.parallelStream().filter(x -> x > 0)
            .skip(1)
            .unordered()
            .collect(Collectors.toSet()));

Now the output is:

skip-toSet: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 16, 17, 18, 19, 20]
// 13 is skipped

The same result can be achieved with any other unordered terminal operation (like forEach, findAny, anyMatch, etc.). Removing the unordered() step in this case changes nothing. It seems that while the unordered() step correctly makes the stream unordered from the current operation onward, an unordered terminal operation makes the whole stream unordered from the very beginning, even though this can change the result if skip() was used. This seems completely misleading to me: I expect that using an unordered collector is the same as switching the stream to unordered mode just before the terminal operation and using the equivalent ordered collector.
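The observable difference between the two HashSet experiments is in the collectors' declared characteristics: Collectors.toSet() reports Collector.Characteristics.UNORDERED, while Collectors.toCollection(HashSet::new) and Collectors.toList() do not. The sketch below (the helper name isUnordered is ours) only inspects characteristics(); it does not by itself show how the stream engine uses the flag, but it is consistent with toSet() triggering different behavior from toCollection(HashSet::new).

```java
import java.util.HashSet;
import java.util.stream.Collector;
import java.util.stream.Collectors;

final class CollectorFlags {
    // True if the collector declares Collector.Characteristics.UNORDERED,
    // i.e. it does not commit to preserving the encounter order of its input.
    static boolean isUnordered(Collector<?, ?, ?> collector) {
        return collector.characteristics()
                .contains(Collector.Characteristics.UNORDERED);
    }

    public static void main(String[] args) {
        System.out.println("toSet():                    "
                + isUnordered(Collectors.toSet()));
        System.out.println("toCollection(HashSet::new): "
                + isUnordered(Collectors.<Integer, HashSet<Integer>>toCollection(HashSet::new)));
        System.out.println("toList():                   "
                + isUnordered(Collectors.toList()));
    }
}
```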

So my questions are:


  1. Is this behavior intended or it's a bug?
  2. If yes, is it documented somewhere? I've read the Stream.skip() documentation: it does not say anything about unordered terminal operations. The Characteristics.UNORDERED documentation is not very comprehensive either, and it does not say that ordering will be lost for the whole stream. Finally, the Ordering section in the package summary does not cover this case either. Am I missing something?
  3. If it's intended that an unordered terminal operation makes the whole stream unordered, why does the unordered() step make it unordered only from that point on? Can I rely on this behavior, or was I just lucky that my first tests worked nicely?


Recommended answer

Recall that the goal of stream flags (ORDERED, SORTED, SIZED, DISTINCT) is to enable operations to avoid doing unnecessary work. Examples of optimizations that involve stream flags are:


  • If we know the stream is already sorted, then sorted() is a no-op;
  • If we know the size of the stream, we can pre-allocate a correct-sized array in toArray(), avoiding a copy;
  • If we know that the input has no meaningful encounter order, we need not take extra steps to preserve encounter order.

Each stage of a pipeline has a set of stream flags. Intermediate operations can inject, preserve, or clear stream flags. For example, filtering preserves sorted-ness / distinct-ness but not sized-ness; mapping preserves sized-ness but not sorted-ness or distinct-ness. Sorting injects sorted-ness. The treatment of flags for intermediate operations is fairly straightforward, because all decisions are local.
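Stream flags themselves are internal, but in OpenJDK they surface through the characteristics reported by Stream.spliterator(). The sketch below (the class name StageFlags is ours) checks a few of the per-stage effects just described: filter() keeps ORDERED but drops SIZED, sorted() injects SORTED, and unordered() clears ORDERED. Treat this as an observation about the OpenJDK implementation rather than a documented guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Stream;

final class StageFlags {
    // ArrayList source: its spliterator is ORDERED and SIZED.
    static final List<Integer> DATA = new ArrayList<>(Arrays.asList(3, 1, 2));

    // Consumes the (fresh) stream and reports whether its spliterator
    // advertises the given characteristic, e.g. Spliterator.ORDERED.
    static boolean has(Stream<Integer> s, int characteristic) {
        return s.spliterator().hasCharacteristics(characteristic);
    }

    public static void main(String[] args) {
        System.out.println("source    ORDERED: " + has(DATA.stream(), Spliterator.ORDERED));
        System.out.println("source    SIZED:   " + has(DATA.stream(), Spliterator.SIZED));
        // filter() preserves ORDERED but cannot know the output size.
        System.out.println("filter    ORDERED: " + has(DATA.stream().filter(x -> x > 0), Spliterator.ORDERED));
        System.out.println("filter    SIZED:   " + has(DATA.stream().filter(x -> x > 0), Spliterator.SIZED));
        // sorted() injects SORTED.
        System.out.println("sorted    SORTED:  " + has(DATA.stream().sorted(), Spliterator.SORTED));
        // unordered() clears ORDERED from this stage on.
        System.out.println("unordered ORDERED: " + has(DATA.stream().unordered(), Spliterator.ORDERED));
    }
}
```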

The treatment of flags for terminal operations is more subtle. ORDERED is the most relevant flag for terminal ops. And if a terminal op is UNORDERED, then we do back-propagate the unordered-ness.

Why do we do this? Well, consider this pipeline:

set.stream()
   .sorted()
   .forEach(System.out::println);

Since forEach is not constrained to operate in order, the work of sorting the list is completely wasted effort. So we back-propagate this information (until we hit a short-circuiting operation, such as limit), so as not to lose this optimization opportunity. Similarly, we can use an optimized implementation of distinct on unordered streams.
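When encounter order does matter after a sort, the documented way to ask for it is forEachOrdered rather than forEach. A minimal sketch (names are ours): both methods receive the same elements, but only forEachOrdered is specified to process them one at a time in encounter order, so only there is the effect of sorted() guaranteed to be visible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ForEachOrder {
    // Same shape as the answer's example: a set, sorted, then consumed.
    static final Set<Integer> SET = new HashSet<>(Arrays.asList(5, 3, 9, 1, 7, 2, 8));

    // forEach is free to ignore encounter order, so with a parallel stream
    // the elements may reach the action in any order. The list must be
    // thread-safe because the action can run concurrently.
    static List<Integer> viaForEach() {
        List<Integer> out = Collections.synchronizedList(new ArrayList<>());
        SET.parallelStream().sorted().forEach(out::add);
        return out;
    }

    // forEachOrdered runs the action one element at a time in encounter
    // order, so the sort is preserved and a plain ArrayList is safe.
    static List<Integer> viaForEachOrdered() {
        List<Integer> out = new ArrayList<>();
        SET.parallelStream().sorted().forEachOrdered(out::add);
        return out;
    }

    public static void main(String[] args) {
        System.out.println("forEach:        " + viaForEach());
        System.out.println("forEachOrdered: " + viaForEachOrdered());
    }
}
```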


Is this behavior intended or it's a bug?

Yes :) The back-propagation is intended, as it is a useful optimization that should not produce incorrect results. However, the bug part is that we are propagating past a previous skip, which we shouldn't. So the back-propagation of the UNORDERED flag is overly aggressive, and that's a bug. We'll file a bug.


If yes, is it documented somewhere?

It should be just an implementation detail; if it were correctly implemented, you wouldn't notice (except that your streams would be faster).
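Because the over-aggressive back-propagation is an implementation bug, the observed behavior may differ between JDK builds. A defensive sketch (the class SkipThenSet is hypothetical) that does not depend on either behavior keeps the terminal collector ordered, so skip(1) applies to the encounter order, and builds the set afterwards. The toCollection(HashSet::new) variant is the one that already worked in the question.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class SkipThenSet {
    static List<Integer> input() {
        return IntStream.rangeClosed(1, 20).boxed().collect(Collectors.toList());
    }

    // Ordered collector first, conversion to a set afterwards: toList() does
    // not declare UNORDERED, so skip(1) keeps dropping the first element.
    static Set<Integer> viaList() {
        List<Integer> kept = input().parallelStream().filter(x -> x > 0)
                .skip(1)
                .collect(Collectors.toList());
        return new HashSet<>(kept);
    }

    // Variant from the question: toCollection does not declare UNORDERED either.
    static Set<Integer> viaToCollection() {
        return input().parallelStream().filter(x -> x > 0)
                .skip(1)
                .collect(Collectors.toCollection(HashSet::new));
    }

    public static void main(String[] args) {
        System.out.println("viaList:         " + viaList());
        System.out.println("viaToCollection: " + viaToCollection());
    }
}
```

The extra list costs one copy, which is usually negligible next to the cost of an ordered parallel skip().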
