Stream.skip behavior with an unordered terminal operation

Question

I've already read this question and this one, but I still doubt whether the observed behavior of Stream.skip was intended by the JDK authors.

Let's take a simple input of the numbers 1..20:

List<Integer> input = IntStream.rangeClosed(1, 20).boxed().collect(Collectors.toList());

Now let's create a parallel stream, combine unordered() with skip() in different ways, and collect the results:

System.out.println("skip-skip-unordered-toList: "
        + input.parallelStream().filter(x -> x > 0)
            .skip(1)
            .skip(1)
            .unordered()
            .collect(Collectors.toList()));
System.out.println("skip-unordered-skip-toList: "
        + input.parallelStream().filter(x -> x > 0)
            .skip(1)
            .unordered()
            .skip(1)
            .collect(Collectors.toList()));
System.out.println("unordered-skip-skip-toList: "
        + input.parallelStream().filter(x -> x > 0)
            .unordered()
            .skip(1)
            .skip(1)
            .collect(Collectors.toList()));

The filtering step does essentially nothing here, but it makes things harder for the stream engine: it no longer knows the exact size of the output, so some optimizations are turned off. I got the following results:

skip-skip-unordered-toList: [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
// absent values: 1, 2
skip-unordered-skip-toList: [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 18, 19, 20]
// absent values: 1, 15
unordered-skip-skip-toList: [1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19, 20]
// absent values: 7, 18

The results are completely fine; everything works as expected. In the first case I asked to skip the first two elements, then collect to a list in no particular order. In the second case I asked to skip the first element, then switch to unordered mode and skip one more element (I don't care which one). In the third case I switched to unordered mode first, then skipped two arbitrary elements.
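To make the "absent values" comparison repeatable, here is a minimal sketch that reruns the three pipelines and reports which of 1..20 are missing from each result. The class and method names (SkipAbsentDemo, absent) are illustrative, not from the question. Only the first case has a fixed answer, {1, 2}; the second is only guaranteed to drop 1 plus one arbitrary element, and the third any two elements, so those two lines may differ from run to run.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class SkipAbsentDemo {
    // Same input as in the question: the numbers 1..20.
    static final List<Integer> INPUT =
            IntStream.rangeClosed(1, 20).boxed().collect(Collectors.toList());

    // Input values that do not appear in the result, sorted ascending.
    static Set<Integer> absent(List<Integer> result) {
        Set<Integer> missing = new TreeSet<>(INPUT);
        missing.removeAll(result);
        return missing;
    }

    // Both skips run on an ordered stream: always drops 1 and 2.
    static List<Integer> skipSkipUnordered() {
        return INPUT.parallelStream().filter(x -> x > 0)
                .skip(1).skip(1).unordered()
                .collect(Collectors.toList());
    }

    // First skip is ordered (drops 1), second skip may drop any element.
    static List<Integer> skipUnorderedSkip() {
        return INPUT.parallelStream().filter(x -> x > 0)
                .skip(1).unordered().skip(1)
                .collect(Collectors.toList());
    }

    // Both skips are unordered: any two elements may be dropped.
    static List<Integer> unorderedSkipSkip() {
        return INPUT.parallelStream().filter(x -> x > 0)
                .unordered().skip(1).skip(1)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("skip-skip-unordered absent: " + absent(skipSkipUnordered()));
        System.out.println("skip-unordered-skip absent: " + absent(skipUnorderedSkip()));
        System.out.println("unordered-skip-skip absent: " + absent(unorderedSkipSkip()));
    }
}
```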

Now let's skip one element and collect into a custom collection in unordered mode. Our custom collection will be a HashSet:

System.out.println("skip-toCollection: "
        + input.parallelStream().filter(x -> x > 0)
        .skip(1)
        .unordered()
        .collect(Collectors.toCollection(HashSet::new)));

The output is satisfactory:

skip-toCollection: [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
// 1 is skipped

So in general I expect that as long as the stream is ordered, skip() skips the first elements; otherwise it skips arbitrary ones.

However, let's use the equivalent unordered terminal operation collect(Collectors.toSet()):

System.out.println("skip-toSet: "
        + input.parallelStream().filter(x -> x > 0)
            .skip(1)
            .unordered()
            .collect(Collectors.toSet()));

Now the output is:

skip-toSet: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 16, 17, 18, 19, 20]
// 13 is skipped
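The two set-producing runs differ in the collector they pass to collect(). A quick way to see that difference is to print each collector's characteristics: Collectors.toSet() declares Collector.Characteristics.UNORDERED, whereas Collectors.toCollection(HashSet::new) declares only IDENTITY_FINISH. The sketch below uses nothing beyond the public Collector API; CollectorFlagsDemo and its field names are illustrative.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Collectors;

final class CollectorFlagsDemo {
    // The two collectors used in the question's set-producing runs.
    static final Collector<Integer, ?, Set<Integer>> TO_SET = Collectors.toSet();
    static final Collector<Integer, ?, HashSet<Integer>> TO_HASH_SET =
            Collectors.toCollection(HashSet::new);

    // True if the collector declares that it does not preserve encounter order.
    static boolean isUnordered(Collector<?, ?, ?> collector) {
        return collector.characteristics().contains(Collector.Characteristics.UNORDERED);
    }

    public static void main(String[] args) {
        System.out.println("toSet():                    " + TO_SET.characteristics());
        System.out.println("toCollection(HashSet::new): " + TO_HASH_SET.characteristics());
    }
}
```

A collector with the UNORDERED characteristic is what makes collect() an unordered terminal operation, which is the case the rest of the question is about.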

The same result can be achieved with any other unordered terminal operation (such as forEach, findAny, anyMatch, etc.). Removing the unordered() step changes nothing in this case. It seems that while the unordered() step correctly makes the stream unordered from the current operation onward, the unordered terminal operation makes the whole stream unordered from the very beginning, even though this can affect the result when skip() is used. This seems completely misleading to me: I expect that using the unordered collector is the same as turning the stream into unordered mode just before the terminal operation and then using the equivalent ordered collector.
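If the intent is "drop the first element, then gather the rest into a set", one defensive option is to end the pipeline with a collector that does not declare UNORDERED, so there is no unordered-ness to push back past skip(). The sketch below shows two such variants: Collectors.toCollection(HashSet::new), which the earlier run already showed working, and collecting to a list before copying into a HashSet. The class and method names are illustrative, and the trade-off is giving up any optimization an unordered terminal operation might have enabled.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class SkipThenSetDemo {
    static final List<Integer> INPUT =
            IntStream.rangeClosed(1, 20).boxed().collect(Collectors.toList());

    // toCollection(HashSet::new) has no UNORDERED characteristic, so skip(1)
    // still operates on an ordered stream and drops the first element.
    static Set<Integer> viaToCollection() {
        return INPUT.parallelStream().filter(x -> x > 0)
                .skip(1)
                .unordered()
                .collect(Collectors.toCollection(HashSet::new));
    }

    // Collect in encounter order with an ordered collector, then copy into a set.
    static Set<Integer> viaListThenSet() {
        List<Integer> rest = INPUT.parallelStream().filter(x -> x > 0)
                .skip(1)
                .collect(Collectors.toList());
        return new HashSet<>(rest);
    }

    public static void main(String[] args) {
        System.out.println("toCollection: " + viaToCollection());
        System.out.println("list then set: " + viaListThenSet());
    }
}
```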

So my questions are:

  1. Is this behavior intended, or is it a bug?
  2. If so, is it documented somewhere? I've read the Stream.skip() documentation: it does not say anything about unordered terminal operations. The Characteristics.UNORDERED documentation is not very comprehensive either, and it does not say that ordering will be lost for the whole stream. Finally, the Ordering section in the package summary does not cover this case either. Am I missing something?
  3. If it's intended that an unordered terminal operation makes the whole stream unordered, why does the unordered() step make it unordered only from that point on? Can I rely on this behavior? Or was I just lucky that my first tests worked nicely?

Answer

Recall that the goal of stream flags (ORDERED, SORTED, SIZED, DISTINCT) is to enable operations to avoid doing unnecessary work. Examples of optimizations that involve stream flags are:

  • If we know the stream is already sorted, then sorted() is a no-op;
  • If we know the size of the stream, we can pre-allocate a correct-sized array in toArray(), avoiding a copy;
  • If we know that the input has no meaningful encounter order, we need not take extra steps to preserve encounter order.

Each stage of a pipeline has a set of stream flags. Intermediate operations can inject, preserve, or clear stream flags. For example, filtering preserves sorted-ness / distinct-ness but not sized-ness; mapping preserves sized-ness but not sorted-ness or distinct-ness. Sorting injects sorted-ness. The treatment of flags for intermediate operations is fairly straightforward, because all decisions are local.
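The inject/preserve/clear behavior can be observed from outside the library: in the OpenJDK implementation, a stream's spliterator() reports the pipeline's combined stream flags as Spliterator characteristics. The sketch below is a probe built only on the public Stream and Spliterator APIs (StageFlagsDemo, has and exactSize are illustrative names). It checks the cases described above, including the question's point that filter() drops SIZED, so getExactSizeIfKnown() returns -1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

final class StageFlagsDemo {
    // The numbers 1..20 in an ArrayList, whose spliterator is ORDERED and SIZED.
    static final List<Integer> INPUT =
            IntStream.rangeClosed(1, 20).boxed().collect(Collectors.toCollection(ArrayList::new));

    // True if the stream's spliterator reports the given characteristic.
    // spliterator() is a terminal operation, so the stream is consumed here.
    static boolean has(Stream<Integer> s, int characteristic) {
        return s.spliterator().hasCharacteristics(characteristic);
    }

    // Exact element count if the pipeline still knows it, otherwise -1.
    static long exactSize(Stream<Integer> s) {
        return s.spliterator().getExactSizeIfKnown();
    }

    public static void main(String[] args) {
        System.out.println("source SIZED:      " + has(INPUT.stream(), Spliterator.SIZED));
        System.out.println("filter SIZED:      " + has(INPUT.stream().filter(x -> x > 0), Spliterator.SIZED));
        System.out.println("filter ORDERED:    " + has(INPUT.stream().filter(x -> x > 0), Spliterator.ORDERED));
        System.out.println("map SIZED:         " + has(INPUT.stream().map(x -> x * 2), Spliterator.SIZED));
        System.out.println("sorted SORTED:     " + has(INPUT.stream().sorted(), Spliterator.SORTED));
        System.out.println("sorted+map SORTED: " + has(INPUT.stream().sorted().map(x -> x * 2), Spliterator.SORTED));
        System.out.println("distinct DISTINCT: " + has(INPUT.stream().distinct(), Spliterator.DISTINCT));
        System.out.println("unordered ORDERED: " + has(INPUT.stream().unordered(), Spliterator.ORDERED));
        System.out.println("source exact size: " + exactSize(INPUT.stream()));
        System.out.println("filter exact size: " + exactSize(INPUT.stream().filter(x -> x > 0)));
    }
}
```

On OpenJDK this shows the source as SIZED, filter clearing SIZED but keeping ORDERED, map keeping SIZED, sorted injecting SORTED that a following map clears, distinct injecting DISTINCT, and unordered clearing ORDERED.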

The treatment of flags for terminal operations is more subtle. ORDERED is the most relevant flag for terminal ops. And if a terminal op is UNORDERED, then we do back-propagate the unordered-ness.

Why do we do this? Well, consider this pipeline:

set.stream()
   .sorted()
   .forEach(System.out::println);

Since forEach is not constrained to operate in order, the work of sorting is completely wasted effort. So we back-propagate this information (until we hit a short-circuiting operation, such as limit), so as not to lose this optimization opportunity. Similarly, we can use an optimized implementation of distinct on unordered streams.
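The back-propagation rule can be illustrated with a deliberately tiny model. This is not the JDK's implementation, which works on its internal stream-flag machinery; the Op enum, the barrier sets and orderMatters are hypothetical names. Walking backward from an unordered terminal operation, each stage's output may be treated as unordered until we reach an operation whose result depends on which elements come first. With LIMIT as the only barrier, the model mirrors the propagation described above; adding SKIP as a barrier gives the behavior the next paragraph of the answer calls correct.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

final class BackPropagationModel {
    // Hypothetical, simplified operation kinds (not JDK classes).
    enum Op { FILTER, MAP, SORTED, DISTINCT, SKIP, LIMIT }

    // Operations at which backward propagation of "order does not matter" stops.
    static final Set<Op> SKIP_AND_LIMIT = EnumSet.of(Op.SKIP, Op.LIMIT);
    static final Set<Op> LIMIT_ONLY = EnumSet.of(Op.LIMIT);

    // Returns an array of length stages.size() + 1. Element 0 describes the
    // source and element i + 1 the output of stages.get(i); true means that
    // segment must keep its encounter order for the final result to be correct.
    static boolean[] orderMatters(List<Op> stages, boolean terminalUnordered, Set<Op> barriers) {
        int n = stages.size();
        boolean[] matters = new boolean[n + 1];
        boolean needed = !terminalUnordered;
        for (int i = n - 1; i >= 0; i--) {
            matters[i + 1] = needed;
            if (barriers.contains(stages.get(i))) {
                needed = true; // this op depends on which elements come first
            }
        }
        matters[0] = needed;
        return matters;
    }

    public static void main(String[] args) {
        // set.stream().sorted().forEach(...): nothing needs the sorted order.
        System.out.println(Arrays.toString(
                orderMatters(Arrays.asList(Op.SORTED), true, SKIP_AND_LIMIT)));
        // filter(...).skip(1).collect(toSet()) with skip treated as a barrier.
        System.out.println(Arrays.toString(
                orderMatters(Arrays.asList(Op.FILTER, Op.SKIP), true, SKIP_AND_LIMIT)));
        // The same pipeline when only limit stops the propagation.
        System.out.println(Arrays.toString(
                orderMatters(Arrays.asList(Op.FILTER, Op.SKIP), true, LIMIT_ONLY)));
    }
}
```

The third call is the toy analogue of the question's skip-toSet run: with skip not acting as a barrier, even the source is treated as unordered, so skip(1) is free to drop any element.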

Is this behavior intended, or is it a bug?

Yes :) The back-propagation is intended, as it is a useful optimization that should not produce incorrect results. However, the bug part is that we are propagating past a previous skip, which we shouldn't. So the back-propagation of the UNORDERED flag is overly aggressive, and that's a bug. We'll file a bug.

If so, is it documented somewhere?

It should be just an implementation detail; if it were correctly implemented, you wouldn't notice (except that your streams are faster.)