Why is filter() after flatMap() "not completely" lazy in Java streams?

Question

I have the following sample code:

System.out.println(
       "Result: " +
        Stream.of(1, 2, 3)
                .filter(i -> {
                    System.out.println(i);
                    return true;
                })
                .findFirst()
                .get()
);
System.out.println("-----------");
System.out.println(
       "Result: " +
        Stream.of(1, 2, 3)
                .flatMap(i -> Stream.of(i - 1, i, i + 1))
                .flatMap(i -> Stream.of(i - 1, i, i + 1))
                .filter(i -> {
                    System.out.println(i);
                    return true;
                })
                .findFirst()
                .get()
);

The output is as follows:

1
Result: 1
-----------
-1
0
1
0
1
2
1
2
3
Result: -1

From this I see that in the first case the stream really behaves lazily: we use findFirst(), so once we have the first element our filtering lambda is not invoked again. However, in the second case, which uses flatMap, we see that even though the first element that fulfils the filter condition has been found (it is simply the first element, as the lambda always returns true), further contents of the stream are still fed through the filtering function.

I am trying to understand why it behaves like this rather than stopping after the first element is computed, as in the first case. Any helpful information would be appreciated.

Recommended answer

TL;DR: this was addressed in JDK-8075939 and fixed in Java 10 (and backported to Java 8 in JDK-8225328).

Looking at the implementation (ReferencePipeline.java), we see the method

@Override
final void forEachWithCancel(Spliterator<P_OUT> spliterator, Sink<P_OUT> sink) {
    do { } while (!sink.cancellationRequested() && spliterator.tryAdvance(sink));
}

which is invoked for the findFirst operation. The important detail is the call to sink.cancellationRequested(), which allows the loop to end on the first match. Compare this to the implementation of flatMap:

@Override
public final <R> Stream<R> flatMap(Function<? super P_OUT, ? extends Stream<? extends R>> mapper) {
    Objects.requireNonNull(mapper);
    // We can do better than this, by polling cancellationRequested when stream is infinite
    return new StatelessOp<P_OUT, R>(this, StreamShape.REFERENCE,
                                 StreamOpFlag.NOT_SORTED | StreamOpFlag.NOT_DISTINCT | StreamOpFlag.NOT_SIZED) {
        @Override
        Sink<P_OUT> opWrapSink(int flags, Sink<R> sink) {
            return new Sink.ChainedReference<P_OUT, R>(sink) {
                @Override
                public void begin(long size) {
                    downstream.begin(-1);
                }

                @Override
                public void accept(P_OUT u) {
                    try (Stream<? extends R> result = mapper.apply(u)) {
                        // We can do better that this too; optimize for depth=0 case and just grab spliterator and forEach it
                        if (result != null)
                            result.sequential().forEach(downstream);
                    }
                }
            };
        }
    };
}

The method for advancing one item ends up calling forEach on the sub-stream without any possibility of early termination, and the comment at the beginning of the flatMap method even mentions this missing feature.
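The difference between pushing a whole sub-stream and polling for cancellation can be sketched with a simplified model. Everything here is hypothetical and written only for illustration: MiniSink, FirstSink and flatMapInto are not JDK types or methods, and the sketch makes no claim about how the actual fix is implemented.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

class CancellableFlatMapSketch {

    /** Hypothetical, simplified stand-in for the JDK-internal Sink. */
    interface MiniSink<T> {
        void accept(T value);
        boolean cancellationRequested();
    }

    /** Terminal sink that only wants one element, in the spirit of findFirst(). */
    static final class FirstSink<T> implements MiniSink<T> {
        T first;
        boolean found;
        int accepted; // number of elements pushed into this sink

        @Override
        public void accept(T value) {
            accepted++;
            if (!found) {
                first = value;
                found = true;
            }
        }

        @Override
        public boolean cancellationRequested() {
            return found;
        }
    }

    static <T, R> void flatMapInto(List<T> source, Function<T, List<R>> mapper,
                                   MiniSink<R> downstream, boolean pollInner) {
        Iterator<T> outer = source.iterator();
        // The outer loop polls for cancellation, like forEachWithCancel.
        while (!downstream.cancellationRequested() && outer.hasNext()) {
            Iterator<R> inner = mapper.apply(outer.next()).iterator();
            // pollInner == false: push the whole sub-sequence, like forEach(downstream).
            // pollInner == true: stop as soon as downstream has what it needs.
            while (inner.hasNext() && !(pollInner && downstream.cancellationRequested())) {
                downstream.accept(inner.next());
            }
        }
    }

    /** Returns { first element, number of elements pushed downstream }. */
    static int[] run(boolean pollInner) {
        FirstSink<Integer> sink = new FirstSink<>();
        flatMapInto(Arrays.asList(1, 2, 3), i -> Arrays.asList(i - 1, i, i + 1), sink, pollInner);
        return new int[] { sink.first, sink.accepted };
    }

    public static void main(String[] args) {
        System.out.println("without inner polling: " + Arrays.toString(run(false)));
        System.out.println("with inner polling:    " + Arrays.toString(run(true)));
    }
}
```

In this model the first element is the same either way; only the number of elements pushed past the point where the sink was already satisfied differs.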

Since this is more than just an optimization issue, as it implies that the code simply breaks when the sub-stream is infinite, I hope that the developers soon prove that they "can do better than this"…

To illustrate the implications: while Stream.iterate(0, i->i+1).findFirst() works as expected, Stream.of("").flatMap(x->Stream.iterate(0, i->i+1)).findFirst() ends up in an infinite loop on JDK versions without the fix.
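To check how a particular JDK behaves without risking a hang, a bounded sub-stream can stand in for the infinite one. The following is a minimal sketch: the class name, the BOUND constant and its value are assumptions made for this example. It counts how many sub-stream elements are generated before findFirst() returns.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

class FlatMapLazinessProbe {

    /** Upper bound on the sub-stream; an arbitrary value chosen for this sketch. */
    static final int BOUND = 1_000;

    /** Returns how many sub-stream elements were generated before findFirst() returned. */
    static int elementsPulled() {
        AtomicInteger pulled = new AtomicInteger();
        Stream.of("")
              .flatMap(x -> Stream.iterate(0, i -> i + 1)
                                  .limit(BOUND)   // keeps the probe finite on every JDK
                                  .peek(i -> pulled.incrementAndGet()))
              .findFirst();
        return pulled.get();
    }

    public static void main(String[] args) {
        System.out.println("Elements pulled from the sub-stream: " + elementsPulled());
    }
}
```

A count close to 1 suggests the sub-stream is traversed lazily, while a count equal to BOUND suggests the whole sub-stream was pushed through before the pipeline stopped, which is the behavior described above.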

Regarding the specification, most of it can be found in the chapter "Stream operations and pipelines" of the package specification:

Intermediate operations return a new stream. They are always lazy;

… Laziness also allows avoiding examining all the data when it is not necessary; for operations such as "find the first string longer than 1000 characters", it is only necessary to examine just enough strings to find one that has the desired characteristics without examining all of the strings available from the source. (This behavior becomes even more important when the input stream is infinite and not merely large.)

Further, some operations are deemed short-circuiting operations. An intermediate operation is short-circuiting if, when presented with infinite input, it may produce a finite stream as a result. A terminal operation is short-circuiting if, when presented with infinite input, it may terminate in finite time. Having a short-circuiting operation in the pipeline is a necessary, but not sufficient, condition for the processing of an infinite stream to terminate normally in finite time.

It's clear that a short-circuiting operation doesn't guarantee termination in finite time (for example, when a filter doesn't match any item, the processing can't complete), but an implementation that doesn't support termination in finite time at all, by simply ignoring the short-circuiting nature of an operation, is far off the specification.
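The laziness described in the quoted specification can be observed on an infinite stream that does not involve flatMap. The sketch below is only an illustration: the class and method names are made up for this example, and the source of doubling strings is an arbitrary choice. It looks for the first string longer than 1000 characters and counts how many strings the filter examined.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

class ShortCircuitSketch {

    /** Returns { length of the first match, number of strings examined by the filter }. */
    static int[] firstLongString() {
        AtomicInteger examined = new AtomicInteger();
        String match = Stream.iterate("a", s -> s + s)   // infinite: "a", "aa", "aaaa", ...
                .filter(s -> {
                    examined.incrementAndGet();
                    return s.length() > 1000;
                })
                .findFirst()                             // short-circuiting terminal operation
                .get();
        return new int[] { match.length(), examined.get() };
    }

    public static void main(String[] args) {
        int[] result = firstLongString();
        System.out.println("Length of first match: " + result[0]);
        System.out.println("Strings examined: " + result[1]);
    }
}
```

The string lengths double from 1, so the first length above 1000 is 1024, which is the eleventh element; the filter is therefore invoked 11 times and the rest of the infinite source is never touched.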
