为什么stream.spliterator()的tryAdvance可能会将项目累积到缓冲区中? [英] Why the tryAdvance of stream.spliterator() may accumulate items into a buffer?

查看:153
本文介绍了为什么stream.spliterator()的tryAdvance可能会将项目累积到缓冲区中?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

Stream 管道获取 Spliterator 可能会返回 StreamSpliterators.WrappingSpliterator 。例如,获取以下 Spliterator

Getting a Spliterator from a Stream pipeline may return an instance of a StreamSpliterators.WrappingSpliterator. For example, getting the following Spliterator:

Spliterator<String> source = new Random()
            .ints(11, 0, 7) // size, origin, bound
            .filter(nr -> nr % 2 != 0)
            .mapToObj(Integer::toString)
            .spliterator();

鉴于以上 Spliterator< String>来源,当我们通过 tryAdvance(消费者<?超级P_OUT>消费者) Spliterator <的方法单独遍历元素时/ code>,在本例中是 StreamSpliterators.WrappingSpliterator ,它会在消耗这些项目之前首先将项目累积到内部缓冲区中,正如我们在 StreamSpliterators的.java#298 。从简单的角度来看, doAdvance()首先将项目插入缓冲区,然后它获取下一个项目,将它传递给 consumer.accept(...)

Given the above Spliterator<String> source, when we traverse the elements individually through the tryAdvance (Consumer<? super P_OUT> consumer) method of Spliterator, which in this case is an instance of StreamSpliterators.WrappingSpliterator, it will first accumulate items into an internal buffer, before consuming those items, as we can see in StreamSpliterators.java#298. From a simple point of view, the doAdvance() inserts items first into buffer and then it gets the next item and pass it to consumer.accept (…).

public boolean tryAdvance(Consumer<? super P_OUT> consumer) {
    boolean hasNext = doAdvance();
    if (hasNext)
        consumer.accept(buffer.get(nextToConsume));
    return hasNext;
}

但是,我不知道需要这个缓冲

However, I am not figuring out the need of this buffer.

在这种情况下,为什么 tryAdvance的消费者参数不仅仅用作终端 Sink

In this case, why the consumer parameter of the tryAdvance is not simply used as a terminal Sink of the pipeline?

推荐答案

我大多同意很棒的@Holger答案,但我会以不同的方式加入口音。我认为你很难理解缓冲区的必要性,因为你有一个非常简单的Stream API允许的心智模型。如果将Stream视为 map 过滤器的序列,则不需要额外的缓冲区,因为这些操作具有2个重要的好属性:

I mostly agree with great @Holger answer, but I would put accents differently. I think it is hard for you to understand the need for a buffer because you have very simplistic mental model of what Stream API allows. If one thinks about Stream as a sequence of map and filter, there is no need for additional buffer because those operations have 2 important "good" properties:


  1. 一次处理一个元素

  2. 产生0或结果是1个元素

然而,在一般情况下,这些都不是真的。正如@Holger(以及我在我的原始答案中所提到的)已经有 flatMap 在Java 8中打破了规则#2,在Java 9中他们最终添加了 takeWhile 实际转换整个 Stream - > Stream 而不是基于每个元素(这是AFAIK第一个中间衬衫循环操作)。

However those are not true in general case. As @Holger (and I in my original answer) mentioned there is already flatMap in Java 8 that breaks rule #2 and in Java 9 they've finally added takeWhile that actually transforms on whole Stream -> Stream rather than on a per-element basis (and that is AFAIK the first intermediate shirt-circuiting operation).

另一点我不太同意@Holger是因为我认为最根本的原因与他在第二段(即a)中提出的那个有点不同,你可以打电话给 tryAdvance 多次发布 Stream 的结尾,并且b)没有保证调用者将始终传递相同的消费者 )。我认为最重要的原因是 Spliterator 在功能上与 Stream 相同,必须支持短路和懒惰(即不处理整个 Stream 的能力,否则它不能支持未绑定的流)。换句话说,即使Spliterator API(非常奇怪)要求您必须对给定 Spliterator的所有方法的所有调用使用相同的 Consumer 对象,你仍然需要 tryAdvance 而那个 tryAdvance 实现仍然需要使用一些缓冲区。如果您只有 forEachRemaining(Consumer<?super T>),那么您无法停止处理数据,因此您无法实现类似于<$ c $的任何内容c> findFirst takeWhile 使用它。实际上,这是JDK实现内部使用 Sink 接口而不是消费者(以及换行的原因之一) wrapAndCopyInto 代表): Sink 有额外的 boolean cancellationRequested()方法。

Another point I don't quite agree with @Holger is that I think that the most fundamental reason is a bit different than the one he puts in the second paragraph (i.e. a) that you may call tryAdvance post the end of the Stream many times and b) that "there is no guaranty that the caller will always pass the same consumer"). I think that the most important reason is that Spliterator being functionally identical to Stream has to support short-circuiting and laziness (i.e. ability to not process the whole Stream or else it can't support unbound streams). In other words, even if Spliterator API (quite strangely) required that you must use the same Consumer object for all calls of all methods for a given Spliterator, you would still need tryAdvance and that tryAdvance implementation would still have to use some buffer. You just can't stop processing data if all you've got is forEachRemaining(Consumer<? super T> ) so you can't implement anything similar to findFirst or takeWhile using it. Actually this is one of the reasons why inside JDK implementation uses Sink interface rather than Consumer (and what "wrap" in wrapAndCopyInto stands for): Sink has additional boolean cancellationRequested() method.

所以 总结 :需要一个缓冲区,因为我们想要 Spliterator


  1. 使用提供的简单消费者无法报告处理/取消的结束

  2. 提供通过(逻辑)消费者的请求停止处理数据的方法。

  1. To use simple Consumer that provides no means to report back end of processing/cancellation
  2. To provide means to stop processing of the data by a request of the (logical) consumer.

请注意,这两个实际上是相互矛盾的要求。

Note that those two are actually slightly contradictory requirements.

示例和一些代码

这里我想提供一些代码示例,我认为如果没有额外的缓冲区给出当前API契约(接口),就无法实现这些代码。此示例基于您的示例。

Here I'd like to provide some example of code that I believe is impossible to implement without additional buffer given current API contract (interfaces). This example is based on your example.

有简单的 Collat​​z序列推测总是最终命中的整数1.AFAIK这个猜想尚未证实,但已经验证了许多整数(至少对于整个32位整数范围)。

There is simple Collatz sequence of integers that is conjectured to always eventually hit 1. AFAIK this conjecture is not proved yet but is verified for many integers (at least for whole 32-bit int range).

因此,假设我们要解决的问题如下:从Collat​​z序列流中获取1到1,000,000范围内的随机起始编号,找到第一个包含 123的十进制表示。

So assume that the problem we are trying to solve is following: from a stream of Collatz sequences for random start numbers in range from 1 to 1,000,000 find the first that contains "123" in its decimal representation.

这是一个仅使用 Stream 的解决方案(不是 Spliterator ):

Here is a solution that uses just Stream (not a Spliterator):

static String findGoodNumber() {
    return new Random()
            .ints(1, 1_000_000) // unbound!
            .flatMap(nr -> collatzSequence(nr))
            .mapToObj(Integer::toString)
            .filter(s -> s.contains("123"))
            .findFirst().get();
}

其中 collat​​zSequence 是返回 Stream 的函数,包含Collat​​z序列,直到第1个(并且对于nitpickers,当当前值大于 Integer.MAX_VALUE /时,它也会停止3 所以我们不会遇到溢出)。

where collatzSequence is a function that returns Stream containing the Collatz sequence until the first 1 (and for nitpickers let it also stop when current value is bigger than Integer.MAX_VALUE /3 so we don't hit overflow).

collat​​zSequence 返回的每个都受约束。标准的随机最终将生成所提供范围内的每个数字。这意味着我们保证流中最终会有一些好的数字(例如 123 )和 findFirst 正在短路,因此整个操作实际上将终止。但是,没有合理的Stream API实现可以预测这一点。

Every such Stream returned by collatzSequence is bound. Also standard Random will eventually generate every number in the provided range. It means that we are guaranteed that there eventually will be some "good" number in the stream (for example just 123) and findFirst is short-circuiting so the whole operation will actually terminate. However no reasonable Stream API implementation can predict this.

现在我们假设您出于某种奇怪的原因想要使用中间 Spliterator 执行相同的操作。即使你只有一个逻辑而且不需要不同的 Consumer ,你也不能使用 forEachRemaining 。所以你必须做这样的事情:

Now let's assume that for some strange reason you want to perform the same thing using intermediate Spliterator. Even though you have only one piece of logic and no need for different Consumers, you can't use forEachRemaining. So you'll have to do something like this:

static Spliterator<String> createCollatzRandomSpliterator() {
    return new Random()
            .ints(1, 1_000_000) // unbound!
            .flatMap(nr -> collatzSequence(nr))
            .mapToObj(Integer::toString)
            .spliterator();
}

static String findGoodNumberWithSpliterator() {
    Spliterator<String> source = createCollatzRandomSpliterator();

    String[] res = new String[1]; // work around for "final" closure restriction

    while (source.tryAdvance(s -> {
        if (s.contains("123")) {
            res[0] = s;
        }
    })) {
        if (res[0] != null)
            return res[0];
    }
    throw new IllegalStateException("Impossible");
}

对于某些起始数字,Collat​​z序列将包含多个匹配也很重要数字。例如, 41123 123370 (= 41123 * 3 + 1)都包含123。这意味着我们真的不希望在第一次匹配匹配后调用我们的 Consumer 。但由于消费者没有公开报告处理结束的任何方法, WrappingSpliterator 不能通过我们的消费者到内部 Spliterator 。唯一的解决方案是将内部 flatMap 的所有结果(包含所有后处理)累积到某个缓冲区中,然后一次遍历该缓冲区一个元素。

It is also important that for some starting numbers the Collatz sequence will contain several matching numbers. For example, both 41123 and 123370 (= 41123*3+1) contain "123". It means that we really don't want our Consumer to be called post the first matching hit. But since Consumer doesn't expose any means to report end of processing, WrappingSpliterator can't just pass our Consumer to the inner Spliterator. The only solution is to accumulate all results of inner flatMap (with all the post-processing) into some buffer and then iterate over that buffer one element at a time.

这篇关于为什么stream.spliterator()的tryAdvance可能会将项目累积到缓冲区中?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆