为什么stream.spliterator()的tryAdvance可能会将项目累积到缓冲区中? [英] Why the tryAdvance of stream.spliterator() may accumulate items into a buffer?
问题描述
从 Stream
管道获取 Spliterator
可能会返回 StreamSpliterators.WrappingSpliterator 。例如,获取以下 Spliterator
:
Getting a Spliterator
from a Stream
pipeline may return an instance of a StreamSpliterators.WrappingSpliterator. For example, getting the following Spliterator
:
Spliterator<String> source = new Random()
.ints(11, 0, 7) // size, origin, bound
.filter(nr -> nr % 2 != 0)
.mapToObj(Integer::toString)
.spliterator();
鉴于以上 Spliterator< String>来源
,当我们通过 tryAdvance(消费者<?超级P_OUT>消费者)
Spliterator <的方法单独遍历元素时/ code>,在本例中是 StreamSpliterators.WrappingSpliterator ,它会在消耗这些项目之前首先将项目累积到内部缓冲区中,正如我们在 StreamSpliterators的.java#298 。从简单的角度来看,
doAdvance()
首先将项目插入缓冲区
,然后它获取下一个项目,将它传递给 consumer.accept(...)
。
Given the above Spliterator<String> source
, when we traverse the elements individually through the tryAdvance (Consumer<? super P_OUT> consumer)
method of Spliterator
, which in this case is an instance of StreamSpliterators.WrappingSpliterator, it will first accumulate items into an internal buffer, before consuming those items, as we can see in StreamSpliterators.java#298. From a simple point of view, the doAdvance()
inserts items first into buffer
and then it gets the next item and pass it to consumer.accept (…)
.
public boolean tryAdvance(Consumer<? super P_OUT> consumer) {
boolean hasNext = doAdvance();
if (hasNext)
consumer.accept(buffer.get(nextToConsume));
return hasNext;
}
但是,我不知道需要这个缓冲
。
However, I am not figuring out the need of this buffer
.
在这种情况下,为什么 tryAdvance的
不仅仅用作终端 消费者
参数 Sink
?
In this case, why the consumer
parameter of the tryAdvance
is not simply used as a terminal Sink
of the pipeline?
推荐答案
我大多同意很棒的@Holger答案,但我会以不同的方式加入口音。我认为你很难理解缓冲区的必要性,因为你有一个非常简单的Stream API允许的心智模型。如果将Stream视为 map
和过滤器
的序列,则不需要额外的缓冲区,因为这些操作具有2个重要的好属性:
I mostly agree with great @Holger answer, but I would put accents differently. I think it is hard for you to understand the need for a buffer because you have very simplistic mental model of what Stream API allows. If one thinks about Stream as a sequence of map
and filter
, there is no need for additional buffer because those operations have 2 important "good" properties:
- 一次处理一个元素
- 产生0或结果是1个元素
然而,在一般情况下,这些都不是真的。正如@Holger(以及我在我的原始答案中所提到的)已经有 flatMap $ c $ 8>在Java 8中打破了规则#2,在Java 9中他们最终添加了 takeWhile 实际转换整个
Stream
- > Stream
而不是基于每个元素(这是AFAIK第一个中间衬衫循环操作)。
However those are not true in general case. As @Holger (and I in my original answer) mentioned there is already flatMap
in Java 8 that breaks rule #2 and in Java 9 they've finally added takeWhile that actually transforms on whole Stream
-> Stream
rather than on a per-element basis (and that is AFAIK the first intermediate shirt-circuiting operation).
另一点我不太同意@Holger是因为我认为最根本的原因与他在第二段(即a)中提出的那个有点不同,你可以打电话给 tryAdvance
多次发布 Stream
的结尾,并且b)没有保证调用者将始终传递相同的消费者 )。我认为最重要的原因是 Spliterator
在功能上与 Stream
相同,必须支持短路和懒惰(即不处理整个 Stream
的能力,否则它不能支持未绑定的流)。换句话说,即使Spliterator API(非常奇怪)要求您必须对给定 Spliterator的所有方法的所有调用使用相同的
,你仍然需要 Consumer
对象 tryAdvance
而那个 tryAdvance
实现仍然需要使用一些缓冲区。如果您只有 forEachRemaining(Consumer<?super T>)
,那么您无法停止处理数据,因此您无法实现类似于<$ c $的任何内容c> findFirst 或 takeWhile
使用它。实际上,这是JDK实现内部使用 Sink
接口而不是消费者
(以及换行的原因之一) wrapAndCopyInto
代表): Sink
有额外的 boolean cancellationRequested()
方法。
Another point I don't quite agree with @Holger is that I think that the most fundamental reason is a bit different than the one he puts in the second paragraph (i.e. a) that you may call tryAdvance
post the end of the Stream
many times and b) that "there is no guaranty that the caller will always pass the same consumer"). I think that the most important reason is that Spliterator
being functionally identical to Stream
has to support short-circuiting and laziness (i.e. ability to not process the whole Stream
or else it can't support unbound streams). In other words, even if Spliterator API (quite strangely) required that you must use the same Consumer
object for all calls of all methods for a given Spliterator
, you would still need tryAdvance
and that tryAdvance
implementation would still have to use some buffer. You just can't stop processing data if all you've got is forEachRemaining(Consumer<? super T> )
so you can't implement anything similar to findFirst
or takeWhile
using it. Actually this is one of the reasons why inside JDK implementation uses Sink
interface rather than Consumer
(and what "wrap" in wrapAndCopyInto
stands for): Sink
has additional boolean cancellationRequested()
method.
所以 总结 :需要一个缓冲区,因为我们想要 Spliterator
:
- 使用提供的简单
消费者
无法报告处理/取消的结束 - 提供通过(逻辑)消费者的请求停止处理数据的方法。
- To use simple
Consumer
that provides no means to report back end of processing/cancellation - To provide means to stop processing of the data by a request of the (logical) consumer.
请注意,这两个实际上是相互矛盾的要求。
Note that those two are actually slightly contradictory requirements.
示例和一些代码
这里我想提供一些代码示例,我认为如果没有额外的缓冲区给出当前API契约(接口),就无法实现这些代码。此示例基于您的示例。
Here I'd like to provide some example of code that I believe is impossible to implement without additional buffer given current API contract (interfaces). This example is based on your example.
有简单的 Collatz序列推测总是最终命中的整数1.AFAIK这个猜想尚未证实,但已经验证了许多整数(至少对于整个32位整数范围)。
There is simple Collatz sequence of integers that is conjectured to always eventually hit 1. AFAIK this conjecture is not proved yet but is verified for many integers (at least for whole 32-bit int range).
因此,假设我们要解决的问题如下:从Collatz序列流中获取1到1,000,000范围内的随机起始编号,找到第一个包含 123的十进制表示。
So assume that the problem we are trying to solve is following: from a stream of Collatz sequences for random start numbers in range from 1 to 1,000,000 find the first that contains "123" in its decimal representation.
这是一个仅使用 Stream
的解决方案(不是 Spliterator
):
Here is a solution that uses just Stream
(not a Spliterator
):
static String findGoodNumber() {
return new Random()
.ints(1, 1_000_000) // unbound!
.flatMap(nr -> collatzSequence(nr))
.mapToObj(Integer::toString)
.filter(s -> s.contains("123"))
.findFirst().get();
}
其中 collatzSequence
是返回 Stream
的函数,包含Collatz序列,直到第1个(并且对于nitpickers,当当前值大于 Integer.MAX_VALUE /时,它也会停止3
所以我们不会遇到溢出)。
where collatzSequence
is a function that returns Stream
containing the Collatz sequence until the first 1 (and for nitpickers let it also stop when current value is bigger than Integer.MAX_VALUE /3
so we don't hit overflow).
collatzSequence
返回的每个流
都受约束。标准的随机
最终将生成所提供范围内的每个数字。这意味着我们保证流中最终会有一些好的数字(例如 123
)和 findFirst
正在短路,因此整个操作实际上将终止。但是,没有合理的Stream API实现可以预测这一点。
Every such Stream
returned by collatzSequence
is bound. Also standard Random
will eventually generate every number in the provided range. It means that we are guaranteed that there eventually will be some "good" number in the stream (for example just 123
) and findFirst
is short-circuiting so the whole operation will actually terminate. However no reasonable Stream API implementation can predict this.
现在我们假设您出于某种奇怪的原因想要使用中间 Spliterator
执行相同的操作。即使你只有一个逻辑而且不需要不同的 Consumer
,你也不能使用 forEachRemaining
。所以你必须做这样的事情:
Now let's assume that for some strange reason you want to perform the same thing using intermediate Spliterator
. Even though you have only one piece of logic and no need for different Consumer
s, you can't use forEachRemaining
. So you'll have to do something like this:
static Spliterator<String> createCollatzRandomSpliterator() {
return new Random()
.ints(1, 1_000_000) // unbound!
.flatMap(nr -> collatzSequence(nr))
.mapToObj(Integer::toString)
.spliterator();
}
static String findGoodNumberWithSpliterator() {
Spliterator<String> source = createCollatzRandomSpliterator();
String[] res = new String[1]; // work around for "final" closure restriction
while (source.tryAdvance(s -> {
if (s.contains("123")) {
res[0] = s;
}
})) {
if (res[0] != null)
return res[0];
}
throw new IllegalStateException("Impossible");
}
对于某些起始数字,Collatz序列将包含多个匹配也很重要数字。例如, 41123
和 123370
(= 41123 * 3 + 1)都包含123。这意味着我们真的不希望在第一次匹配匹配后调用我们的 Consumer
。但由于消费者
没有公开报告处理结束的任何方法, WrappingSpliterator
不能通过我们的消费者
到内部 Spliterator
。唯一的解决方案是将内部 flatMap
的所有结果(包含所有后处理)累积到某个缓冲区中,然后一次遍历该缓冲区一个元素。
It is also important that for some starting numbers the Collatz sequence will contain several matching numbers. For example, both 41123
and 123370
(= 41123*3+1) contain "123". It means that we really don't want our Consumer
to be called post the first matching hit. But since Consumer
doesn't expose any means to report end of processing, WrappingSpliterator
can't just pass our Consumer
to the inner Spliterator
. The only solution is to accumulate all results of inner flatMap
(with all the post-processing) into some buffer and then iterate over that buffer one element at a time.
这篇关于为什么stream.spliterator()的tryAdvance可能会将项目累积到缓冲区中?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!