Best way performance wise to use limit on stream in case of multithreading


Question

I watched a talk by José Paumard on InfoQ: http://www.infoq.com/fr/presentations/jdk8-lambdas-streams-collectors (in French)

The thing is, I got stuck on one point. To collect 1M Long values using a stream AND multithreading, we can do it this way:

Stream<Long> stream =
  Stream.generate(() -> ThreadLocalRandom.current().nextLong());

List<Long> list1 =
  stream.parallel().limit(10_000_000).collect(Collectors.toList());

But the fact that the threads constantly have to check the said limit hinders performance.

In that talk we also see this second solution:

Stream<Long> stream =
  ThreadLocalRandom.current().longs(10_000_000).mapToObj(Long::new);

List<Long> list =
  stream.parallel().collect(Collectors.toList());

And it seems to perform better.

So here is my question: why is the second code better, and is there a better, or at least less costly, way to do it?

Recommended answer

This is an implementation-dependent limitation. One thing that developers concerned about parallel performance have to understand is that predictable stream sizes generally help parallel performance, as they allow balanced splitting of the workload.
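To make the splitting argument concrete, here is a minimal sketch that splits a spliterator once and reports the estimated size of each part. The class name SplitBalanceDemo and the helper splitOnce are hypothetical, introduced only for illustration; the exact numbers printed depend on the JDK implementation.

```java
import java.util.Arrays;
import java.util.Spliterator;
import java.util.stream.IntStream;

// Hypothetical demo class, not part of the original answer.
class SplitBalanceDemo {

    // Splits the spliterator once and returns {prefix estimate, remaining estimate},
    // or null if the spliterator refuses to split.
    static long[] splitOnce(Spliterator.OfInt s) {
        Spliterator.OfInt prefix = s.trySplit();
        if (prefix == null) {
            return null;
        }
        return new long[] { prefix.estimateSize(), s.estimateSize() };
    }

    public static void main(String[] args) {
        // Sized source: the two parts together cover exactly the 100 elements.
        long[] sized = splitOnce(IntStream.range(0, 100).spliterator());
        System.out.println("range:    " + Arrays.toString(sized));

        // Infinite source: the estimates carry no usable size information.
        long[] unsized = splitOnce(IntStream.generate(() -> 42).spliterator());
        System.out.println("generate: " + Arrays.toString(unsized));
    }
}
```

With a sized source, the framework can hand each worker a part of known size; with the infinite source, the estimates give it nothing to balance on.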

The issue here is that the combination of an infinite stream, as created via Stream.generate(), and limit() does not produce a stream with a predictable size, even though it looks perfectly predictable to us.

We can examine it using the following helper method:

static void sizeOf(String op, IntStream stream) {
    final Spliterator.OfInt s = stream.spliterator();
    System.out.printf("%-18s%5d, %d%n", op, s.getExactSizeIfKnown(), s.estimateSize());
}

Then

sizeOf("randoms with size", ThreadLocalRandom.current().ints(1000));
sizeOf("randoms with limit", ThreadLocalRandom.current().ints().limit(1000));
sizeOf("range", IntStream.range(0, 100));
sizeOf("range map", IntStream.range(0, 100).map(i->i));
sizeOf("range filter", IntStream.range(0, 100).filter(i->true));
sizeOf("range limit", IntStream.range(0, 100).limit(10));
sizeOf("generate limit", IntStream.generate(()->42).limit(10));

will print

randoms with size  1000, 1000
randoms with limit   -1, 9223372036854775807
range               100, 100
range map           100, 100
range filter         -1, 100
range limit          -1, 100
generate limit       -1, 9223372036854775807

So we see that certain sources like Random.ints(size) or IntStream.range(…) produce streams with a predictable size, and certain intermediate operations like map are capable of carrying that information along, as they know that the size is not affected. Others, like filter and limit, do not propagate the size (as a known exact size).
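The exact size reported by getExactSizeIfKnown() is tied to the Spliterator.SIZED characteristic, so the same observation can be made by querying that flag directly. The following is a minimal sketch; the class name SizedFlagDemo and the helper isSized are hypothetical names chosen for this illustration.

```java
import java.util.Spliterator;
import java.util.stream.IntStream;

// Hypothetical demo class, not part of the original answer.
class SizedFlagDemo {

    // Returns true if the stream's spliterator reports a known exact size.
    static boolean isSized(IntStream stream) {
        return stream.spliterator().hasCharacteristics(Spliterator.SIZED);
    }

    public static void main(String[] args) {
        // A plain range knows its size.
        System.out.println("range:        " + isSized(IntStream.range(0, 100)));
        // map cannot change the element count, so the size is carried along.
        System.out.println("range map:    " + isSized(IntStream.range(0, 100).map(i -> i)));
        // filter may drop elements, so the exact size is no longer known.
        System.out.println("range filter: " + isSized(IntStream.range(0, 100).filter(i -> true)));
    }
}
```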

It’s clear that filter cannot predict the actual number of elements, but it provides the source size as an estimate, which is reasonable insofar as that is the maximum number of elements that can ever pass the filter.

In contrast, the current limit implementation does not provide a size, even if the source has an exact size and we know that the predictable size is as simple as min(source size, limit). Instead, it even reports a nonsensical estimated size (the source’s size), despite the fact that the resulting size is known never to be higher than the limit. In the case of an infinite stream, we have the additional obstacle that the Spliterator interface, on which streams are based, doesn’t have a way to report that it is infinite. In these cases, infinite stream + limit returns Long.MAX_VALUE as an estimate, which means "I can’t even guess".
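When the source is a range, the min(source size, limit) bound can be applied by the caller at the source itself, so the resulting stream keeps an exact size. The sketch below assumes a range starting at 0; the class name BoundedRangeDemo and the helper firstOfRange are hypothetical and only illustrate the idea.

```java
import java.util.Spliterator;
import java.util.stream.IntStream;

// Hypothetical demo class, not part of the original answer.
class BoundedRangeDemo {

    // Yields the same elements as IntStream.range(0, sourceSize).limit(limit),
    // but bounds the range itself so the stream stays sized.
    static IntStream firstOfRange(int sourceSize, int limit) {
        return IntStream.range(0, Math.min(sourceSize, limit));
    }

    public static void main(String[] args) {
        Spliterator.OfInt s = firstOfRange(100, 10).spliterator();
        // Both values reflect the bounded range rather than the original source size.
        System.out.printf("%d, %d%n", s.getExactSizeIfKnown(), s.estimateSize());
    }
}
```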

Thus, as a rule of thumb, with the current implementation, a programmer should avoid using limit when there is a way to specify the desired size beforehand at the stream’s source. But since limit also has significant (documented) drawbacks in the case of ordered parallel streams (which does not apply to randoms or generate), most developers avoid limit anyway.
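Applied to the question, the rule of thumb could look like the following sketch, which fixes the element count at the source in two ways instead of calling limit. The class name SizedRandomsDemo and both method names are hypothetical; whether either variant is actually faster on a given machine and JDK has to be measured.

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

// Hypothetical demo class, not part of the original answer.
class SizedRandomsDemo {

    // Variant 1: let the random source itself produce exactly n values.
    static List<Long> randomLongs(long n) {
        return ThreadLocalRandom.current().longs(n)
                .parallel()
                .boxed()
                .collect(Collectors.toList());
    }

    // Variant 2: drive the generation from a sized range of n indices.
    static List<Long> randomLongsViaRange(long n) {
        return LongStream.range(0, n)
                .parallel()
                .mapToObj(i -> ThreadLocalRandom.current().nextLong())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(randomLongs(1_000_000).size());
        System.out.println(randomLongsViaRange(1_000_000).size());
    }
}
```

The second variant also works for generators that have no sized factory method of their own, since the range supplies the size and the mapping function supplies the values.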
