A simple list.parallelStream() in a Java 8 stream does not seem to do work stealing?

Question

From the question "Will inner parallel streams be processed fully in parallel before considering parallelizing outer stream?", I understood that streams perform work-stealing. However, I've noticed that it often doesn't seem to occur. For example, if I have a List of, say, 100,000 elements and I process it with parallelStream(), I often notice towards the end that most of my CPU cores are sitting idle in the "waiting" state. (Note: of the 100,000 elements in the list, some take a long time to process, whereas others are fast; and the list is not balanced, which is why some threads may get "unlucky" and have lots to do, whereas others get lucky and have little to do.)

So, my theory is that the JIT compiler does an initial division of the 100,000 elements among 16 threads (because I have 16 cores), but then within each thread it just runs a simple (sequential) for-loop (as that would be the most efficient), and therefore no work stealing would ever occur (which is what I'm seeing).

I think the reason why "Will inner parallel streams be processed fully in parallel before considering parallelizing outer stream?" showed work stealing is that there was an OUTER loop that was streaming and an INNER loop that was streaming, so in that case each inner loop got evaluated at run time and would create new tasks that could, at runtime, be assigned to "idle" threads. Thoughts? Is there something I'm doing wrong, or some way to "force" a simple list.parallelStream() to use work-stealing? (My current workaround is to attempt to balance the list based on various heuristics so that each thread usually sees the same amount of work; but it's hard to predict that.)

Recommended answer

This has nothing to do with the JIT compiler but with the implementation of the Stream API. It divides the workload into chunks, which are then processed sequentially by the worker threads. The general strategy is to have more jobs than worker threads to enable work-stealing; see, for example, ForkJoinTask.getSurplusQueuedTaskCount(), which can be used to implement such an adaptive strategy.
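To make the adaptive idea concrete, here is a minimal sketch of a hand-written fork/join task that consults ForkJoinTask.getSurplusQueuedTaskCount() before every element and hands off half of its remaining range whenever few tasks are queued for stealing. This is not how the Stream API is implemented; the names AdaptiveSplit, RangeTask, forEachAdaptive and the limit of 3 are assumptions chosen purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveAction;
import java.util.function.Consumer;

public class AdaptiveSplit {
    // Hypothetical task that processes list[lo, hi).
    static final class RangeTask<T> extends RecursiveAction {
        // Assumed tuning value: keep splitting while this worker has at most
        // this many surplus tasks queued for other workers to steal.
        private static final int SURPLUS_LIMIT = 3;
        private final List<T> list;
        private final int lo, hi;
        private final Consumer<? super T> action;

        RangeTask(List<T> list, int lo, int hi, Consumer<? super T> action) {
            this.list = list;
            this.lo = lo;
            this.hi = hi;
            this.action = action;
        }

        @Override
        protected void compute() {
            int from = lo, to = hi;
            List<RangeTask<T>> forked = new ArrayList<>();
            while (from < to) {
                // Re-check before every element, so that a range which turns
                // out to be slow can still give away its untouched upper half.
                if (to - from > 1
                        && ForkJoinTask.getSurplusQueuedTaskCount() <= SURPLUS_LIMIT) {
                    int mid = (from + to) >>> 1;
                    RangeTask<T> right = new RangeTask<>(list, mid, to, action);
                    right.fork();
                    forked.add(right);
                    to = mid;
                    continue;
                }
                action.accept(list.get(from++));
            }
            // Join the most recently forked task first.
            for (int i = forked.size() - 1; i >= 0; i--) {
                forked.get(i).join();
            }
        }
    }

    // Hypothetical entry point; assumes a list with cheap random access.
    static <T> void forEachAdaptive(List<T> list, Consumer<? super T> action) {
        ForkJoinPool.commonPool().invoke(new RangeTask<>(list, 0, list.size(), action));
    }
}
```

Calling AdaptiveSplit.forEachAdaptive(list, element -> process(element)), with process standing in for the real work, would run the action once per element. The extra check per element costs a little, so a sketch like this only pays off when individual elements are expensive.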

The following code can be used to detect how many elements were processed sequentially when the source is an ArrayList:

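import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
public class ChunkCount {
    public static void main(String[] args) {
        List<Object> list = new ArrayList<>(Collections.nCopies(10_000, ""));
        System.out.println(System.getProperty("java.version"));
        System.out.println(Runtime.getRuntime().availableProcessors());
        // Each chunk gets its own one-element list; its counter is bumped once per element.
        System.out.println(list.parallelStream()
            .collect(
                () -> new ArrayList<>(Collections.singleton(0)),
                (l, x) -> l.replaceAll(i -> i + 1),
                List::addAll));
    }
}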

On my current test machine, it prints:

1.8.0_60
4
[625, 625, 625, 625, 625, 625, 625, 625, 625, 625, 625, 625, 625, 625, 625, 625]
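The figure of 625 can be reproduced with a small simulation. The sketch below assumes, based on a reading of the OpenJDK 8 sources rather than on any documented contract, that the stream implementation aims for roughly four leaf tasks per common-pool thread and keeps halving a range while it is larger than size divided by that target. It also assumes the common pool has one thread fewer than the number of processors, so 3 on the 4-processor machine above. ChunkSizeSketch and leafChunkSize are hypothetical names, and other Java versions may behave differently.

```java
public class ChunkSizeSketch {
    // Simulates the assumed splitting rule for a source that splits into halves.
    static int leafChunkSize(int size, int commonPoolParallelism) {
        // Assumption: target of about 4 leaf tasks per common-pool thread.
        int leafTarget = Math.max(1, commonPoolParallelism) << 2;
        int threshold = Math.max(size / leafTarget, 1);
        int chunk = size;
        // Keep halving while the range is still above the threshold.
        while (chunk > threshold) {
            chunk >>>= 1;
        }
        return chunk;
    }

    public static void main(String[] args) {
        int size = 10_000;
        int chunk = leafChunkSize(size, 3);
        // prints "625 elements per chunk, 16 chunks"
        System.out.println(chunk + " elements per chunk, " + (size / chunk) + " chunks");
    }
}
```

With 10,000 elements and a parallelism of 3, the threshold is 10,000 / 12 = 833, and the halving sequence 5,000, 2,500, 1,250, 625 stops at the first value not above it.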

So there are more chunks than cores, to allow work-stealing. However, once the sequential processing of a chunk has started, it can’t be split further, so this implementation has limitations when the per-element execution times differ significantly. This is always a trade-off.
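When the per-element cost is both high and very uneven, one possible way around this limitation is to give up chunking and submit every element as its own task. The following is only a sketch of that idea using CompletableFuture, not something the Stream API does for you; PerElementTasks and mapPerElement are hypothetical names, and the tasks run on whatever executor supplyAsync uses by default (normally the common pool).

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

public class PerElementTasks {
    // Hypothetical helper: one asynchronous task per element, results in input order.
    static <T, R> List<R> mapPerElement(List<T> input,
                                        Function<? super T, ? extends R> fn) {
        // First submit everything, so that all tasks are queued before any wait.
        List<CompletableFuture<R>> futures = input.stream()
            .map(x -> CompletableFuture.<R>supplyAsync(() -> fn.apply(x)))
            .collect(Collectors.toList());
        // Then wait for each result; join() rethrows failures as CompletionException.
        return futures.stream()
            .map(CompletableFuture::join)
            .collect(Collectors.toList());
    }
}
```

Because any idle thread can pick up the next pending element, a few slow elements no longer hold back a whole chunk. The price is one task object and one scheduling step per element, which is wasteful when elements are cheap, so this is the same trade-off seen from the other side.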
