Java 8并行流似乎并不实际并行工作 [英] Java 8 parallel streams don't appear to actually be working in parallel

查看:213
本文介绍了Java 8并行流似乎并不实际并行工作的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试使用Java 8的parallelStream()并行执行多个长时间运行的请求(例如Web请求)。简化示例:

I'm trying to use Java 8's parallelStream() to execute several long-running requests (eg web requests) in parallel. Simplified example:

List<Supplier<Result>> myFunctions = Arrays.asList(() -> doWebRequest(), ...)

List<Result> results = myFunctions.parallelStream().map(function -> function.get()).collect(...

因此,如果有两个函数分别阻塞2秒和3秒,我希望在3秒后得到结果。但是,它确实需要5秒 - 即看起来函数正在按顺序执行我做错了什么?

So if there are two functions that block for 2 and 3 seconds respectively, I'd expect to get the result after 3 seconds. However, it really takes 5 seconds - ie it seems the functions are being executed in sequence and not in parallel. Am I doing something wrong?

编辑:这是一个例子。当我希望它为~2000时,花费的时间是~4000毫秒。 / p>

edit: This is an example. The time taken is ~4000 milliseconds when I want it to be ~2000.

    long start = System.currentTimeMillis();

    Map<String, Supplier<String>> input = new HashMap<String, Supplier<String>>();

    input.put("1", () -> {
        try {
            Thread.sleep(2000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        return "a";
    });

    input.put("2", () -> {
        try {
            Thread.sleep(2000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        return "b";
    });

    Map<String, String> results = input.keySet().parallelStream().collect(Collectors.toConcurrentMap(
            key -> key,
            key -> {
                return input.get(key).get();
            }));

    System.out.println("Time: " + (System.currentTimeMillis() - start));

}

如果我遍历entrySet没有任何区别()而不是keySet()

Doesn't make any difference if I iterate over the entrySet() instead of the keySet()

编辑:将并行部分更改为以下内容也无济于事:

edit: changing the parallel part to the following also does not help:

 Map<String, String> results = input.entrySet().parallelStream().map(entry -> {
            return new ImmutablePair<String, String>(entry.getKey(), entry.getValue().get());
    }).collect(Collectors.toConcurrentMap(Pair::getLeft, Pair::getRight));


推荐答案

并行执行时,分解的开销很大输入集,创建任务来表示计算的不同部分,跨线程分配操作,等待结果,组合结果等。这超出了实际解决问题的工作。如果并行框架总是将问题分解为一个元素的粒度,对于大多数问题,这些开销将压倒实际计算,并行性将导致执行速度变慢。因此并行框架有一定的自由度来决定分解输入的精确度,这就是这里发生的事情。

When executing in parallel, there is overhead of decomposing the input set, creating tasks to represent the different portions of the calculation, distributing the actions across threads, waiting for results, combining results, etc. This is over and above the work of actually solving the problem. If a parallel framework were to always decompose problems down to a granularity of one element, for most problems, these overheads would overwhelm the actual computation and parallelism would result in a slower execution. So parallel frameworks have some latitude to decide how finely to decompose the input, and that's what's happening here.

在您的情况下,您的输入集太小而无法分解。因此库选择按顺序执行。

In your case, your input set is simply too small to be decomposed. So the library chooses to execute sequentially.

在四核系统上试试这个:比较

Try this on your four-core system: compare

IntStream.range(0, 100_000).sum()

vs

IntStream.range(0, 100_000).parallel().sum()

在这里,你给它足够的输入,它将确信它可以通过并行执行获胜。如果使用负责任的测量方法(例如,JMH微基准线束)进行测量,您可能会在这两个示例之间看到几乎线性的加速。

Here, you're giving it enough input that it will be confident it can win through parallel execution. If you measure with a responsible measurement methodology (say, the JMH microbenchmark harness), you'll probably see an almost-linear speedup between these two examples.

这篇关于Java 8并行流似乎并不实际并行工作的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆