Unexpected Scalability results in Java Fork-Join (Java 8)


Question

Recently, I was running some scalability experiments using Java Fork-Join. Here, I used the non-default ForkJoinPool constructor ForkJoinPool(int parallelism), passing the desired parallelism (# workers) as constructor argument.

Specifically, using the following piece of code:

public static void main(String[] args) throws InterruptedException {
    ForkJoinPool pool = new ForkJoinPool(Integer.parseInt(args[0]));
    pool.invoke(new ParallelLoopTask());    
}

static class ParallelLoopTask extends RecursiveAction {

    final int n = 1000;

    @Override
    protected void compute() {
        RecursiveAction[] T = new RecursiveAction[n];
        for(int p = 0; p < n; p++){
            T[p] = new DummyTask();
            T[p].fork();
        }
        for(int p = 0; p < n; p++){
            T[p].join();
        }
        /*
        //The problem does not occur when tasks are joined in the reverse order, i.e.
        for(int p = n-1; p >= 0; p--){
            T[p].join();
        }
        */
    }
}


static public class DummyTask extends RecursiveAction {
    //performs some dummy work

    final int N = 10000000;

    //avoid memory bus contention by restricting access to cache (which is distributed)
    double val = 1;

    @Override
    protected void compute() {
        for(int j = 0; j < N; j++){
            if(val < 11){
                val *= 1.1;
            }else{
                val = 1;
            }
        }
    }
}
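As an aside, the manual fork-then-join loop in `ParallelLoopTask` is the pattern that the library method `ForkJoinTask.invokeAll(Collection)` is designed to replace: it forks the tasks and joins them in an order that cooperates with the workers' local deques, so the submitting worker can help execute pending tasks rather than block. A minimal sketch of that variant (class and method names below are my own, not from the original post):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public class InvokeAllDemo {
    // Scaled-down stand-in for the DummyTask above
    static class SmallDummyTask extends RecursiveAction {
        double val = 1;
        @Override
        protected void compute() {
            for (int j = 0; j < 1_000_000; j++) {
                val = (val < 11) ? val * 1.1 : 1;
            }
        }
    }

    // Runs n dummy tasks via invokeAll; returns true when all completed
    public static boolean runDemo(int parallelism, int n) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        boolean[] allDone = new boolean[1];
        pool.invoke(new RecursiveAction() {
            @Override
            protected void compute() {
                List<SmallDummyTask> tasks = new ArrayList<>();
                for (int p = 0; p < n; p++) {
                    tasks.add(new SmallDummyTask());
                }
                // invokeAll handles the fork/join ordering internally,
                // joining in an order that matches the local work deque
                invokeAll(tasks);
                allDone[0] = tasks.stream().allMatch(SmallDummyTask::isDone);
            }
        });
        pool.shutdown();
        return allDone[0];
    }

    public static void main(String[] args) {
        System.out.println("all done: " + runDemo(2, 100));
    }
}
```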

I got these results on a processor with 4 physical and 8 logical cores (Using java 8: jre1.8.0_45):

T1: 11730

T2: 2381 (speedup: 4,93)

T4: 2463 (speedup: 4,76)

T8: 2418 (speedup: 4,85)

While when using java 7 (jre1.7.0), I get

T1: 11938

T2: 11843 (speedup: 1,01)

T4: 5133 (speedup: 2,33)

T8: 2607 (speedup: 4,58)

(where TP is the execution time in ms, using parallelism level P)

While both results surprise me, the latter I can understand (the join will cause 1 worker (executing the loop) to block, as it fails to recognize that it could, while waiting, process other pending dummy tasks from its local queue). The former, however, got me puzzled.

BTW: When counting the number of started, but not yet completed dummy tasks, I found that up to 24 such tasks existed in a pool with parallelism 2 at some point in time...?
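For reference, an in-flight count like the one mentioned here can be reproduced with a pair of atomic counters bracketing the dummy work; the class and field names below are hypothetical additions for illustration, not part of the original code:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.atomic.AtomicInteger;

public class InFlightCounter {
    static final AtomicInteger inFlight = new AtomicInteger();
    static final AtomicInteger maxInFlight = new AtomicInteger();

    // DummyTask variant that records how many instances run concurrently
    static class CountedDummyTask extends RecursiveAction {
        double val = 1;
        @Override
        protected void compute() {
            int cur = inFlight.incrementAndGet();
            maxInFlight.accumulateAndGet(cur, Math::max); // track the peak
            for (int j = 0; j < 1_000_000; j++) {
                val = (val < 11) ? val * 1.1 : 1;
            }
            inFlight.decrementAndGet();
        }
    }

    // Fork n counted tasks, join them all, and report the peak overlap
    public static int run(int parallelism, int n) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        pool.invoke(new RecursiveAction() {
            @Override
            protected void compute() {
                CountedDummyTask[] t = new CountedDummyTask[n];
                for (int p = 0; p < n; p++) {
                    t[p] = new CountedDummyTask();
                    t[p].fork();
                }
                for (int p = 0; p < n; p++) {
                    t[p].join();
                }
            }
        });
        pool.shutdown();
        return maxInFlight.get();
    }

    public static void main(String[] args) {
        System.out.println("peak in-flight tasks: " + run(2, 200));
    }
}
```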

EDIT:

I benchmarked the application above using JMH (jdk1.8.0_45) with options -bm avgt -f 1 (i.e. 1 fork, 20+20 iterations). The results are below.

T1: 11,664

11,664 ±(99.9%) 0,044 s/op [Average]
(min, avg, max) = (11,597, 11,664, 11,810), stdev = 0,050
CI (99.9%): [11,620, 11,708] (assumes normal distribution)

T2: 4,134 (speedup: 2,82)

4,134 ±(99.9%) 0,787 s/op [Average]
(min, avg, max) = (3,045, 4,134, 5,376), stdev = 0,906
CI (99.9%): [3,348, 4,921] (assumes normal distribution)

T4: 2,972 (speedup: 3,92)

2,972 ±(99.9%) 0,212 s/op [Average]
(min, avg, max) = (2,375, 2,972, 3,200), stdev = 0,245
CI (99.9%): [2,759, 3,184] (assumes normal distribution)

T8: 2,845 (speedup: 4,10)

2,845 ±(99.9%) 0,306 s/op [Average]
(min, avg, max) = (2,277, 2,845, 3,310), stdev = 0,352
CI (99.9%): [2,540, 3,151] (assumes normal distribution)

At first sight one would think these scalability results are closer to what one would expect, i.e. T1 < T2 < T4 ~ T8. However, what still bugs me is the following:


  1. The difference in T2 between java 7 and 8. My guess at an explanation
    would be that in java 8 the worker executing the parallel loop does not
    idle, but instead finds other work to perform.

  2. The super-linear speedup (3x) with only 2 workers. Also, note that T2
    seems to increase with every iteration (see below; note that this is
    also the case, though to a lesser extent, for P = 4, 8). The times of
    the first warmup iterations are similar to those mentioned above.
    Maybe the warmup should be longer; still, isn't it strange that the
    execution time increases, i.e. I would rather expect it to decrease?

  3. Finally, I still find it curious that there are more
    started-but-not-yet-completed dummy tasks than worker threads.

Run progress: 0,00% complete, ETA 00:00:40
Fork: 1 of 1
Warmup Iteration   1: 2,365 s/op
Warmup Iteration   2: 2,341 s/op
Warmup Iteration   3: 2,393 s/op
Warmup Iteration   4: 2,323 s/op
Warmup Iteration   5: 2,925 s/op
Warmup Iteration   6: 3,040 s/op
Warmup Iteration   7: 2,304 s/op
Warmup Iteration   8: 2,347 s/op
Warmup Iteration   9: 2,939 s/op
Warmup Iteration  10: 3,083 s/op
Warmup Iteration  11: 3,004 s/op
Warmup Iteration  12: 2,327 s/op
Warmup Iteration  13: 3,083 s/op
Warmup Iteration  14: 3,229 s/op
Warmup Iteration  15: 3,076 s/op
Warmup Iteration  16: 2,325 s/op
Warmup Iteration  17: 2,993 s/op
Warmup Iteration  18: 3,112 s/op
Warmup Iteration  19: 3,074 s/op
Warmup Iteration  20: 2,354 s/op
Iteration   1: 3,045 s/op
Iteration   2: 3,094 s/op
Iteration   3: 3,113 s/op
Iteration   4: 3,057 s/op
Iteration   5: 3,050 s/op
Iteration   6: 3,106 s/op
Iteration   7: 3,080 s/op
Iteration   8: 3,370 s/op
Iteration   9: 4,482 s/op
Iteration  10: 4,325 s/op
Iteration  11: 5,002 s/op
Iteration  12: 4,980 s/op
Iteration  13: 5,121 s/op
Iteration  14: 4,310 s/op
Iteration  15: 5,146 s/op
Iteration  16: 5,376 s/op
Iteration  17: 4,810 s/op
Iteration  18: 4,320 s/op
Iteration  19: 5,249 s/op
Iteration  20: 4,654 s/op


Answer

There is nothing in your example showing how you did this benchmark. It looks like you just took a millisecond timestamp at the beginning and end of the run, which is not accurate. I suggest you take a look at this SO answer and re-post your timings. BTW, JMH benchmarking is going to be standard in Java 9, so that is what you should be using.
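The timing discipline this advice boils down to is JIT warmup plus repeated measurements, which JMH automates. A hand-rolled sketch of the idea (all names here are hypothetical, and the workload is a stand-in for the pool.invoke(...) call from the question):

```java
public class NanoBench {
    // Stand-in workload; in the question this would be pool.invoke(...)
    static double workload() {
        double val = 1;
        for (int j = 0; j < 5_000_000; j++) {
            val = (val < 11) ? val * 1.1 : 1;
        }
        return val; // return the result so the JIT cannot discard the loop
    }

    // Best-of-N timing in nanoseconds, after a warmup phase
    public static long measureNanos(int warmups, int runs) {
        double sink = 0;
        for (int i = 0; i < warmups; i++) {
            sink += workload(); // give the JIT a chance to compile the loop
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long t0 = System.nanoTime();
            sink += workload();
            best = Math.min(best, System.nanoTime() - t0);
        }
        if (Double.isNaN(sink)) System.out.println(sink); // keep sink alive
        return best;
    }

    public static void main(String[] args) {
        System.out.println("best run: " + measureNanos(5, 5) / 1_000_000 + " ms");
    }
}
```

Even this sketch only mitigates the warmup problem; it does not isolate forks, handle deoptimization, or compute confidence intervals the way JMH does.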

EDIT:

You admit that the scalability results are what you expected. But you say you’re still not happy with the results. Now it’s time to look inside the code.

There are serious problems with this framework. I've been writing a critique of it since 2010. As I point out here, join doesn't work. The author has tried various means to get around the problem, but the problem persists.

Increase your run time to about a minute (n=100000000), or put some heavy computation in compute(). Now profile the application in VisualVM or another profiler. This will show you the stalling threads, excessive threads, etc.
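Alongside a profiler, ForkJoinPool's own introspection methods (getActiveThreadCount, getQueuedTaskCount, getStealCount, and so on) can be polled while the benchmark runs to see how busy the pool actually is. A small sketch, with a hypothetical helper class name:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;

public class PoolMonitor {
    // One-line summary of the pool's current state; poll this
    // periodically from another thread while the benchmark runs
    public static String snapshot(ForkJoinPool pool) {
        return "poolSize=" + pool.getPoolSize()
             + " active=" + pool.getActiveThreadCount()
             + " running=" + pool.getRunningThreadCount()
             + " queuedTasks=" + pool.getQueuedTaskCount()
             + " steals=" + pool.getStealCount();
    }

    public static void main(String[] args) throws InterruptedException {
        ForkJoinPool pool = new ForkJoinPool(2);
        // Submit a task that stays busy long enough to observe
        pool.submit(() -> {
            try { Thread.sleep(200); } catch (InterruptedException e) { }
        });
        Thread.sleep(50); // let the pool spin up a worker
        System.out.println(snapshot(pool));
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.SECONDS);
    }
}
```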

If that doesn't help answer your questions, then you should look at the code flow using a debugger. Profiling/code analysis is the only way you are going to get satisfactory answers to your questions.
