How to determine the proper work division threshold of a fork-join task

Problem description

After reading the Fork/Join Tutorial, I created a class for computing large factorials:

import java.math.BigInteger;
import java.util.concurrent.RecursiveTask;

public class ForkFactorial extends RecursiveTask<BigInteger> {

    final int end;
    final int start;
    private static final int THRESHOLD = 10;

    public ForkFactorial(int n) {
        this(1, n + 1);
    }

    private ForkFactorial(int start, int end) {
        this.start = start;
        this.end = end;
    }

    @Override
    protected BigInteger compute() {
        if (end - start < THRESHOLD) {
            return computeDirectly();
        } else {
            // Midpoint computed without risking int overflow for large ranges
            int mid = start + (end - start) / 2;
            ForkFactorial lower = new ForkFactorial(start, mid);
            lower.fork();
            ForkFactorial upper = new ForkFactorial(mid, end);
            BigInteger upperVal = upper.compute();
            return lower.join().multiply(upperVal);
        }
    }

    private BigInteger computeDirectly() {
        BigInteger val = BigInteger.ONE;
        BigInteger mult = BigInteger.valueOf(start);
        for (int iter = start; iter < end; iter++, mult = mult.add(BigInteger.ONE)) {
            val = val.multiply(mult);
        }
        return val;
    }
}

My question is how to determine the threshold at which I stop subdividing the task. I found a page on fork/join parallelism which states:


One of the main things to consider when implementing an algorithm using fork/join parallelism is choosing the threshold which determines whether a task will execute a sequential computation rather than forking parallel sub-tasks.

If the threshold is too large, then the program might not create enough tasks to fully take advantage of the available processors/cores.

If the threshold is too small, ...

In general, some experimentation will be necessary to find an appropriate threshold value.

So what experimentation would I need to do in order to determine the threshold?

Recommended answer

Choosing a threshold depends on many factors:

The actual computation should take a reasonable amount of time. If you're summing an array and the array is small then it is probably better to do it sequentially. If the array length is 16M, then splitting it into smaller pieces and parallel processing should be worthwhile. Try it and see.
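One way to "try it and see" for the factorial task is to make the threshold a parameter and time the same computation at several values. The following is only a rough sketch: the class name `ThresholdSweep`, the helper `bestNanos`, the input size and the list of thresholds are all made up for illustration, and `System.nanoTime` timing is a crude stand-in for a proper benchmark harness.

```java
import java.math.BigInteger;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical harness: same divide-and-conquer product as ForkFactorial,
// but with the threshold passed in so it can be varied between runs.
class ThresholdSweep {

    static final class RangeProduct extends RecursiveTask<BigInteger> {
        final int start, end, threshold; // multiplies start * ... * (end - 1)

        RangeProduct(int start, int end, int threshold) {
            this.start = start;
            this.end = end;
            this.threshold = threshold; // assumed >= 2, otherwise a range of size 1 never stops splitting
        }

        @Override
        protected BigInteger compute() {
            if (end - start < threshold) {
                BigInteger val = BigInteger.ONE;
                for (int i = start; i < end; i++) {
                    val = val.multiply(BigInteger.valueOf(i));
                }
                return val;
            }
            int mid = start + (end - start) / 2;
            RangeProduct lower = new RangeProduct(start, mid, threshold);
            lower.fork();
            BigInteger upperVal = new RangeProduct(mid, end, threshold).compute();
            return lower.join().multiply(upperVal);
        }
    }

    static BigInteger factorial(ForkJoinPool pool, int n, int threshold) {
        return pool.invoke(new RangeProduct(1, n + 1, threshold));
    }

    // Best-of-N wall-clock time in nanoseconds; crude, but enough to compare settings.
    static long bestNanos(ForkJoinPool pool, int n, int threshold, int runs) {
        long best = Long.MAX_VALUE;
        for (int r = 0; r < runs; r++) {
            long t0 = System.nanoTime();
            factorial(pool, n, threshold);
            best = Math.min(best, System.nanoTime() - t0);
        }
        return best;
    }

    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool();
        int n = 20_000; // arbitrary input size chosen for the sketch
        for (int threshold : new int[] {2, 10, 100, 1_000, 10_000, 100_000}) {
            bestNanos(pool, n, threshold, 3); // warm-up runs, result discarded
            long ns = bestNanos(pool, n, threshold, 5);
            System.out.printf("threshold=%d best=%.2f ms%n", threshold, ns / 1e6);
        }
        pool.shutdown();
    }
}
```

The largest threshold exceeds `n`, so that run is effectively the sequential baseline. Whatever numbers come out only apply to the machine and input size they were measured on.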

The number of processors should be sufficient. Doug Lea once documented that his framework needs 16+ processors to be worthwhile. Even splitting an array in half and running on two threads will produce about a 1.3% gain in throughput. You then have to weigh that against the split/join overhead. Try running on many configurations to see what you get.
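To try several configurations, the same task can be run on pools of different sizes using the `ForkJoinPool(int parallelism)` constructor. Below is a minimal sketch using the array-sum example; `ParallelismSweep`, `SumTask`, the array size and the threshold of 100,000 are assumptions picked for illustration, not recommended values.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical harness: sums an int array on pools with different parallelism.
class ParallelismSweep {

    static final class SumTask extends RecursiveTask<Long> {
        final int[] data;
        final int lo, hi, threshold; // sums data[lo .. hi-1]; threshold assumed >= 1

        SumTask(int[] data, int lo, int hi, int threshold) {
            this.data = data;
            this.lo = lo;
            this.hi = hi;
            this.threshold = threshold;
        }

        @Override
        protected Long compute() {
            if (hi - lo <= threshold) {
                long sum = 0;
                for (int i = lo; i < hi; i++) {
                    sum += data[i];
                }
                return sum;
            }
            int mid = lo + (hi - lo) / 2;
            SumTask left = new SumTask(data, lo, mid, threshold);
            left.fork();
            long right = new SumTask(data, mid, hi, threshold).compute();
            return left.join() + right;
        }
    }

    // Runs the sum on a fresh pool limited to the given number of worker threads.
    static long sum(int[] data, int parallelism, int threshold) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.invoke(new SumTask(data, 0, data.length, threshold));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] data = new int[1 << 24]; // 16M elements, about 64 MB of heap
        java.util.Arrays.fill(data, 1);
        int cores = Runtime.getRuntime().availableProcessors();
        for (int p = 1; p <= cores; p *= 2) {
            sum(data, p, 100_000); // warm-up, result discarded
            long t0 = System.nanoTime();
            long total = sum(data, p, 100_000);
            long ns = System.nanoTime() - t0;
            System.out.printf("parallelism=%d sum=%d time=%.2f ms%n", p, total, ns / 1e6);
        }
    }
}
```

Running with `parallelism=1` gives a baseline that still pays the task-splitting cost, which helps separate the overhead from the actual speedup.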

The number of concurrent requests should be small. If you have N processors and 8(N) concurrent requests, then using one thread per request is often more efficient for throughput. The logic here is simple. If you have N processors available and you split your work accordingly but there are hundreds of other tasks ahead of you, then what's the point of splitting?
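For comparison, the "one thread per request" alternative might look roughly like this: each request is processed sequentially, and a fixed-size pool spreads whole requests across the processors. The names `OneThreadPerRequest`, `handleAll` and `sequentialSum` are invented for this sketch, and summing an array stands in for whatever the real request does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: no splitting inside a request, parallelism comes
// only from running many independent requests at once.
class OneThreadPerRequest {

    // The whole request is handled on a single thread.
    static long sequentialSum(int[] data) {
        long sum = 0;
        for (int v : data) {
            sum += v;
        }
        return sum;
    }

    // Submits one job per request; results come back in request order.
    static List<Long> handleAll(List<int[]> requests, int nThreads) {
        ExecutorService exec = Executors.newFixedThreadPool(nThreads);
        try {
            List<Callable<Long>> jobs = new ArrayList<>();
            for (int[] r : requests) {
                jobs.add(() -> sequentialSum(r));
            }
            List<Long> results = new ArrayList<>();
            for (Future<Long> f : exec.invokeAll(jobs)) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            exec.shutdown();
        }
    }
}
```

Measuring total throughput of this version against a fork/join version under the same number of concurrent requests is one way to check whether splitting pays off for a given workload.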

This is what experimentation means.

Unfortunately, this framework doesn't come with the means for accountability. There is no way to see the load on each thread, the high-water mark in the deques, the total requests processed, the errors encountered, etc.

Good luck.
