Influence on the static scheduling overhead in OpenMP


Question

I thought about which factors would influence the static scheduling overhead in OpenMP. In my opinion it is influenced by:


  • the CPU performance

  • the OpenMP run-time library

  • the number of threads

But am I missing further factors? Maybe the size of the tasks, ...?

And furthermore: is the overhead linearly dependent on the number of iterations? In that case, with static scheduling and 4 cores, I would expect the overhead to increase linearly with 4*i iterations. Correct so far?

Edit:
I am only interested in the static (!) scheduling overhead itself. I am not talking about thread start-up overhead and time spent in synchronisation and critical section overhead.
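
As a minimal sketch of how one might measure this (not from the original post): the thread team is created once outside the timed region, so thread start-up is excluded and only the worksharing loop, and hence mostly its scheduling cost, is timed. Repeating this for several values of N and different schedule clauses would show how that cost scales with the iteration count.

#include <stdio.h>
#include <omp.h>

#define N 1000000L

int main(void) {
    long long sum = 0;
    double t0 = 0.0, t1 = 0.0;

    #pragma omp parallel reduction(+:sum)
    {
        /* the team already exists here, so thread start-up is not timed */
        #pragma omp single
        t0 = omp_get_wtime();    /* implicit barrier at the end of single */

        #pragma omp for schedule(static)
        for (long i = 0; i < N; i++)
            sum += i;            /* trivial body: mostly scheduling cost */
        /* implicit barrier at the end of the for loop */

        #pragma omp master
        t1 = omp_get_wtime();
    }

    printf("loop time = %g s (sum = %lld)\n", t1 - t0, sum);
    return 0;
}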

Answer

You need to separate the overhead for OpenMP to create a team/pool of threads from the overhead for each thread to operate on a separate set of iterations of a for loop.

Static scheduling is easy to implement by hand (which is sometimes very useful). Let's consider what I consider the two most important static scheduling cases, schedule(static) and schedule(static,1), and then compare them to schedule(dynamic,chunk).

#pragma omp parallel for schedule(static)
for(int i=0; i<N; i++) foo(i);

is equivalent to (but not necessarily equal to)

#pragma omp parallel
{
    int start = omp_get_thread_num()*N/omp_get_num_threads();
    int finish = (omp_get_thread_num()+1)*N/omp_get_num_threads();
    for(int i=start; i<finish; i++) foo(i);
}

#pragma omp parallel for schedule(static,1)
for(int i=0; i<N; i++) foo(i);

is equivalent to

#pragma omp parallel 
{
    int ithread = omp_get_thread_num();
    int nthreads = omp_get_num_threads();
    for(int i=ithread; i<N; i+=nthreads) foo(i);
}

From this you can see that it's quite trivial to implement static scheduling and so the overhead is negligible.

On the other hand if you want to implement schedule(dynamic) (which is the same as schedule(dynamic,1)) by hand it's more complicated:

int cnt = 0;
#pragma omp parallel
for(int i=0;;) {
    #pragma omp atomic capture
    i = cnt++;
    if(i>=N) break;
    foo(i);                                    
}

This requires OpenMP >= 3.1. If you wanted to do this with OpenMP 2.0 (for MSVC) you would need to use critical like this:

int cnt = 0;
#pragma omp parallel
for(int i=0;;) {
    #pragma omp critical   
    i = cnt++;
    if(i>=N) break;
    foo(i);
} 

Here is an equivalent to schedule(dynamic,chunk) (I have not optimized this using atomic accesses):

int cnt = 0;
int chunk = 5;
#pragma omp parallel
{
    int start, finish;
    do {
        #pragma omp critical
        {
            start = cnt;
            finish = cnt+chunk < N ? cnt+chunk : N;
            cnt += chunk;
        }
        for(int i=start; i<finish; i++) foo(i);
    } while(finish<N);
}

Clearly using atomic accesses is going to cause more overhead. This also shows why using larger chunks for schedule(dynamic,chunk) can reduce the overhead.
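
As a side note (a sketch, not from the original answer), the atomic optimization hinted at above could look like this with OpenMP >= 3.1, reusing N and foo from the snippets above: each thread reserves a whole chunk of iterations with a single atomic capture instead of a critical section.

int cnt = 0;
int chunk = 5;
#pragma omp parallel
{
    int start, finish;
    do {
        /* reserve the next chunk of iterations with one atomic update */
        #pragma omp atomic capture
        { start = cnt; cnt += chunk; }
        finish = start + chunk < N ? start + chunk : N;
        for(int i=start; i<finish; i++) foo(i);
    } while(finish<N);
}

Because the synchronisation now happens once per chunk rather than once per iteration, this also illustrates why schedule(dynamic,chunk) with a larger chunk has less overhead than schedule(dynamic,1).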
