pragma omp for inside pragma omp master or single


Question

I'm trying to make orphaning work and to reduce overhead by cutting down the number of calls to #pragma omp parallel. What I'm attempting looks something like this:

#pragma omp parallel default(none) shared(mat,mat2,f,max_iter,tol,N,conv) private(diff,k)
{
    #pragma omp master // I'm not against using #pragma omp single or whatever will work
    {
        while(diff>tol) {
            do_work(mat,mat2,f,N);
            swap(mat,mat2);
            if( !(k%100) ) // Only test stop criteria every 100 iterations
                diff = conv[k] = do_more_work(mat,mat2);
            k++;
        } // end while
    } // end master
} // end parallel

do_work depends on the previous iteration, so the while-loop has to run sequentially. But I would like to be able to run do_work in parallel, so it would look something like:

void do_work(double *mat, double *mat2, double *f, int N)
{
    int i,j;
    double scale = 1/4.0;
    #pragma omp for schedule(runtime) // Just so I can test different settings without having to recompile
    for(i=0;i<N;i++)
        for(j=0;j<N;j++)
            mat[i*N+j] = scale*(mat2[(i+1)*N+j]+mat2[(i-1)*N+j] + ... + f[i*N+j]);
}

I hope this can be accomplished somehow; I'm just not sure how. Any help I can get is greatly appreciated (also if you're telling me this isn't possible). By the way, I'm working with OpenMP 3.0, the gcc compiler, and the Sun Studio compiler.

Answer

The outer parallel region in your original code contains only a serial piece (#pragma omp master), which makes no sense and effectively results in purely serial execution (no parallelism). Since do_work() depends on the previous iteration but you want to run it in parallel, you must use synchronisation. The OpenMP tool for that is an (explicit or implicit) synchronisation barrier.

For example (code similar to yours):

#pragma omp parallel
for(int j=0; diff>tol; ++j)    // must be the same condition for each thread!
#pragma omp for                // note: implicit synchronisation after for loop
  for(int i=0; i<N; ++i)
    work(j,i);

Note that the implicit synchronisation ensures that no thread enters the next j while any thread is still working on the current j.
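
Connecting this back to the question's structure: a minimal sketch of the same pattern, with the orphaned #pragma omp for kept inside do_work(), the master construct dropped, and the serial bookkeeping moved into #pragma omp single (whose implicit end barrier publishes the updated diff before any thread re-tests the loop condition). This assumes swap() and do_more_work() behave as in the question (a real C swap() would need pointer-to-pointer arguments or a macro), and diff and k are now shared, since every thread must evaluate the same condition:

#pragma omp parallel default(none) shared(mat, mat2, f, N, diff, tol, k, conv)
{
    while (diff > tol) {             // every thread runs the loop; diff is shared
        do_work(mat, mat2, f, N);    // orphaned omp for inside; implicit barrier at its end
        #pragma omp single
        {   // one thread updates the shared state; the implicit barrier
            // at the end of single publishes diff and k before the next test
            swap(mat, mat2);
            if (!(k % 100))          // only test the stop criterion every 100 iterations
                diff = conv[k] = do_more_work(mat, mat2);
            k++;
        }
    }
}

Each iteration now pays two barriers (one after the worksharing for, one after single) instead of a full fork/join, which is exactly the trade-off described above.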

The alternative

for(int j=0; diff>tol; ++j)
#pragma omp parallel for
  for(int i=0; i<N; ++i)
    work(j,i);

should be less efficient, as it creates a new team of threads at each iteration instead of merely synchronising.
