OpenMP: Can't parallelize nested for loops

Question

I want to parallelize a loop that has an inner loop inside it. My code looks like this:

    #pragma omp parallel for private(jb,ib) shared(n, Nb, lb, lastBlock, jj, W, WT) schedule(dynamic) // parallel for loop with omp
    for(jb=0; jb<Nb; jb++)
    {
        int lbh = (jb==Nb-1) ? lastBlock : lb;
        int ip = omp_get_thread_num();

        packWT(a, n, lb, s, jb, colNr, WT[ip], nr); // pack WWT[jb]

        for(ib=jb; ib<Nb; ib++)
        {
            int lbv = (ib==Nb-1) ? lastBlock : lb;

            multBlock_2x4xk(a, n, jj + ib*lb, jj + jb*lb, W+ib*lb*lb, WT[ip], lb, lbv, lbh); // MULT BLOCK - 2x4xK (W[jb]*W[ib])
        }
    }

I measure the time the processor spends computing these loops. It is the same with several threads as with one thread. When I change the clause

    private(jb,ib)

to

    private(jb)

everything changes: with several threads the loops are computed faster than with one thread. What is the problem?

Recommended answer

The problem is that your inner for loop is not in canonical form. Therefore OpenMP fails to parallelize the loops, and no speedup can be achieved. The loops need to have the canonical form

    for (idx = start; idx < end; idx += inc)

where start, end and inc must not change, and idx must not be modified by anything other than the loop header, during the parallel part of the code.
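A minimal sketch of a triangular loop nest written in OpenMP's canonical loop form. It assumes a hypothetical suffix-sum computation (the function suffix_sums and its parameters are illustrative, not from the question) standing in for the packWT/multBlock_2x4xk calls. Each index is initialized once, tested against a bound that does not change, and advanced only by the loop header; both indices are also declared in their for statements, so every thread works with its own copies.

```c
/* Hypothetical stand-in for the asker's triangular block loop:
   out[j] = a[j] + a[j+1] + ... + a[n-1]  (a suffix sum). */
void suffix_sums(const double *a, double *out, int n)
{
    /* Outer loop in canonical form: init j = 0, test j < n, increment j++.
       n is never modified inside the loop. */
    #pragma omp parallel for schedule(dynamic)
    for (int j = 0; j < n; j++) {
        double acc = 0.0;
        /* Inner loop is canonical too: its start value j is fixed for the
           whole inner loop, and i is changed only by the i++ in the header.
           Declaring i here makes it private to the executing thread. */
        for (int i = j; i < n; i++) {
            acc += a[i];
        }
        out[j] = acc; /* each j writes only its own element, so there is no race */
    }
}
```

Compiled with -fopenmp (GCC or Clang), the outer iterations are distributed among threads; without that flag the pragma is ignored and the function runs serially with the same result.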

I think I have identified your problem. You are calling these functions:

    packWT(a, n, lb, s, jb, colNr, WT[ip], nr);
    multBlock_2x4xk(a, n, jj + ib*lb, jj + jb*lb, W+ib*lb*lb, WT[ip], lb, lbv, lbh);

where one argument is the loop variable jb. Because jb could be changed inside the function (depending on how the function is declared), the compiler decides not to parallelize the loop. To avoid this, copy jb to a local variable and pass the local variable to the function.
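A minimal sketch of that workaround, assuming a hypothetical helper process_block in place of packWT and multBlock_2x4xk (the names process_block and process_all are illustrative only). The loop variable jb is copied into a local declared inside the loop body, and only that copy is handed to the function. In C a plain int parameter already receives its own copy of the value, so the extra local matters mainly when the callee takes the index through a pointer.

```c
/* Hypothetical helper standing in for packWT / multBlock_2x4xk:
   it receives a block index by value and fills one result slot. */
static void process_block(int block, int nblocks, long *result)
{
    long sum = 0;
    for (int k = block; k < nblocks; k++) /* triangular inner loop */
        sum += k;
    result[block] = sum;
}

void process_all(int nblocks, long *result)
{
    int jb;
    #pragma omp parallel for schedule(dynamic)
    for (jb = 0; jb < nblocks; jb++) {
        /* Copy the loop variable into a local declared in the loop body.
           The local is private to the executing thread, and only the copy
           is passed on, so the callee cannot touch jb itself. */
        const int my_jb = jb;
        process_block(my_jb, nblocks, result);
    }
}
```

For example, process_all(4, result) fills result with 6, 6, 5 and 3 (the sums k = b..3 for each block b), independent of the number of threads.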