Parallelizing C++ code using OpenMP, calculations actually slower in parallel


Problem description


I have the following code that I want to parallelize:

int ncip( int dim, double R)
{   
    int i;
    int r = (int)floor(R);
    if (dim == 1)
    {   
        return 1 + 2*r; 
    }
    int n = ncip(dim-1, R); // last coord 0

    #pragma omp parallel for
    for(i=1; i<=r; ++i)
    {   
        n += 2*ncip(dim-1, sqrt(R*R - i*i) ); // last coord +- i
    }

    return n;
}


The program execution time when run without OpenMP is 6.956 s; when I try to parallelize the for loop, my execution time is greater than 3 minutes (and that's only because I killed it myself). What am I doing wrong in parallelizing this code?

Second attempt

int ncip( int dim, double R)
{
    int i;
    int r = (int)floor( R);
    if ( dim == 1)
    {
        return 1 + 2*r;
    }

    #pragma omp parallel
    {
        int n = ncip( dim-1, R); // last coord 0
        #pragma omp for reduction (+:n)
        for( i=1; i<=r; ++i)
        {
            n += 2*ncip( dim-1, sqrt( R*R - i*i) ); // last coord +- i
        }
    }

    return n;
}


Recommended answer

You're doing it wrong!

(1) There is a data race on the variable n. If you want to parallelize code that writes to the same memory location, you must use a reduction (on the for), atomic, or critical to avoid data hazards.
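For reference, a minimal sketch of fix (1) alone: the asker's first version with a reduction clause on n. The name ncip_reduced is ours, not from the answer, and note that this still opens a parallel region at every recursion level, so it does not address point (2) below.

```cpp
#include <cmath>

// Sketch: the first version with the race on n removed by a reduction.
// Each thread accumulates into a private copy of n and OpenMP sums the
// copies when the loop ends, so the updates are not lost.
int ncip_reduced(int dim, double R)
{
    int r = (int)std::floor(R);
    if (dim == 1)
        return 1 + 2*r;

    int n = ncip_reduced(dim-1, R); // last coord 0

    #pragma omp parallel for reduction(+:n)
    for (int i = 1; i <= r; ++i)
        n += 2*ncip_reduced(dim-1, std::sqrt(R*R - (double)i*i)); // last coord +- i

    return n;
}
```

Compiled without -fopenmp the pragma is ignored and the function runs serially with the same result, which makes the fix easy to sanity-check.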

(2) You probably have nested parallelism enabled, so the program creates a new parallel region every time the function ncip is called. This is most likely the main problem. For recursive functions I advise you to create just one parallel region and then use #pragma omp task.

Do not parallelize with #pragma omp for; try #pragma omp task instead. Look at this example:

int ncip(int dim, double R){
    ...
    #pragma omp task
    ncip(XX, XX);

    #pragma omp taskwait
    ...
}

int main(int argc, char *argv[]) {
    #pragma omp parallel
    {
        #pragma omp single 
        ncip(XX, XX);
    } 
    return(0); 
}

//Detailed version (without omp for and data races)
int ncip(int dim, double R){
    int n, r = (int)floor(R);

    if (dim == 1) return 1 + 2*r; 

    n = ncip(dim-1, R); // last coord 0

    for(int i=1; i<=r; ++i){   
        #pragma omp task shared(n) // n is a function-local variable, hence firstprivate by default inside a task; it must be shared for the atomic update to be visible
        {
            int aux = 2*ncip(dim-1, sqrt(R*R - i*i) ); // last coord +- i

            #pragma omp atomic
            n += aux;
        }
    }
    #pragma omp taskwait
    return n;
}



PS: You'll not get a speedup from this, because the overhead of creating a task is bigger than the work of a single task. The best thing you can do is rewrite this algorithm as an iterative version, and then try to parallelize it.
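Short of the full iterative rewrite the answer recommends, one intermediate sketch (our variant, not the answer's) is to keep the recursion entirely serial and open a single parallel loop only at the top level, so there is exactly one parallel region and the work per iteration is coarse:

```cpp
#include <cmath>

// Plain serial recursion: no OpenMP inside, so no nested regions.
int ncip_serial(int dim, double R)
{
    int r = (int)std::floor(R);
    if (dim == 1) return 1 + 2*r;
    int n = ncip_serial(dim-1, R); // last coord 0
    for (int i = 1; i <= r; ++i)
        n += 2*ncip_serial(dim-1, std::sqrt(R*R - (double)i*i)); // last coord +- i
    return n;
}

// Hypothetical top-level wrapper (the name ncip_top is ours): one
// parallel region, one reduction, each iteration a big serial subproblem.
int ncip_top(int dim, double R)
{
    int r = (int)std::floor(R);
    if (dim == 1) return 1 + 2*r;
    int n = ncip_serial(dim-1, R);
    // dynamic schedule: iterations do very different amounts of work,
    // since the slice radius shrinks as i grows.
    #pragma omp parallel for reduction(+:n) schedule(dynamic)
    for (int i = 1; i <= r; ++i)
        n += 2*ncip_serial(dim-1, std::sqrt(R*R - (double)i*i));
    return n;
}
```

Whether this beats the serial version still depends on dim and R being large enough for the top-level slices to amortize the single region's startup cost.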

