Parallelizing C++ code using OpenMP, calculations actually slower in parallel
Question
I have the following code that I want to parallelize:
int ncip(int dim, double R)
{
    int i;
    int r = (int)floor(R);
    if (dim == 1)
    {
        return 1 + 2*r;
    }

    int n = ncip(dim-1, R); // last coord 0
    #pragma omp parallel for
    for (i = 1; i <= r; ++i)
    {
        n += 2*ncip(dim-1, sqrt(R*R - i*i)); // last coord +- i
    }
    return n;
}
The program execution time when run without OpenMP is 6.956 s; when I try to parallelize the for loop, my execution time is greater than 3 minutes (and only that because I killed it myself). What am I doing wrong in parallelizing this code?
Second attempt
int ncip(int dim, double R)
{
    int i;
    int r = (int)floor(R);
    if (dim == 1)
    {
        return 1 + 2*r;
    }

    #pragma omp parallel
    {
        int n = ncip(dim-1, R); // last coord 0
        #pragma omp for reduction(+:n)
        for (i = 1; i <= r; ++i)
        {
            n += 2*ncip(dim-1, sqrt(R*R - i*i)); // last coord +- i
        }
    }
    return n;
}
Answer
You are doing it wrong!

(1) There are data races on the variable n. If you want to parallelize code where several threads write to the same memory location, you must use a reduction (on the for), atomic, or critical to avoid data hazards.
(2) You probably have nested parallelism enabled, so the program creates a new parallel region every time you call the function ncip. This is likely the main problem. For recursive functions I advise you to create just one parallel region and then use #pragma omp task.
Do not parallelize with #pragma omp parallel for; try #pragma omp task instead. Look at this example:
int ncip(int dim, double R) {
    ...
    #pragma omp task
    ncip(XX, XX);
    #pragma omp taskwait
    ...
}

int main(int argc, char *argv[]) {
    #pragma omp parallel
    {
        #pragma omp single
        ncip(XX, XX);
    }
    return 0;
}
// Detailed version (without omp for and data races)
int ncip(int dim, double R) {
    int n, r = (int)floor(R);
    if (dim == 1) return 1 + 2*r;

    n = ncip(dim-1, R); // last coord 0
    for (int i = 1; i <= r; ++i) {
        #pragma omp task
        {
            int aux = 2*ncip(dim-1, sqrt(R*R - i*i)); // last coord +- i
            #pragma omp atomic
            n += aux;
        }
    }
    #pragma omp taskwait
    return n;
}
PS: You will not get a speedup from this, because the overhead of creating a task is bigger than the work of a single task. The best thing you can do is rewrite this algorithm as an iterative version and then try to parallelize that.