How to correctly parallelize nested for loops
Question
I'm using OpenMP to parallelize a serial (scalar) nested for loop:
double P[N][N];
double x=0.0,y=0.0;
for (int i=0; i<N; i++)
{
for (int j=0; j<N; j++)
{
P[i][j]=someLongFunction(x,y);
y+=1;
}
x+=1;
}
In this loop, what matters is that the matrix P must be identical in the serial (scalar) and parallel versions.
None of my attempts has worked so far...
Recommended answer
The problem is that these two statements introduce iteration-to-iteration (loop-carried) dependencies:
x+=1;
y+=1;
Therefore, as the code stands, it cannot be parallelized: each iteration reads the x and y values left behind by the previous one, so splitting the iterations across threads gives incorrect results (as you are probably seeing).
Fortunately, in your case you can compute both values directly instead of accumulating them, which removes the dependency. Because y is never reset between rows, at iteration (i, j) x equals i and y equals N*i + j:
for (int i=0; i<N; i++)
{
for (int j=0; j<N; j++)
{
P[i][j]=someLongFunction((double)i, (double)N*i + j);
}
}
Now that every iteration is independent, you can try putting an OpenMP pragma on the outer loop and see if it works:
#pragma omp parallel for
for (int i=0; i<N; i++)
{
for (int j=0; j<N; j++)
{
P[i][j]=someLongFunction((double)i, (double)N*i + j);
}
}