Sequential and parallel versions give different results - Why?
Problem description
I have a nested loop: (L and A are fully defined inputs)
#pragma omp parallel for schedule(guided) shared(L,A) \
        reduction(+:dummy)
for (i = k+1; i < row; i++) {
    for (n = 0; n < k; n++) {
        #pragma omp atomic
        dummy += L[i][n] * L[k][n];
        L[i][k] = (A[i][k] - dummy) / L[k][k];
    }
    dummy = 0;
}
And its sequential version:
for (i = k+1; i < row; i++) {
    for (n = 0; n < k; n++) {
        dummy += L[i][n] * L[k][n];
        L[i][k] = (A[i][k] - dummy) / L[k][k];
    }
    dummy = 0;
}
They give different results, and the parallel version is much slower than the sequential one.
What could be causing the problem?
Edit:
To get rid of the problems caused by the atomic directive, I modified the code as follows:
#pragma omp parallel for schedule(guided) shared(L,A) \
        private(i)
for (i = k+1; i < row; i++) {
    double dummyy = 0;
    for (n = 0; n < k; n++) {
        dummyy += L[i][n] * L[k][n];
        L[i][k] = (A[i][k] - dummyy) / L[k][k];
    }
}
But that did not solve the problem either; the results still differ.
The difference in results comes from the inner loop variable n, which is shared between threads because it is defined outside the omp pragma.
Clarified:
The loop variable n should be declared inside the omp pragma so that it is thread-private, for example for (int n = 0; .....)
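To illustrate the fix, here is a minimal self-contained sketch (not the asker's original program): a Cholesky-style column update where both the inner loop variable n and the accumulator are declared inside the parallel loop body, making each thread's copy private, so neither the atomic directive nor the reduction clause is needed. The 3x3 matrix A is a standard textbook example chosen here only so the result can be checked by hand.

    /* Minimal sketch: Cholesky factorization with the race fixed by
       declaring n and the accumulator inside the parallel loop body. */
    #include <math.h>
    #include <stdio.h>

    #define N 3

    int main(void) {
        /* Symmetric positive-definite example with a known factor:
           L = {{2,0,0},{6,1,0},{-8,5,3}}. */
        double A[N][N] = {{4, 12, -16}, {12, 37, -43}, {-16, -43, 98}};
        double L[N][N] = {{0}};

        for (int k = 0; k < N; k++) {
            /* Diagonal entry of column k. */
            double diag = A[k][k];
            for (int n = 0; n < k; n++)
                diag -= L[k][n] * L[k][n];
            L[k][k] = sqrt(diag);

            /* Each i iteration is independent: dummy and n are local to
               the loop body, hence private to each thread; no atomic,
               no reduction, no shared-variable race. */
            #pragma omp parallel for schedule(guided) shared(L, A)
            for (int i = k + 1; i < N; i++) {
                double dummy = 0.0;
                for (int n = 0; n < k; n++)
                    dummy += L[i][n] * L[k][n];
                L[i][k] = (A[i][k] - dummy) / L[k][k];
            }
        }

        for (int i = 0; i < N; i++)
            printf("%g %g %g\n", L[i][0], L[i][1], L[i][2]);
        return 0;
    }

Because dummy is now per-iteration, the expensive atomic on every inner-loop update disappears as well, which also addresses the slowdown mentioned in the question.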