OpenMP parallel for reduction delivers wrong results
Problem description
I am working with a signal matrix, and my goal is to calculate the sum of all elements of a row. The matrix is represented by the following struct:
typedef struct matrix {
    float *data;
    int rows;
    int cols;
    int leading_dim;
} matrix;
I should mention that the matrix is stored in column-major order (http://en.wikipedia.org/wiki/Row-major_order#Column-major_order), which explains the formula column * tan_hd.rows + row for retrieving the correct index.
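To make the indexing formula concrete, here is a minimal sketch of a column-major lookup. The helper names matrix_at and row_sum are hypothetical and not part of the original code. The sketch assumes consecutive columns are exactly rows elements apart, as in the formula above, so the leading_dim field is not used.

```c
#include <assert.h>
#include <stddef.h>

typedef struct matrix {
    float *data;
    int rows;
    int cols;
    int leading_dim;
} matrix;

/* Hypothetical helper: element (row, col) of a column-major matrix.
   Assumption: consecutive columns are m->rows elements apart. */
float matrix_at(const matrix *m, int row, int col) {
    assert(row >= 0 && row < m->rows);
    assert(col >= 0 && col < m->cols);
    return m->data[(size_t)col * (size_t)m->rows + (size_t)row];
}

/* Hypothetical helper: serial sum of all elements in one row. */
float row_sum(const matrix *m, int row) {
    float sum = 0.0f;
    for (int col = 0; col < m->cols; col++) {
        sum += matrix_at(m, row, col);
    }
    return sum;
}
```

For a 2 x 3 matrix stored as {1, 4, 2, 5, 3, 6}, row 0 holds 1, 2, 3 and row 1 holds 4, 5, 6, because each column is stored contiguously.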
for (int row = 0; row < tan_hd.rows; row++) {
    float sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int column = 0; column < tan_hd.cols; column++) {
        sum += tan_hd.data[column * tan_hd.rows + row];
    }
    printf("row %d: %f", row, sum);
}
Without the OpenMP pragma, the result is correct and looks like this:
row 0: 8172539.500000 row 1: 8194582.000000
As soon as I add the #pragma omp... line as described above, a different (wrong) result is returned:
row 0: 8085544.000000 row 1: 8107186.000000
In my understanding, reduction(+:sum) creates a private copy of sum for each thread, and after the loop completes these partial results are summed up and written back to the global variable sum. What am I doing wrong?
I appreciate your suggestions!
Recommended answer
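Use Kahan's summation algorithm:
- It has the same algorithmic complexity as a naive sum.
- It greatly increases the accuracy of the summation without requiring you to switch the data type to double.
Rewrite your code to implement it: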
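for (int row = 0; row < tan_hd.rows; row++) {
    float sum = 0.0f, c = 0.0f;  /* c accumulates the rounding error */
    /* One reduction clause with a comma-separated list:
       each thread gets its own private sum and c. */
    #pragma omp parallel for reduction(+:sum, c)
    for (int column = 0; column < tan_hd.cols; column++) {
        float y = tan_hd.data[column * tan_hd.rows + row] - c;  /* corrected input */
        float t = sum + y;   /* low-order bits of y may be lost here */
        c = (t - sum) - y;   /* recover what was lost */
        sum = t;
    }
    sum = sum - c;  /* apply the combined per-thread compensation */
    printf("row %d: %f", row, sum);
}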
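A likely reason the two runs in the question differ is that floating-point addition is not associative. A float carries a 24-bit significand, so between 4194304 and 8388608 adjacent floats are 0.5 apart, which is the magnitude of the sums printed above. Every addition rounds, and a parallel reduction adds the elements in a different order than the serial loop, so the rounding differs. The serial sketch below illustrates the effect; the function names naive_sum and kahan_sum are hypothetical, and the behavior described assumes IEEE 754 single-precision arithmetic without value-changing optimizations such as -ffast-math.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: plain left-to-right float summation. */
float naive_sum(const float *v, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        sum += v[i];
    }
    return sum;
}

/* Hypothetical helper: Kahan (compensated) summation. */
float kahan_sum(const float *v, size_t n) {
    float sum = 0.0f;
    float c = 0.0f;              /* running compensation for lost low-order bits */
    for (size_t i = 0; i < n; i++) {
        float y = v[i] - c;      /* input corrected by the previous error */
        float t = sum + y;       /* rounding may drop part of y */
        c = (t - sum) - y;       /* algebraically zero; numerically the error */
        sum = t;
    }
    return sum;
}
```

With the input {16777216, 1, 1, 1, 1}, the naive loop never moves past 16777216, because 16777217 is not representable as a float and each addition of 1 rounds back down. The compensated loop carries the dropped 1 forward in c and arrives at 16777220, the exact sum.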
You can additionally switch all float variables to double to achieve higher precision, but since your array is a float array, there should only be differences in the number of significant digits at the end.
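The double variant can be sketched as follows. This is a minimal illustration and not the original poster's code: the function name row_sum_double and its parameter list are hypothetical. It keeps the float array as it is and widens only the accumulator. A double has a 53-bit significand, so the rounding error of each addition is far smaller, and the result is much less sensitive to the order in which the threads combine their partial sums.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sum one row of a column-major float matrix
   using a double accumulator. If OpenMP is not enabled at compile
   time, the pragma is ignored and the loop runs serially. */
double row_sum_double(const float *data, int rows, int cols, int row) {
    assert(row >= 0 && row < rows);
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int column = 0; column < cols; column++) {
        /* Widen each element before adding so the addition is done in double. */
        sum += (double)data[(size_t)column * (size_t)rows + (size_t)row];
    }
    return sum;
}
```

If the result has to be stored as a float again, convert it once after the loop, so that only a single rounding to single precision takes place.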