OpenMP parallel for reduction delivers wrong results

Problem description

I am working with a signal matrix, and my goal is to calculate the sum of all elements of each row. The matrix is represented by the following struct:

typedef struct matrix {
  float *data;
  int rows;
  int cols;
  int leading_dim;
} matrix;

I should mention that the matrix is stored in column-major order (http://en.wikipedia.org/wiki/Row-major_order#Column-major_order), which explains the formula column * tan_hd.rows + row used to compute the index of an element.
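As a minimal illustration of that indexing formula, the sketch below wraps it in a small accessor. The name matrix_at is hypothetical (it does not appear in the question), and the sketch assumes the column stride equals rows, as the question's formula does.

```c
#include <assert.h>
#include <stddef.h>

typedef struct matrix {
  float *data;
  int rows;
  int cols;
  int leading_dim;
} matrix;

/* Hypothetical helper: element (row, col) of a column-major matrix.
   Assumes each column is stored contiguously with a stride of m->rows,
   matching the formula column * rows + row from the question. */
static float matrix_at(const matrix *m, int row, int col) {
  assert(row >= 0 && row < m->rows);
  assert(col >= 0 && col < m->cols);
  return m->data[(size_t)col * (size_t)m->rows + (size_t)row];
}
```

For a 2 x 3 matrix whose data array is {1, 2, 3, 4, 5, 6}, the columns are (1, 2), (3, 4) and (5, 6), so row 0 consists of 1, 3 and 5.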

for(int row = 0; row < tan_hd.rows; row++) {
    float sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for(int column = 0; column < tan_hd.cols; column++) {
        sum += tan_hd.data[column * tan_hd.rows + row];
    }
    printf("row %d: %f", row, sum);
}

Without the OpenMP pragma, the result is correct and looks like this:

row 0: 8172539.500000 row 1: 8194582.000000 

As soon as I add the #pragma omp ... line shown above, a different (wrong) result is returned:

row 0: 8085544.000000 row 1: 8107186.000000

As I understand it, reduction(+:sum) creates a private copy of sum for each thread, and after the loop completes these partial results are added up and written back to the shared variable sum. What am I doing wrong?

I appreciate your suggestions!
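The description of reduction(+:sum) above can be sketched in plain C without any OpenMP at all. The functions sum_serial and sum_chunked are hypothetical names, and splitting the input into contiguous chunks is an assumption made for illustration; the actual distribution of iterations is up to the OpenMP runtime. The point of the sketch is that float addition is not associative, so private partial sums can round differently from one left-to-right loop.

```c
#include <assert.h>
#include <stddef.h>

/* Plain left-to-right sum in float, like the loop without the pragma. */
static float sum_serial(const float *x, size_t n) {
  float s = 0.0f;
  for (size_t i = 0; i < n; i++) {
    s += x[i];
  }
  return s;
}

/* Emulates the idea behind reduction(+:sum): each of `parts` chunks
   gets a private accumulator initialised to 0, and the partial sums
   are combined at the end. The contiguous chunking is an assumption. */
static float sum_chunked(const float *x, size_t n, size_t parts) {
  assert(parts > 0);
  float total = 0.0f;
  for (size_t p = 0; p < parts; p++) {
    size_t begin = n * p / parts;
    size_t end = n * (p + 1) / parts;
    float partial = 0.0f;            /* private copy for this chunk */
    for (size_t i = begin; i < end; i++) {
      partial += x[i];
    }
    total += partial;                /* combine step */
  }
  return total;
}
```

For the four values {1e8f, 1.0f, -1e8f, 1.0f}, on a platform that evaluates float expressions in IEEE 754 single precision, sum_serial yields 1 and sum_chunked with two parts yields 0, while the exact sum is 2. Neither order is exact; they are simply rounded differently.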

Recommended answer

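Floating-point addition is not associative, so the parallel reduction, which adds the elements in a different order than the serial loop, can round differently. To limit the rounding error, use Kahan's summation algorithm:

  • It has the same algorithmic complexity as a naive summation.

  • It greatly increases the accuracy of the summation without requiring you to switch the data type to double.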

Rewrite your code to implement it:

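for(int row = 0; row < tan_hd.rows; row++) {
    // sum: running total; c: running compensation for lost low-order bits
    float sum = 0.0f, c = 0.0f;
    // a single reduction clause lists all variables that share the operator
    #pragma omp parallel for reduction(+:sum, c)
    for(int column = 0; column < tan_hd.cols; column++) {
        float y = tan_hd.data[column * tan_hd.rows + row] - c;  // input minus pending correction
        float t = sum + y;                                      // low-order bits of y may be lost here
        c = (t - sum) - y;                                      // recover what was lost
        sum = t;
    }
    // apply the combined compensation left over from all threads
    sum = sum - c;
    printf("row %d: %f", row, sum);
}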
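To see the effect of the compensation in isolation, here is a serial sketch that compares a naive float sum with a Kahan sum. The function names naive_sum and kahan_sum are hypothetical. The sketch assumes the compiler does not reassociate floating-point arithmetic (for example, GCC's -ffast-math may optimise the compensation away).

```c
#include <assert.h>
#include <stddef.h>

/* Naive left-to-right summation in float. */
static float naive_sum(const float *x, size_t n) {
  assert(x != NULL || n == 0);
  float sum = 0.0f;
  for (size_t i = 0; i < n; i++) {
    sum += x[i];
  }
  return sum;
}

/* Kahan (compensated) summation in float. */
static float kahan_sum(const float *x, size_t n) {
  assert(x != NULL || n == 0);
  float sum = 0.0f;
  float c = 0.0f;                  /* running compensation */
  for (size_t i = 0; i < n; i++) {
    float y = x[i] - c;            /* input minus pending correction */
    float t = sum + y;             /* low-order bits of y may be lost */
    c = (t - sum) - y;             /* what was lost, with opposite sign */
    sum = t;
  }
  return sum;
}
```

With an input of 16777216.0f (2^24) followed by 1000 values of 1.0f, naive_sum stays at 16777216 on IEEE 754 single-precision hardware, because each added 1 is rounded away, while kahan_sum returns the exact value 16778216.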

You can additionally switch all float variables to double to achieve higher precision, but since your array is a float array, the only difference should be in the number of significant digits at the end.
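A minimal sketch of the double variant, assuming only the accumulator is widened while the matrix data stays float. The function row_sum_double and its parameter list are hypothetical. The pragma is guarded so that the code also compiles cleanly as a serial loop when OpenMP is not enabled.

```c
#include <assert.h>
#include <stddef.h>

/* Sums one row of a column-major float matrix in a double accumulator.
   Hypothetical helper; the column stride is assumed to equal `rows`. */
static double row_sum_double(const float *data, int rows, int cols, int row) {
  assert(row >= 0 && row < rows);
  double sum = 0.0;
#ifdef _OPENMP
  /* active only when compiled with OpenMP support, e.g. GCC's -fopenmp */
  #pragma omp parallel for reduction(+:sum)
#endif
  for (int column = 0; column < cols; column++) {
    sum += (double)data[(size_t)column * (size_t)rows + (size_t)row];
  }
  return sum;
}
```

The order of summation can still change the last bits of the result, but a double accumulator carries about 16 significant decimal digits instead of about 7, so the discrepancy is far smaller relative to sums of the magnitude shown in the question.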
