Why does OpenMP fail to sum these numbers?

Problem description

Consider the following minimal C example. When compiled and run with export OMP_NUM_THREADS=4 && gcc -fopenmp minimal2.c && ./a.out (Homebrew GCC 5.2.0 on OS X 10.11), it usually behaves correctly, i.e. it prints seven lines with the same number. But sometimes this happens:

[ ] bsum=1.893293142303100e+03
[1] asum=1.893293142303100e+03
[2] asum=1.893293142303100e+03
[0] asum=1.893293142303100e+03
[3] asum=3.786586284606200e+03
[ ] bsum=1.893293142303100e+03
[ ] asum=3.786586284606200e+03
equal: 0

It looks like a race condition, but my code seems fine to me. What am I doing wrong?

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#ifdef _OPENMP
#include <omp.h>
#define ID omp_get_thread_num()
#else
#define ID 0
#endif
#define N 1400

double a[N];

double verify() {
    int i;
    double bsum = 0.0;
    for (i = 0; i < N; i++) {
        bsum += a[i] * a[i];
    }
    fprintf(stderr, "[ ] bsum=%.15e\n", bsum);
    return bsum;
}

int main(int argc, char *argv[]) {
    int i;
    double asum = 0.0, bsum;
    srand((unsigned int)time(NULL));
    //srand(1445167001); // fails on my machine
    for (i = 0; i < N; i++) {
        a[i] = 2 * (double)rand()/(double)RAND_MAX;
    }
    bsum = verify();
    #pragma omp parallel shared(asum)
    {
        #pragma omp for reduction(+: asum)
        for (i = 0; i < N; i++) {
            asum += a[i] * a[i];
        }
        fprintf(stderr, "[%d] asum=%.15e\n", ID, asum);
    }
    bsum = verify();
    fprintf(stderr, "[ ] asum=%.15e\n", asum);
    return 0;
}

Edit: Gilles brought to my attention that the errors beginning at the 15th significant digit are normal, as I overestimated the precision of a double. I also cannot reproduce the faulty behavior (a result of 2x the correct number) on a Debian machine, so this might be related to Homebrew GCC or to the Mac.

I asked about a similar issue here, but the two do not seem to be related (at least in my eyes), so I started this as a separate question.

Recommended answer

I strongly suspect that this is because floating-point addition is not associative. As a result, OpenMP may add the products in a different order than the serial loop does, yielding slightly different results.

The OpenMP 4.0 specification, section 1.3 (Execution Model), says:

For example, a serial addition reduction may have a different pattern of addition associations than a parallel reduction. These different associations may change the results of floating-point addition.

See "OpenMP parallel for reduction delivers wrong results" for a suggested solution.
