Strange float behaviour in OpenMP


Question

I am running the following OpenMP code

    #pragma omp parallel shared(S2,nthreads,chunk) private(a,b,tid)
    {
        tid = omp_get_thread_num();
        if (tid == 0)
        {
            nthreads = omp_get_num_threads();
            printf("\nNumber of threads = %d\n", nthreads);
        }
        #pragma omp for schedule(dynamic,chunk) reduction(+:S2)
        for(a=0;a<NREC;a++){
            for(b=0;b<NLIG;b++){
                S2=S2+cos(1+sin(atan(sin(sqrt(a*2+b*5)+cos(a)+sqrt(b)))));
            }
        } // end for a
    } /* end of parallel section */

For NREC=NLIG=1024 and higher values, on an 8-core board, I get a speedup of up to 7x. The problem is that when I compare the final result for variable S2, it differs by 1 to 5% from the exact result obtained with the serial version. What could be the reason? Should I use some specific compilation options to avoid this strange float behaviour?

Recommended answer

The order of additions/subtractions of floating-point numbers can affect the accuracy.

To take a simple example, let's say that your machine stores only 2 significant decimal digits, and that you're computing the value of 1 + 0.04 + 0.04.


  • If you do the left addition first, you get 1.04, which is rounded to 1.0. The second addition gives 1.0 again, so the final result is 1.0.

  • If you do the right addition first, you get 0.08. Added to 1, this gives 1.08, which is rounded to 1.1.

For maximum accuracy, it's best to add values in order of increasing magnitude, from small to large.

Another cause could be that floating-point registers on the CPU may hold more bits than floating-point values in main memory (the x87 registers on x86 are 80 bits wide, for example). Hence, if an intermediate result is kept in a register, it is more accurate, but if it gets spilled to memory it is rounded to the narrower format.

See also this question in the C++ FAQ.
