Default CUDA addition rounding mode between CUDA 5.0 and 7.5


Question

I have a CUDA loop in which a variable, cumulative_value, stores a running sum in double precision:

double cumulative_value = (double)0;
loop(...)
{
    // ...
    double valueY = computeValueY();
    // ...
    cumulative_value += valueY;
}

This code is compiled with different SDK versions and run on two computers:

 M1: Tesla M2075, CUDA 5.0
 M2: Tesla M2075, CUDA 7.5

At step 10, the results differ. The values for this addition (double-precision representation in hexadecimal) are:

   0x 41 0d d3 17 34 79 27 4d    => cumulative_value
+  0x 40 b6 60 1d 78 6f 09 b0    => valueY
-------------------------------------------------------
=    
  0x 41 0e 86 18 20 3c 9f 9b (for M1)
  0x 41 0e 86 18 20 3c 9f 9a (for M2)

As far as I can see in the PTX file, no rounding mode is specified (the instruction is a plain add.f64), but M1 seems to use round toward plus infinity and M2 another mode.

If I force M2 to use one of the four rounding modes (__dadd_XX()) for this instruction, cumulative_value always differs from the M1 value, even before step 10.
But if I force both M1 and M2 to use the same rounding mode, their results match each other, although they no longer equal the M1 results from before the modification.

My aim is to reproduce the M1 (CUDA 5.0) results on the M2 machine (CUDA 7.5), but I don't understand the default rounding mode behavior at runtime. I am wondering whether the rounding mode is chosen dynamically at runtime when it is not specified. Do you have any idea?

Recommended answer

After further PTX analysis, it turns out that in my case valueY is computed with an FMA instruction under CUDA 5.0, while the CUDA 7.5 compiler uses separate MUL and ADD instructions. The CUDA documentation explains that a single FMA instruction involves only one rounding step, whereas a MUL followed by an ADD involves two. Thank you very much for helping me :)
