Division of floating point numbers on GPU different from that on CPU

Problem description

When I divide 2 floating point numbers on the GPU, I get .196405. When I divide them on the CPU, I get .196404. The actual value, using a calculator, is .196404675. How do I make the division on the GPU and the CPU the same?
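
A minimal sketch for reproducing this kind of comparison (the operand values below are hypothetical stand-ins; the question does not state the actual inputs):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Perform the division on the device and store the quotient.
__global__ void divide(float a, float b, float *out)
{
    *out = a / b;
}

int main()
{
    // Hypothetical operands; substitute the real inputs to reproduce.
    float a = 0.5f, b = 2.5458f;

    float *d_out, gpu_result;
    cudaMalloc(&d_out, sizeof(float));
    divide<<<1, 1>>>(a, b, d_out);
    cudaMemcpy(&gpu_result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    float cpu_result = a / b;  // the same division on the host

    // Print more digits than a float meaningfully carries (~7 significant
    // decimal digits) so a difference in the last place is visible.
    printf("GPU: %.9f\nCPU: %.9f\n", gpu_result, cpu_result);
    return 0;
}
```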

Recommended answer

As the comments to another answer suggest, there are many reasons why it is not realistic to expect the same results from floating point computations run on the CPU and GPU. It's much stronger than that: you can't assume that FP results will be the same when the same source code is compiled against a different target architecture (e.g. x86 or x64) or with different optimization levels, either.

In fact, if your code is multithreaded and the FP operations are performed in different orders from one run to the next, then the EXACT SAME EXECUTABLE running on the EXACT SAME SYSTEM may produce slightly different results from one run to the next.
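
To see why ordering alone is enough, here is a minimal host-side sketch (values chosen to make the rounding effect obvious) in which the same addends produce different sums depending on grouping, which is exactly what a different thread interleaving does to a parallel reduction:

```cuda
#include <cstdio>

int main()
{
    // Three addends summed under two different groupings. float addition
    // rounds after every operation, so the grouping (which is what a
    // different thread interleaving changes in a parallel reduction)
    // alters the result.
    float big = 1e20f, tiny = 1.0f;

    float left  = (big + -big) + tiny;  // 0.0 + 1.0        -> 1.0
    float right = big + (-big + tiny);  // -big + tiny rounds back to -big
                                        // (tiny is below 1 ulp of big),
                                        // so big + -big    -> 0.0
    printf("left grouping:  %g\nright grouping: %g\n", left, right);
    return 0;
}
```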

Some of the reasons include, but are not limited to:

  • floating point operations are not associative, so seemingly benign reorderings (such as the race conditions from multithreading mentioned above) can change results;
  • different architectures support different levels of precision and rounding under different conditions (e.g. compiler flags, a global control word versus per-instruction rounding control);
  • different compilers interpret the language standards differently; and
  • some architectures support FMAD (fused multiply-add) and some do not (the sketch after this list shows the difference fusing makes).
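
On the last point: a fused multiply-add rounds once, while an unfused multiply followed by an add rounds twice. A minimal sketch of the difference, using the standard fmaf (which is specified to round only once regardless of whether the hardware actually fuses):

```cuda
#include <cstdio>
#include <cmath>
#include <cfloat>

int main()
{
    // a * b is exactly 1 - eps^2, which rounds to 1.0f when stored as a
    // float, so the unfused multiply-then-subtract yields 0. fmaf keeps
    // the full-precision product and rounds only once at the end, so it
    // recovers the -eps^2 term.
    float a = 1.0f + FLT_EPSILON;
    float b = 1.0f - FLT_EPSILON;

    float unfused = a * b - 1.0f;       // two roundings
    float fused   = fmaf(a, b, -1.0f);  // one rounding

    printf("unfused: %g\nfused:   %g\n", unfused, fused);
    return 0;
}
```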

Note that for purposes of this discussion, the JIT compilers for CUDA (the magic that enables PTX code to be future-proof to GPU architectures that are not yet available) certainly should be expected to perturb FP results.

You have to write FP code that is robust despite the foregoing.
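
In practice, that usually means comparing CPU and GPU results against a tolerance rather than testing for exact equality. A minimal sketch of such a comparison (the tolerance values are illustrative only; appropriate values depend on the computation being verified):

```cuda
#include <cmath>
#include <cstdio>

// Compare two floats using a relative tolerance, with an absolute floor
// for values near zero. The tolerances here are illustrative.
static bool nearly_equal(float a, float b,
                         float rel_tol = 1e-5f, float abs_tol = 1e-8f)
{
    float diff = fabsf(a - b);
    return diff <= abs_tol ||
           diff <= rel_tol * fmaxf(fabsf(a), fabsf(b));
}

int main()
{
    // The two quotients from the question differ in the 6th decimal
    // place, which a 1e-5 relative tolerance accepts.
    printf("%s\n", nearly_equal(0.196405f, 0.196404f) ? "equal enough"
                                                      : "different");
    return 0;
}
```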

As I write this today, I believe that CUDA GPUs have a much better-designed architecture for floating point arithmetic than any contemporary CPU. GPUs include native IEEE standard (c. 2008) support for 16-bit floats and FMAD, have full-speed support for denormals, and enable rounding control on a per-instruction basis rather than control words whose settings have side effects on all FP instructions and are expensive to change.
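
For example, that per-instruction rounding control is exposed in device code through intrinsics such as __fdiv_rn, __fdiv_rz, __fdiv_ru, and __fdiv_rd; a short sketch that divides the same operands under all four IEEE rounding modes:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Divide a by b once per IEEE rounding mode, using CUDA's
// per-instruction rounding intrinsics (no global control word involved).
__global__ void divide_all_modes(float a, float b, float *out)
{
    out[0] = __fdiv_rn(a, b);  // round to nearest even
    out[1] = __fdiv_rz(a, b);  // round toward zero
    out[2] = __fdiv_ru(a, b);  // round up (toward +infinity)
    out[3] = __fdiv_rd(a, b);  // round down (toward -infinity)
}

int main()
{
    float *d_out, h_out[4];
    cudaMalloc(&d_out, 4 * sizeof(float));
    divide_all_modes<<<1, 1>>>(1.0f, 3.0f, d_out);
    cudaMemcpy(h_out, d_out, 4 * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    printf("rn=%.9f rz=%.9f ru=%.9f rd=%.9f\n",
           h_out[0], h_out[1], h_out[2], h_out[3]);
    return 0;
}
```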

In contrast, CPUs have an excess of per-thread state and poor performance except when using SIMD instructions, which mainstream compilers are terrible at exploiting for performance (since vectorizing scalar C code to take advantage of such instruction sets is much more difficult than building a compiler for a pseudo-scalar architecture such as CUDA). And if the Wikipedia history page is to be believed, Intel and AMD appear to have completely botched the addition of FMAD support in a way that defies description.

You can find an excellent discussion of floating point precision and IEEE support in NVIDIA GPUs here:

https://developer.nvidia.com/content/precision-performance-floating-point-and-ieee-754-compliance-nvidia-gpus
