CUDA floating point precision


Question

Could someone comment on this?

I want to do a vector dot product. My float vectors are [2080:2131] and [2112:2163]; each of them contains 52 elements.

a[52] = {2080, 2081, 2082, ... , 2129, 2130, 2131};
b[52] = {2112, 2113, 2114, ... , 2161, 2162, 2163};

float sum = 0.0f;
for (int i = 0; i < 52; i++)
{
    sum += a[i]*b[i];
}

The resulting sum for the whole length (52 elements) was 234038032 from my kernel, while MATLAB gave 234038038. For sums of 1 to 9 products, my kernel result agrees with the MATLAB result. For the 10-element sum it is off by 1, and the discrepancy gradually increases. The results were reproducible. I checked all the elements and found no problem.

Answer

Since the vectors are float you are experiencing rounding errors. MATLAB stores everything at much higher precision (double) and hence won't see the rounding errors so early.

You may want to check out What Every Computer Scientist Should Know About Floating Point by David Goldberg (http://docs.sun.com/source/806-3568/ncg_goldberg.html) - invaluable reading.

Simple demo in C++ (i.e. nothing to do with CUDA):

#include <iostream>

int main(void)
{
  float a[52];
  float b[52];
  double c[52];
  double d[52];

  for (int i = 0 ; i < 52 ; i++)
  {
    a[i] = (float)(2080 + i);
    b[i] = (float)(2112 + i);
    c[i] = (double)(2080 + i);
    d[i] = (double)(2112 + i);
  }

  float fsum = 0.0f;
  double dsum = 0.0;
  for (int i = 0 ; i < 52 ; i++)
  {
    fsum += a[i]*b[i];
    dsum += c[i]*d[i];
  }

  std::cout.precision(20);
  std::cout << fsum << " " << dsum << std::endl;
}

Run this and you get:

234038032 234038038

So what can you do about this? There are several directions you could go in...


  • Use higher precision: this will affect performance, and not all devices support double precision. It also just postpones the problem rather than fixing it, so I would not recommend it!
  • Do a tree-based reduction: you could combine the techniques in the vectorAdd and reduction SDK samples.
  • Use Thrust: very straightforward.
