Special CUDA double-precision trig functions for the SFU

Question

I was wondering how to use __cos(x) (and likewise __sin(x)) in CUDA kernel code. From the CUDA manual I understood that such a device function exists, but when I use it the compiler just says that I cannot call a host function from device code.

However, I found that there are two sister functions, cosf(x) and __cosf(x); the latter runs on the SFU and is much faster overall than the original cosf(x). The compiler does not complain about __cosf(x), of course.

Is there a library I'm missing? Am I mistaken about this trig function?

Recommended answer

As the SFU only supports certain single-precision operations, there are no double-precision __cos() and __sin() device functions. There are single-precision __cosf() and __sinf() device functions, as well as other functions detailed in table C-4 of the CUDA 4.2 Programming Manual.
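To make the distinction concrete, here is a minimal sketch that calls the three cosine variants that do exist in device code: cosf(), __cosf(), and the double-precision cos(). The kernel name `cosine_variants` and the host helper `compare_cosines` are made up for illustration, and the use of managed memory is only an assumption to keep the example short.

```cuda
#include <cassert>
#include <cmath>
#include <cuda_runtime.h>

// Single precision: cosf() is the accurate library routine and __cosf() is the
// fast, reduced-accuracy intrinsic. Double precision: only cos() is available;
// there is no __cos(). (With -use_fast_math, cosf() is itself mapped to __cosf().)
__global__ void cosine_variants(const float *x, float *accurate, float *fast,
                                double *reference, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        accurate[i]  = cosf(x[i]);
        fast[i]      = __cosf(x[i]);
        reference[i] = cos(static_cast<double>(x[i]));
    }
}

// Hypothetical host helper: evaluates n samples in [0, 1) and reports the largest
// absolute deviation of cosf() and __cosf() from the double-precision result.
// Returns false if any CUDA call fails.
bool compare_cosines(int n, double *err_accurate, double *err_fast)
{
    assert(n > 0);
    float *x = nullptr, *accurate = nullptr, *fast = nullptr;
    double *reference = nullptr;
    bool ok = cudaMallocManaged(&x, n * sizeof(float)) == cudaSuccess
           && cudaMallocManaged(&accurate, n * sizeof(float)) == cudaSuccess
           && cudaMallocManaged(&fast, n * sizeof(float)) == cudaSuccess
           && cudaMallocManaged(&reference, n * sizeof(double)) == cudaSuccess;
    if (ok) {
        for (int i = 0; i < n; ++i) x[i] = static_cast<float>(i) / n;
        const int threads = 256;
        cosine_variants<<<(n + threads - 1) / threads, threads>>>(
            x, accurate, fast, reference, n);
        ok = cudaGetLastError() == cudaSuccess
          && cudaDeviceSynchronize() == cudaSuccess;
    }
    if (ok) {
        *err_accurate = 0.0;
        *err_fast = 0.0;
        for (int i = 0; i < n; ++i) {
            *err_accurate = fmax(*err_accurate, fabs(accurate[i] - reference[i]));
            *err_fast     = fmax(*err_fast, fabs(fast[i] - reference[i]));
        }
    }
    // cudaFree(nullptr) is a no-op, so this is safe after a partial allocation.
    cudaFree(x); cudaFree(accurate); cudaFree(fast); cudaFree(reference);
    return ok;
}
```

Calling `compare_cosines(1024, &ea, &ef)` from host code fills `ea` and `ef` with the two error figures. Writing `__cos(x[i])` in the kernel would not compile, which is the situation described in the question.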

I assume you are looking for faster alternatives to the double-precision versions of the standard math functions sin() and cos(). If both the sine and the cosine of the same argument are needed, use sincos() for a significant performance boost. If the argument of the sine or cosine is multiplied by π, that is, you need sin(πx) or cos(πx), use sinpi(), cospi(), or sincospi() instead for even more performance. For example, sincospi() is very useful when implementing the Box-Muller algorithm for generating normally distributed random numbers. Also, check out the CUDA 5.0 preview for the best possible performance (note that the preview is of alpha-release quality).
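As an illustration of the Box-Muller case, the following sketch uses the double-precision sincospi() to get sin(2πu2) and cos(2πu2) in one call. The names `box_muller` and `box_muller_on_device` are hypothetical, the inputs are assumed to be uniform samples in (0, 1] supplied by the caller, and managed memory is used only to keep the example short.

```cuda
#include <cassert>
#include <cmath>
#include <cuda_runtime.h>

// Box-Muller transform: two uniform samples (u1 in (0, 1], u2 in [0, 1)) become
// two independent standard normal samples. sincospi(a, &s, &c) returns
// s = sin(pi * a) and c = cos(pi * a), so the factor of pi is never multiplied in.
__global__ void box_muller(const double *u1, const double *u2,
                           double *z0, double *z1, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        double r = sqrt(-2.0 * log(u1[i]));
        double s, c;
        sincospi(2.0 * u2[i], &s, &c);
        z0[i] = r * c;
        z1[i] = r * s;
    }
}

// Hypothetical host helper: runs the kernel on n input pairs held in host arrays
// and copies the results back. Returns false if any CUDA call fails.
bool box_muller_on_device(const double *h_u1, const double *h_u2,
                          double *h_z0, double *h_z1, int n)
{
    assert(n > 0);
    const size_t bytes = n * sizeof(double);
    double *u1 = nullptr, *u2 = nullptr, *z0 = nullptr, *z1 = nullptr;
    bool ok = cudaMallocManaged(&u1, bytes) == cudaSuccess
           && cudaMallocManaged(&u2, bytes) == cudaSuccess
           && cudaMallocManaged(&z0, bytes) == cudaSuccess
           && cudaMallocManaged(&z1, bytes) == cudaSuccess;
    if (ok) {
        for (int i = 0; i < n; ++i) { u1[i] = h_u1[i]; u2[i] = h_u2[i]; }
        const int threads = 256;
        box_muller<<<(n + threads - 1) / threads, threads>>>(u1, u2, z0, z1, n);
        ok = cudaGetLastError() == cudaSuccess
          && cudaDeviceSynchronize() == cudaSuccess;
    }
    if (ok) {
        for (int i = 0; i < n; ++i) { h_z0[i] = z0[i]; h_z1[i] = z1[i]; }
    }
    // cudaFree(nullptr) is a no-op, so this is safe after a partial allocation.
    cudaFree(u1); cudaFree(u2); cudaFree(z0); cudaFree(z1);
    return ok;
}
```

When the argument is not a multiple of π, the same pattern applies with `sincos(x, &s, &c)`, which returns the sine and cosine of `x` itself.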
