Best way to approach using sincos() in CUDA

Problem description

I am not clear on the best way to use sincos(). I've looked everywhere, but the consensus seems to be simply that it is better than computing sin and cos separately. Below is essentially what my kernel does with sincos. However, when I time it against calling sin and cos separately, it comes out slower. I suspect it has to do with how I'm using cPtr and sPtr. Is there a better way?

int idx = blockIdx.x * blockDim.x + threadIdx.x;

if (idx < dataSize)
{
    idx += lower;
    double f = ((double) idx) * deltaF;
    double cosValue;
    double sinValue;
    double *sPtr = &sinValue;
    double *cPtr = &cosValue;
    sincos(twopit * f, sPtr, cPtr);

    d_re[idx - lower] = cosValue;
    d_im[idx - lower] = - sinValue;

    //d_re[idx - lower] = cos(twopit * f);
    //d_im[idx - lower] = - sin(twopit * f);
}

Recommended answer

The pointers are redundant - you can get rid of them, e.g.

double cosValue;
double sinValue;
sincos(twopit * f, &sinValue, &cosValue);

but I'm not sure this will have much effect on performance (worth a try though).
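For reference, here is a minimal sketch of the question's kernel with the intermediate pointers removed, plus a small host wrapper so it can be launched and checked. The names fillPhasor and runFillPhasor, the launch configuration of 256 threads per block, and the output layout are assumptions made for illustration; they are not from the original post. Error checking on the CUDA calls is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Same computation as the kernel in the question, but sincos() writes
// directly into the two locals via the address-of operator.
__global__ void fillPhasor(double *d_re, double *d_im, int dataSize,
                           int lower, double deltaF, double twopit)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;

    if (idx < dataSize)
    {
        double f = (double)(idx + lower) * deltaF;
        double sinValue;
        double cosValue;
        sincos(twopit * f, &sinValue, &cosValue);

        d_re[idx] = cosValue;
        d_im[idx] = -sinValue;
    }
}

// Hypothetical host helper (not from the post): launches the kernel and
// returns the real parts in [0, dataSize) followed by the imaginary
// parts in [dataSize, 2 * dataSize). CUDA error checks are left out.
std::vector<double> runFillPhasor(int dataSize, int lower,
                                  double deltaF, double twopit)
{
    std::vector<double> out(2 * dataSize);
    size_t bytes = dataSize * sizeof(double);
    double *d_re = nullptr;
    double *d_im = nullptr;
    cudaMalloc(&d_re, bytes);
    cudaMalloc(&d_im, bytes);

    int threads = 256;  // assumed block size
    int blocks = (dataSize + threads - 1) / threads;
    fillPhasor<<<blocks, threads>>>(d_re, d_im, dataSize, lower,
                                    deltaF, twopit);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(out.data(), d_re, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(out.data() + dataSize, d_im, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_re);
    cudaFree(d_im);
    return out;
}
```

Calling runFillPhasor(dataSize, lower, deltaF, twopit) from host code returns both output arrays in one vector, which makes it straightforward to compare against a version that calls sin and cos separately.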

Also consider using float rather than double where precision requirements permit, and use the corresponding single precision functions (sincosf in this case).
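A single-precision variant of the same kernel might look like the sketch below. It assumes the output buffers and parameters can all be switched to float; fillPhasorFloat and runFillPhasorFloat are hypothetical names, and the block size of 256 is an arbitrary choice. Whether float is accurate enough depends on the size of twopit * f, since single precision loses accuracy for large arguments.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Single-precision version: float data throughout and sincosf()
// in place of sincos().
__global__ void fillPhasorFloat(float *d_re, float *d_im, int dataSize,
                                int lower, float deltaF, float twopit)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;

    if (idx < dataSize)
    {
        float f = (float)(idx + lower) * deltaF;
        float sinValue;
        float cosValue;
        sincosf(twopit * f, &sinValue, &cosValue);

        d_re[idx] = cosValue;
        d_im[idx] = -sinValue;
    }
}

// Hypothetical host helper (not from the post): returns the real parts
// followed by the imaginary parts. CUDA error checks are left out.
std::vector<float> runFillPhasorFloat(int dataSize, int lower,
                                      float deltaF, float twopit)
{
    std::vector<float> out(2 * dataSize);
    size_t bytes = dataSize * sizeof(float);
    float *d_re = nullptr;
    float *d_im = nullptr;
    cudaMalloc(&d_re, bytes);
    cudaMalloc(&d_im, bytes);

    int threads = 256;  // assumed block size
    int blocks = (dataSize + threads - 1) / threads;
    fillPhasorFloat<<<blocks, threads>>>(d_re, d_im, dataSize, lower,
                                         deltaF, twopit);

    cudaMemcpy(out.data(), d_re, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(out.data() + dataSize, d_im, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_re);
    cudaFree(d_im);
    return out;
}
```

Keeping every operand in float matters here: a double literal or a double variable in the argument expression would promote the arithmetic back to double precision before the call.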
