Will this sine approximation be faster than a shader Cg sine function?

Question

I have some functions that are not really sines, but they are a lot quicker than conventional processing. They are simple parabola functions.

Will this be faster on a graphics processor than the built-in sine function?

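    // Parabolic sine approximation with period 4.8 (peaks of +1 and -1 at xx = 1.2 and 3.6).
    float par(float xx)
    {
        half xd = fmod(abs(xx), 2.4) - 1.2;   // offset from the centre of the current half-period
        if (fmod(abs(xx), 4.8) > 2.4) {
            xd = -xd * xd + 2.88;             // second half-period: ends up as the negative lobe
        } else {
            xd = xd * xd;                     // first half-period: ends up as the positive lobe
        }
        xd = -xd * 0.694444444 + 1;           // 0.694444444 = 1 / 1.44, so the peak is scaled to 1
        if (xx < 0) { xd = -xd; }             // odd symmetry for negative input
        return xd;
    }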

Recommended answer

Main answer

There is absolutely no way your function will be faster than the built-in sin/cos functions on any graphics card.

The shader instructions sin, cos and tan are single-cycle instructions on just about EVERY graphics card ever manufactured. You certainly cannot purchase a graphics card today on which they are not single-cycle.

To put your question in perspective: on a graphics card, multiplying two numbers (the mul instruction) takes the same time as computing a sine (the sin function), namely a single GPU cycle.

When writing your shaders, have a look at the command-line options for your compiler. There will be options to output the generated assembly code, and most compilers even provide totals (number of instructions and cycles) for the shortest and longest paths. These totals are not guaranteed durations, because things like fetches can stall a pipeline, but they answer the type of question you are asking now.

Shader instructions do vary from card to card, but I think the longest single instruction takes 4 GPU cycles.

If you look at the shader compiler's assembly output for your function, you will see that it issues lots of instructions and uses lots of cycles. You are then asking whether it could execute more quickly than a single-cycle instruction.

The whole purpose of graphics chips is that they are very fast and very parallel at running their instruction sets (however complex those instructions may be on other processors). When programming shaders, focus your code on what the processor is designed to do. Shader programming requires a different mindset from the programming you do elsewhere in software development, but once you start counting cycles and minimizing fetch stalls, you will soon begin to unlock the true power of shader processing.

Good luck.
