Performance of different CG/GLSL/HLSL functions

Question

There are standard libraries of shader functions, such as the one for Cg. But are there resources that tell you how long each function takes? I'm thinking of something similar to how you used to be able to look up how many cycles each ASM op would take.

Recommended answer

There are no reliable resources that will tell you how long various standard shader functions take. Not even for a particular piece of hardware.

The reason for this has to do with instruction scheduling and the way modern shader architectures work. Take a simple sin function. Let's say that the hardware has a dedicated unit to compute the sine of a value, so it's not manually using a Taylor series or something. However, let's also say that it takes a sequence of 4 opcodes to actually compute it. Therefore, sin would take "4 cycles".

However, all of those opcodes are scalar operations. Therefore, while they're going on, you could in fact have some 3-vector dot-products, or in the case of some hardware, 4-vector dot-products going on at the same time, on the same processor. Therefore, if the hardware can issue 4-vector dot-products alongside scalar operations, the number of cycles it takes to execute a sin and a matrix-vector multiply is... still 4.

So how much did the sin operation cost? If you take out the matrix multiply, nothing gets faster. If you take out the sin, still nothing gets faster. How much does it cost? You can't say, because the cost of a single operation is irrelevant; the only measurable quantity is the cost of the shader itself.
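To make the argument concrete, here is a minimal sketch of the hypothetical hardware described above. It is a toy model, not a description of any real GPU: it assumes the processor can issue one scalar op and one vector op per cycle with perfect overlap, and the names `cycles`, `SIN_SCALAR_OPS` and `MATVEC_VECTOR_OPS` are made up for illustration.

```python
# Toy co-issue model (an assumption for illustration, not real GPU behavior):
# every cycle the processor can issue one scalar op AND one vector op.

def cycles(scalar_ops: int, vector_ops: int) -> int:
    """Cycles needed when scalar and vector ops overlap perfectly."""
    return max(scalar_ops, vector_ops)

SIN_SCALAR_OPS = 4      # the hypothetical 4-opcode sin sequence
MATVEC_VECTOR_OPS = 4   # a 4x4 matrix-vector multiply as four 4-vector dot products

both = cycles(SIN_SCALAR_OPS, MATVEC_VECTOR_OPS)   # sin + matrix multiply
sin_only = cycles(SIN_SCALAR_OPS, 0)               # matrix multiply removed
matvec_only = cycles(0, MATVEC_VECTOR_OPS)         # sin removed

print(both, sin_only, matvec_only)  # prints: 4 4 4
```

In this model, removing either operation leaves the total unchanged, so neither one has a meaningful cost of its own. Real schedulers are far less tidy, which only makes per-operation numbers harder to pin down.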

Ultimately, all you can do is try to build your shader reasonably and see what the performance is. Unless you have low-level debugging tools that expose the underlying hardware shader assembly (and no, DX assembly isn't good enough), that's really the best you can do.
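A rough sketch of that "measure the whole thing" approach might look like the following. Everything here is an assumption for illustration: `median_seconds` is a made-up helper, and the two `variant_*` functions are CPU stand-ins for whatever actually renders a frame with each version of a shader. For real GPU work, the callable would need to block until the GPU has finished (or you would use the graphics API's timer queries), since otherwise you only time the submission.

```python
import math
import statistics
import time
from typing import Callable

def median_seconds(run_once: Callable[[], None],
                   repeats: int = 20, warmup: int = 3) -> float:
    """Time a whole workload several times and return the median duration."""
    for _ in range(warmup):          # discard the first runs (caches, lazy setup)
        run_once()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# Hypothetical stand-ins for "draw a frame with shader variant A / B".
def variant_with_sin() -> None:
    sum(math.sin(i) * i for i in range(10_000))

def variant_without_sin() -> None:
    sum(float(i) * i for i in range(10_000))

t_with = median_seconds(variant_with_sin)
t_without = median_seconds(variant_without_sin)
print(f"with sin: {t_with:.6f} s, without sin: {t_without:.6f} s")
```

The point of the design is that only complete variants are compared; the difference between two whole-shader timings is the only number that tells you whether a change mattered on your hardware.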
