Do conditional statements slow down shaders?

Question

I want to know whether "if-statements" inside shaders (vertex / fragment / pixel...) really slow down shader performance. For example:

Is it better to use this:

vec3 output;
output = input*enable + input2*(1-enable);

instead of this:

vec3 output;
if(enable == 1)
{
    output = input;
}
else
{
    output = input2;
}

On another forum there was a discussion about this (2013): http://answers.unity3d.com/questions/442688/shader-if-else-performance.html There, people say that if-statements are really bad for shader performance.

Here they also discuss how much is inside the if/else statements (2012): https://www.opengl.org/discussion_boards/showthread.php/177762-Performance-alternative-for-if-(-)

Maybe the hardware or the shader compilers are better now and have somehow fixed this (possibly nonexistent) performance issue.

What about this case? Let's say enable is a uniform variable and it is always set to 0:

if(enable == 1) //never happens
{
    output = vec4(0,0,0,0);
}
else  //always happens
{
    output = calcPhong(normal, lightDir);
}

I think here we have a branch inside the shader which slows the shader down. Is that correct?

Does it make more sense to make 2 different shaders like one for the else and the other for the if part?

Recommended answer

What is it about shaders that even potentially makes if statements performance problems? It has to do with how shaders get executed and where GPUs get their massive computing performance from.

Separate shader invocations are usually executed in parallel, executing the same instructions at the same time. They're simply executing them on different sets of input values; they share uniforms, but they have different internal registers. One term for a group of shaders all executing the same sequence of operations is "wavefront".

The potential problem with any form of conditional branching is that it can screw all that up. It causes different invocations within the wavefront to have to execute different sequences of code. That is a very expensive process, whereby a new wavefront has to be created, data copied over to it, etc.

Unless... it doesn't.

For example, if the condition is one that is taken by every invocation in the wavefront, then no runtime divergence is needed. As such, the cost of the if is just the cost of checking a condition.

So, let's say you have a conditional branch, and let's assume that all of the invocations in the wavefront will take the same branch. There are three possibilities for the nature of the expression in that condition:

  • Compile-time static. The conditional expression is entirely based off of compile-time constants. As such, you know from looking at the code which branch will be taken. Pretty much any compiler handles this as part of basic optimization.
  • Statically uniform branching. The condition is based off of expressions involving things which are known at compile-time to be constant (specifically, constants and uniform values). But the value of the expression will not be known at compile-time. So the compiler can statically be certain that wavefronts will never be broken by this if, but the compiler cannot know which branch will be taken.
  • Dynamic branching. The conditional expression contains terms other than constants and uniforms. Here, a compiler cannot tell a priori if a wavefront will be broken up or not. Whether that will need to happen depends on the runtime evaluation of the condition expression.
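The three cases above can be sketched in a single fragment shader. This is only an illustration: the names USE_FOG, enable, vNormal and fragColor are hypothetical, and how each branch is actually compiled depends on the driver and hardware.

```glsl
#version 330 core

// Hypothetical names throughout; only the three kinds of condition matter.
const bool USE_FOG = false;  // compile-time constant
uniform int enable;          // same value for every invocation in a draw call
in vec3 vNormal;             // differs from fragment to fragment
out vec4 fragColor;

void main()
{
    vec3 color = vec3(0.5);

    // 1. Compile-time static: the compiler can drop the dead branch entirely.
    if (USE_FOG) {
        color *= 0.25;
    }

    // 2. Statically uniform: the outcome is unknown at compile time, but every
    //    invocation evaluates the condition to the same result.
    if (enable == 1) {
        color = vec3(0.0);
    }

    // 3. Dynamic: the condition depends on per-fragment data, so invocations
    //    in the same wavefront may disagree.
    if (vNormal.z > 0.0) {
        color += vec3(0.1);
    }

    fragColor = vec4(color, 1.0);
}
```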

Different hardware can handle different branching types without divergence.

Also, even if a condition is taken by different wavefronts, the compiler could restructure the code to not require actual branching. You gave a fine example: output = input*enable + input2*(1-enable); is functionally equivalent to the if statement. A compiler could detect that an if is being used to set a variable, and thus execute both sides. This is frequently done for cases of dynamic conditions where the bodies of the branches are small.

Pretty much all hardware can handle var = bool ? val1 : val2 without having to diverge. This was possible way back in 2002.
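As a rough sketch, the same selection can be written in three ways that a compiler is free to lower to similar branch-free code. The names colorA and colorB are hypothetical stand-ins for the question's input and input2, which are reserved words in GLSL, and enable is assumed to be a float that is exactly 0.0 or 1.0.

```glsl
#version 330 core

uniform float enable;  // assumed to be exactly 0.0 or 1.0
in vec3 colorA;        // hypothetical stand-in for the question's `input`
in vec3 colorB;        // hypothetical stand-in for the question's `input2`
out vec4 fragColor;

// Arithmetic blend, as written in the question.
vec3 selectArithmetic(vec3 a, vec3 b, float t)
{
    return a * t + b * (1.0 - t);
}

// Built-in interpolation: mix(x, y, t) computes x * (1.0 - t) + y * t.
vec3 selectMix(vec3 a, vec3 b, float t)
{
    return mix(b, a, t);
}

// Ternary select, the `var = bool ? val1 : val2` form.
vec3 selectTernary(vec3 a, vec3 b, float t)
{
    return (t == 1.0) ? a : b;
}

void main()
{
    fragColor = vec4(selectTernary(colorA, colorB, enable), 1.0);
}
```

Which form, if any, is faster is hardware- and compiler-dependent; the ternary and the plain if at least state the intent directly.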

Since this is very hardware-dependent, it... depends on the hardware. There are however certain epochs of hardware that can be looked at:

On pre-D3D10 desktop hardware, it's kinda the wild west. NVIDIA's compiler for such hardware was notorious for detecting such conditions and actually recompiling your shader whenever you changed uniforms that affected such conditions.

In general, this era is where about 80% of the "never use if statements" advice comes from. But even here, it's not necessarily true.

You can expect optimization of static branching. You can hope that statically uniform branching won't cause any additional slowdown (though the fact that NVIDIA thought recompilation would be faster than executing it makes it unlikely at least for their hardware). But dynamic branching is going to cost you something, even if all of the invocations take the same branch.
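One way to turn the question's uniform test into a compile-time static branch is to build two shader variants from one source with the preprocessor. The following is a minimal sketch under assumptions: DISABLE_LIGHTING is a hypothetical macro that the application would insert after the #version line, and the calcPhong body is a placeholder diffuse term, not the asker's function.

```glsl
#version 330 core

// Hypothetical switch: the application would insert
// "#define DISABLE_LIGHTING" here for the variant that skips lighting.

in vec3 vNormal;
uniform vec3 lightDir;
out vec4 fragColor;

// Placeholder diffuse term standing in for the question's calcPhong.
vec4 calcPhong(vec3 normal, vec3 lightDirection)
{
    float diffuse = max(dot(normalize(normal), normalize(-lightDirection)), 0.0);
    return vec4(vec3(diffuse), 1.0);
}

void main()
{
#ifdef DISABLE_LIGHTING
    // Variant 1: the lighting code is never compiled in.
    fragColor = vec4(0.0);
#else
    // Variant 2: always lit, with no runtime condition at all.
    fragColor = calcPhong(vNormal, lightDir);
#endif
}
```

The trade-off is that the application has to manage two programs and switch between them, so whether this beats a statically uniform if depends on the hardware and on how often the switch happens.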

Compilers of this era do their best to optimize shaders so that simple conditions can be executed simply. For example, your output = input*enable + input2*(1-enable); is something that a decent compiler could generate from your equivalent if statement.

Hardware of this era is generally capable of handling statically uniform branch statements with little slowdown. For dynamic branching, you may or may not encounter slowdown.

Hardware of this era is pretty much guaranteed to be able to handle dynamically uniform conditions with few performance issues. Indeed, it doesn't even have to be dynamically uniform; so long as all of the invocations within the same wavefront take the same path, you won't see any significant performance loss.

Note that some hardware from the previous epoch probably could do this as well. But this is the one where it's almost certain to be true.

Welcome back to the wild west. Though unlike Pre-D3D10 desktop, this is mainly due to the huge variance of ES 2.0-caliber hardware. There's such a huge amount of stuff that can handle ES 2.0, and they all work very differently from each other.

Static branching will likely be optimized. But whether you get good performance from statically uniform branching is very hardware-dependent.

Hardware here is rather more mature and capable than ES 2.0. As such, you can expect statically uniform branches to execute reasonably well. And some hardware can probably handle dynamic branches the way modern desktop hardware does.
