GLSL vertex shader performance with early return and branching

Question

I have a vertex shader like this:

void main (){

    vec4 wPos = modelMatrix * vec4( position , 1. );

    vWorldPosition = wPos.xyz;

    float mask = step(
        0.,
        dot(
            cameraDir, 
            normalize(normalMatrix * aNormal)
        )
    );

    gl_PointSize = mask * uPointSize;

    gl_Position = projectionMatrix * viewMatrix * wPos;

}

I'm not entirely sure how to test the performance of the shader while excluding other factors such as overdraw. I imagine points of size 1, arranged in a grid in screen space without any overlap, would work?

Otherwise, I'm curious about these tweaks:

(removes step, removes a multiplication, introduces if/else)

void main (){

    if(dot(
         cameraDir, 
         normalize(normalMatrix * aNormal) //remove step
    ) < 0.) {
        gl_Position = vec4(0.,.0,-2.,.1); 
        gl_PointSize = 0.;
    } else {

        gl_PointSize = uPointSize; //remove a multiplication

        vec4 wPos = modelMatrix * vec4( position , 1. );

        vWorldPosition = wPos.xyz;
        gl_Position = projectionMatrix * viewMatrix * wPos;
    }

}

Or something like this:

void main (){

    if(dot(
         cameraDir, 
         normalize(normalMatrix * aNormal) 
    ) < 0.) {
        gl_Position = vec4(0.,.0,-2.,.1); 
        return;
    }

    gl_PointSize = uPointSize; 

    vec4 wPos = modelMatrix * vec4( position , 1. );

    vWorldPosition = wPos.xyz;

    gl_Position = projectionMatrix * viewMatrix * wPos;

}

Will these shaders behave differently, and if so, why and how?

I'm interested in whether there is a way to quantify the difference in performance.

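  • Is there some value, such as the number of MADs or something else, that the different versions of the code would obviously yield?
  • Would different generations of GPUs treat these differences differently?
  • If the step version is guaranteed to be the fastest, is there a known list of patterns for avoiding branches, and which operations should be preferred? (Could floor also be used instead of step?):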

float condition = clamp(floor(myDot + 1.),0.,1.); //is it slower?

Recommended answer

There are just way too many variables, so the answer is "it depends". Some GPUs can handle branches. Some can't, and the compiler expands the code so that there are no branches, just math whose result is multiplied by 0 and other math whose result is not. Then there are things like tiling GPUs that attempt to aggressively avoid overdraw. I'm sure there are other factors.
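To make the "math multiplied by 0" idea concrete, here is a rough CPU-side sketch in JavaScript, not GPU code. It emulates GLSL's step and clamp, compares the step mask with the floor variant from the question, and puts a branch next to its flattened form. The helper names (f32, stepMask, floorMask, pointSizeBranch, pointSizeFlat) are made up for illustration, and Math.fround only approximates 32-bit shader arithmetic; real hardware precision may differ.

```javascript
// CPU-side emulation of a few GLSL built-ins. Illustrative only.
const step = (edge, x) => (x < edge ? 0 : 1); // GLSL step(edge, x)
const clamp = (x, lo, hi) => Math.min(Math.max(x, lo), hi);
const f32 = Math.fround; // round a value to 32-bit float (assumed shader precision)

// Mask used by the first shader: 1 when the dot product is >= 0, else 0.
const stepMask = (d) => step(0, f32(d));

// Mask from the question's floor variant; the add is rounded to float32.
const floorMask = (d) => clamp(Math.floor(f32(f32(d) + 1)), 0, 1);

// Branching form of the point-size choice.
function pointSizeBranch(d, uPointSize) {
  if (f32(d) < 0) return 0;
  return uPointSize;
}

// Flattened form: the value is always computed, then multiplied by 0 or 1.
function pointSizeFlat(d, uPointSize) {
  return stepMask(d) * uPointSize;
}

// The two masks agree on ordinary inputs...
console.log(stepMask(-0.5), floorMask(-0.5));
console.log(stepMask(0.5), floorMask(0.5));
// ...but can disagree for tiny negative inputs, where d + 1 rounds to 1.
console.log(stepMask(-1e-9), floorMask(-1e-9));
```

Under these assumptions the floor variant is not an exact replacement for step: for a tiny negative dot product, d + 1 rounds to exactly 1, so the floor mask reports 1 while step reports 0. Whether either form is faster than a branch still has to be measured per GPU.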

Theoretically, you can run a million or a few million iterations of your shader and time it with

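const pixel = new Uint8Array(4);
// Reading one pixel blocks until all previously submitted GPU work is done.
gl.readPixels(0, 0, 1, 1, gl.RGBA, gl.UNSIGNED_BYTE, pixel);
const start = performance.now();
// numDraws and numPoints are placeholders for "draw a bunch".
for (let i = 0; i < numDraws; ++i) {
  gl.drawArrays(gl.POINTS, 0, numPoints);
}
gl.readPixels(0, 0, 1, 1, gl.RGBA, gl.UNSIGNED_BYTE, pixel);
const end = performance.now();
const elapsedTime = end - start;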

gl.readPixels is a synchronous operation, so it stalls the GPU pipeline. The elapsedTime itself is not the actual shader time, since it includes starting up and stopping the GPU, among other things, but it seems like you could compare the elapsedTime of one shader with that of another to see which is faster.

In other words, if elapsedTime is 10 seconds, it does not mean your shader took 10 seconds. It means it took 10 seconds to start the GPU, run your shader, and stop the GPU. How many of those seconds are startup, how many are shutdown, and how many are your shader isn't available. But if elapsedTime is 10 seconds for one shader and 11 for another, then it's probably safe to say one shader is faster than the other. Note that you probably want to make your test long enough that you get seconds of difference and not microseconds of difference. You'd also need to test on multiple GPUs to see if the speed differences always hold true.
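The comparison described here could be wrapped up roughly as follows. This is only a sketch: timeDraws, median, and compareShaders are hypothetical helper names, drawA and drawB stand for whatever functions bind each shader program and issue its draw calls, and taking the median of a few alternating runs is my own addition to smooth out noise, not something the timing approach requires.

```javascript
// Time `drawFn` by bracketing it with 1-pixel readPixels calls, which block
// until the GPU has finished everything submitted so far.
function timeDraws(gl, drawFn, iterations) {
  const pixel = new Uint8Array(4);
  const sync = () =>
    gl.readPixels(0, 0, 1, 1, gl.RGBA, gl.UNSIGNED_BYTE, pixel);
  sync(); // drain pending work so it is not counted
  const start = performance.now();
  for (let i = 0; i < iterations; ++i) drawFn();
  sync(); // wait for the timed draws to finish
  return performance.now() - start;
}

// Median of a list of numbers (does not modify the input).
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = sorted.length >> 1;
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Alternate the two shaders across several runs so that drift (thermal
// throttling, other processes) affects both roughly equally.
function compareShaders(gl, drawA, drawB, { iterations = 1000, runs = 5 } = {}) {
  const timesA = [];
  const timesB = [];
  for (let r = 0; r < runs; ++r) {
    timesA.push(timeDraws(gl, drawA, iterations));
    timesB.push(timeDraws(gl, drawB, iterations));
  }
  return { medianA: median(timesA), medianB: median(timesB) };
}
```

As noted above, the absolute numbers include GPU startup and shutdown overhead, so only the relative difference between medianA and medianB is meaningful, and iterations should be raised until each run takes on the order of seconds.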

Note that calling return in the vertex shader does not prevent the vertex from being generated. In fact, if the shader returns before writing gl_Position, its value is undefined, and the same goes for any other output left unwritten, such as gl_PointSize in the early-return version above.
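One way to keep an early return well defined is to write every output before returning. The sketch below holds a hypothetical variant of the early-return shader in a JavaScript string; declarations are omitted as in the shaders above, the w component of 1. and the vec3(0.) placeholder for vWorldPosition are my assumptions, and isOutsideNearFar is an illustrative helper for the clip test against the near and far planes (-w <= z <= w), which as far as I know is what discards such a point in OpenGL ES 2.0 / WebGL 1.

```javascript
// GLSL source kept in a JS string; a hypothetical variant, not the asker's code.
const earlyReturnVertexShader = `
void main() {
  if (dot(cameraDir, normalize(normalMatrix * aNormal)) < 0.) {
    // Write every output before returning so nothing is left undefined.
    gl_Position = vec4(0., 0., -2., 1.); // z < -w: outside the near plane
    gl_PointSize = 0.;
    vWorldPosition = vec3(0.);           // placeholder value (assumption)
    return;
  }

  gl_PointSize = uPointSize;
  vec4 wPos = modelMatrix * vec4(position, 1.);
  vWorldPosition = wPos.xyz;
  gl_Position = projectionMatrix * viewMatrix * wPos;
}
`;

// Clip-space test against the near and far planes: inside means -w <= z <= w.
function isOutsideNearFar([x, y, z, w]) {
  return z < -w || z > w;
}

console.log(isOutsideNearFar([0, 0, -2, 1]));   // the position used above
console.log(isOutsideNearFar([0, 0, -2, 0.1])); // the question's original value
```

Both the position used here and the question's vec4(0.,.0,-2.,.1) fail the near-plane test, so either should get the point discarded; the vertex shader still runs for that vertex regardless.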
