GLSL performance - function return value/type


Question

I'm using bicubic filtering to smooth my heightmap. I implemented it in GLSL:

Bicubic interpolation (see the interpolate() function below):

float interpolateBicubic(sampler2D tex, vec2 t) 
{

vec2 offBot =   vec2(0,-1);
vec2 offTop =   vec2(0,1);
vec2 offRight = vec2(1,0);
vec2 offLeft =  vec2(-1,0);

vec2 f = fract(t.xy * 1025);

vec2 bot0 = (floor(t.xy * 1025)+offBot+offLeft)/1025;
vec2 bot1 = (floor(t.xy * 1025)+offBot)/1025;
vec2 bot2 = (floor(t.xy * 1025)+offBot+offRight)/1025;
vec2 bot3 = (floor(t.xy * 1025)+offBot+2*offRight)/1025;

vec2 mbot0 = (floor(t.xy * 1025)+offLeft)/1025;
vec2 mbot1 = (floor(t.xy * 1025))/1025;
vec2 mbot2 = (floor(t.xy * 1025)+offRight)/1025;
vec2 mbot3 = (floor(t.xy * 1025)+2*offRight)/1025;

vec2 mtop0 = (floor(t.xy * 1025)+offTop+offLeft)/1025;
vec2 mtop1 = (floor(t.xy * 1025)+offTop)/1025;
vec2 mtop2 = (floor(t.xy * 1025)+offTop+offRight)/1025;
vec2 mtop3 = (floor(t.xy * 1025)+offTop+2*offRight)/1025;

vec2 top0 = (floor(t.xy * 1025)+2*offTop+offLeft)/1025;
vec2 top1 = (floor(t.xy * 1025)+2*offTop)/1025;
vec2 top2 = (floor(t.xy * 1025)+2*offTop+offRight)/1025;
vec2 top3 = (floor(t.xy * 1025)+2*offTop+2*offRight)/1025;

float h[16];

h[0] = texture(tex,bot0).r;
h[1] = texture(tex,bot1).r;
h[2] = texture(tex,bot2).r;
h[3] = texture(tex,bot3).r;

h[4] = texture(tex,mbot0).r;
h[5] = texture(tex,mbot1).r;
h[6] = texture(tex,mbot2).r;
h[7] = texture(tex,mbot3).r;

h[8] = texture(tex,mtop0).r;
h[9] = texture(tex,mtop1).r;
h[10] = texture(tex,mtop2).r;
h[11] = texture(tex,mtop3).r;

h[12] = texture(tex,top0).r;
h[13] = texture(tex,top1).r;
h[14] = texture(tex,top2).r;
h[15] = texture(tex,top3).r;

float H_ix[4];

H_ix[0] = interpolate(f.x,h[0],h[1],h[2],h[3]);
H_ix[1] = interpolate(f.x,h[4],h[5],h[6],h[7]);
H_ix[2] = interpolate(f.x,h[8],h[9],h[10],h[11]);
H_ix[3] = interpolate(f.x,h[12],h[13],h[14],h[15]);

float H_iy = interpolate(f.y,H_ix[0],H_ix[1],H_ix[2],H_ix[3]);

return H_iy;
}

This is my version of it; the texture size (1025) is still hardcoded. Using it in the vertex shader and/or the tessellation evaluation shader hurts performance badly (20-30 fps). But when I change the last line of this function to:

return 0;

performance improves to the level I get with bilinear or nearest (unfiltered) sampling.

The same happens with any of the following (that is, performance remains good):

return h[...]; //...
return f.x; //...
return H_ix[...]; //...

The interpolation function:

float interpolate(float x, float v0, float v1, float v2,float v3)
{
    double c1,c2,c3,c4; //changed to float, see EDITs

    c1 = spline_matrix[0][1]*v1;
    c2 = spline_matrix[1][0]*v0 + spline_matrix[1][2]*v2;
    c3 = spline_matrix[2][0]*v0 + spline_matrix[2][1]*v1 + spline_matrix[2][2]*v2 + spline_matrix[2][3]*v3;
    c4 = spline_matrix[3][0]*v0 + spline_matrix[3][1]*v1 + spline_matrix[3][2]*v2 + spline_matrix[3][3]*v3;

    return(c4*x*x*x + c3*x*x +c2*x + c1);
};

The fps only drops when I return the final H_iy value. How does the return value affect performance?

EDIT: I've just realized that I used double in the interpolate() function to declare c1, c2, etc. I've changed it to float, and performance now remains good with the proper return value. So the question changes a bit:

How does a double-precision variable affect performance on the hardware, and why didn't the other interpolation calls trigger this performance loss, only the last one, given that the H_ix[] array was float too, just like H_iy?

Recommended answer

One thing you could do to speed this up is use texelFetch() instead of floor()/texture(), so the hardware doesn't waste time doing any filtering. That said, hardware filtering is quite fast, which is partly why I linked the GPU Gems article. There is also now a textureSize() function, which saves you from passing the texture dimensions in yourself.
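As a rough illustration of that suggestion, the 4x4 neighbourhood could be gathered along these lines. This is only a sketch under assumptions: it targets GLSL 3.30 or later, the name interpolateBicubicFetch is hypothetical, the clamping at the texture border is my own addition, and interpolate() is assumed to be the float version from the question, defined elsewhere in the same shader. The texel addressing may also need a half-texel adjustment to match the original texture() lookups exactly.

```glsl
// Sketch only: assumes GLSL 3.30+. interpolate() is the question's cubic
// helper (float version) and must be defined elsewhere in the shader.
float interpolate(float x, float v0, float v1, float v2, float v3);

// Hypothetical name; not from the original post.
float interpolateBicubicFetch(sampler2D tex, vec2 t)
{
    ivec2 size = textureSize(tex, 0);   // replaces the hardcoded 1025
    vec2  p    = t * vec2(size);
    ivec2 base = ivec2(floor(p));       // integer texel coordinate
    vec2  f    = fract(p);              // position inside the texel

    float rows[4];
    for (int j = 0; j < 4; ++j) {
        float h[4];
        for (int i = 0; i < 4; ++i) {
            // texelFetch does no filtering or wrapping, so clamp manually
            // (border handling is an assumption, not from the question).
            ivec2 c = clamp(base + ivec2(i - 1, j - 1), ivec2(0), size - 1);
            h[i] = texelFetch(tex, c, 0).r;
        }
        // Interpolate along x within each of the four rows.
        rows[j] = interpolate(f.x, h[0], h[1], h[2], h[3]);
    }
    // Then interpolate the four row results along y.
    return interpolate(f.y, rows[0], rows[1], rows[2], rows[3]);
}
```

The loops visit the same 16 texels, in the same bottom-to-top row order, as the hand-unrolled version in the question.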

GLSL compilers optimize very aggressively and throw away everything they possibly can. Say you spend ages computing a really expensive lighting value, but at the end just write colour = vec4(1): all of that computation is discarded and the shader runs really fast. This can take some getting used to when you try to benchmark things. I believe this is what you are seeing when you return different values. Imagine every variable has a dependency tree: if a variable is not used in an output, including uniforms and attributes, and even across shader stages, GLSL ignores it completely. One place where I've seen GLSL compilers fall short is in copying in/out function arguments when they don't have to.
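To make the benchmarking pitfall concrete, here is a minimal fragment-shader sketch. Everything in it (heightTex, uv, expensive(), the loop count) is hypothetical and chosen only for illustration; whether a given driver actually removes the dead code depends on the compiler.

```glsl
#version 330 core

uniform sampler2D heightTex;   // hypothetical texture
in  vec2 uv;                   // hypothetical interpolated coordinate
out vec4 colour;

// Deliberately costly: 64 texture reads per invocation.
float expensive(vec2 p)
{
    float acc = 0.0;
    for (int i = 0; i < 64; ++i)
        acc += texture(heightTex, p + vec2(float(i)) / 1024.0).r;
    return acc / 64.0;
}

void main()
{
    float v = expensive(uv);

    // Variant A: v never reaches an output, so the compiler is free to
    // drop the call, the loop, and even the heightTex uniform.
    // colour = vec4(1.0);

    // Variant B: v feeds the output, so the work has to be done.
    colour = vec4(vec3(v), 1.0);
}
```

When timing a shader, it helps to check that the value being measured really contributes to an output, as in variant B. Otherwise the measurement may be of a shader that does almost nothing.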

As for double precision, there is a similar question here: https://superuser.com/questions/386456/why-does-a-geforce-card-perform-4x-slower-in-double-precision-than-a-tesla-card. In general, graphics needs to be fast and nearly always uses single precision. For more general-purpose computing applications, e.g. scientific simulations, doubles of course give higher accuracy. You'll probably find a lot more about this in relation to CUDA.
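For reference, a float-only cubic could look like the sketch below. The question does not show the contents of spline_matrix, so the Catmull-Rom coefficients here are an assumption (they happen to match the pattern of zero entries in the original interpolate()), and the name interpolateFloat is hypothetical.

```glsl
// Sketch only: Catmull-Rom weights are assumed, since spline_matrix is not
// shown in the question. Everything stays in single precision.
float interpolateFloat(float x, float v0, float v1, float v2, float v3)
{
    float c1 = v1;
    float c2 = 0.5 * (v2 - v0);
    float c3 = v0 - 2.5 * v1 + 2.0 * v2 - 0.5 * v3;
    float c4 = 0.5 * (v3 - v0) + 1.5 * (v1 - v2);

    // Horner form of c4*x^3 + c3*x^2 + c2*x + c1.
    return ((c4 * x + c3) * x + c2) * x + c1;
}
```

With these weights the curve passes through v1 at x = 0 and through v2 at x = 1, which is the behaviour expected when interpolating between the two middle samples.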
