Performance - drawing many 2D circles in OpenGL


Question

I am trying to draw large numbers of 2D circles for my 2D game in OpenGL. They are all the same size and have the same texture, and many of the sprites overlap. What would be the fastest way to do this?

An example of the sort of effect I am making: http://img805.imageshack.us/img805/6379/circles.png

(Note that the black edges are just due to the expanding explosion of circles; they were filled in a moment after this screenshot was taken.)

At the moment I am using a pair of textured triangles to make each circle. The texture is transparent around its edges so that the quad looks like a circle. Using blending for this proved to be very slow (and z culling was not possible, as the sprites were rendered as squares to the depth buffer). Instead, I do not use blending and have my fragment shader discard any fragment with an alpha of 0. This works, but it means that early z is not possible (as fragments are discarded).

The speed is limited by the large amount of overdraw and the GPU's fill rate. The order in which the circles are drawn doesn't really matter (provided it doesn't change between frames, which would create flicker), so I have been trying to ensure that each pixel on the screen is written to only once.

I attempted this using the depth buffer. At the start of each frame it is cleared to 1.0f. When a circle is drawn, it changes that part of the depth buffer to 0.0f. When another circle would normally be drawn there, it is not, because the new circle also has a z of 0.0f, which is not less than the 0.0f already in the depth buffer. This works and should reduce the number of pixels that have to be drawn. Strangely, however, it isn't any faster. I have already asked a question about this behavior ("opengl depth buffer slow when points have same depth"), and the suggestion was that z culling is not accelerated when equal z values are used.

Instead, I have to give each of my circles a separate false z value, from 0 upwards. Then, when I render using glDrawArrays and the default depth function of GL_LESS, I do get a speed boost from z culling (although early z is not possible, because fragments are discarded to produce the circular shape). However, this is not ideal: I have had to add a large amount of z-related code to a 2D game that simply shouldn't need it (and not passing z values at all would be faster). It is nevertheless the fastest way I have found so far.
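The "separate false z values" idea can be sketched in C as below. This is only an illustration, not the asker's actual code: the helper names are hypothetical, the depth range is assumed to be the default glDepthRange(0.0, 1.0), and the circle count is assumed to be far below the resolution of the depth buffer (about 16.7 million distinct values for a 24-bit buffer).

```c
#include <stddef.h>

/* Hypothetical helper (not from the original post): gives circle number
 * `index` out of `count` its own window-space depth in [0, 1).
 * Circle 0 gets 0.0 and later circles get larger values, so with
 * glDepthFunc(GL_LESS) an earlier circle wins wherever two overlap. */
float circle_window_depth(size_t index, size_t count)
{
    if (count == 0) {
        return 0.0f;
    }
    return (float)index / (float)count;
}

/* Converts a window-space depth in [0, 1] to the normalized device z in
 * [-1, 1] that a vertex position needs, assuming the default
 * glDepthRange(0.0, 1.0). */
float window_depth_to_ndc_z(float depth)
{
    return 2.0f * depth - 1.0f;
}
```

Because the depth depends only on the circle's index, the winner of each overlap stays the same from frame to frame, which is what avoids flicker.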

Finally, I tried using the stencil buffer, with:

glStencilFunc(GL_EQUAL, 0, 1);
glStencilOp(GL_KEEP, GL_INCR, GL_INCR);

Here the stencil buffer is reset to 0 each frame. The idea is that after a pixel is drawn to for the first time, its stencil value becomes non-zero, so that pixel is not drawn to again, which reduces the amount of overdraw. However, this proved to be no faster than just drawing everything without the stencil buffer or the depth buffer.

What is the fastest way people have found to do what I am trying to do?

Recommended answer

The fundamental problem is that you're fill limited: the GPU cannot shade all the fragments you ask it to draw in the time you expect. The reason your depth-buffering trick isn't effective is that the most time-consuming part of processing is shading the fragments (either through your own fragment shader or through the fixed-function shading engine), which occurs before the depth test. The same issue applies to stencil: the pixel is shaded before the stencil test.
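As a rough way to see what "fill limited" means here, the fragment load can be estimated as an overdraw factor: fragments generated per frame divided by the number of screen pixels. The function and its parameters below are illustrative assumptions only, and the estimate assumes every quad lies fully on screen.

```c
/* Rough estimate of overdraw: fragments generated per frame divided by
 * the number of screen pixels. All names are illustrative. Assumes each
 * sprite is a square quad of quad_side_px pixels, fully on screen. */
double overdraw_factor(long quad_count, int quad_side_px,
                       int screen_w, int screen_h)
{
    double fragments = (double)quad_count * quad_side_px * quad_side_px;
    double pixels = (double)screen_w * screen_h;
    return pixels > 0.0 ? fragments / pixels : 0.0;
}
```

For example, with made-up numbers, 1000 quads of 32x32 pixels on a 640x400 screen give a factor of 4: on average every pixel is shaded four times. A circle fills only pi/4 (about 79%) of its bounding quad, so roughly a fifth of those fragments are shaded only to be discarded.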

There are a few things that may help, but they depend on your hardware:

  • Render your sprites from front to back with depth buffering. Modern GPUs often try to determine whether a collection of fragments will be visible before sending them off to be shaded. Roughly speaking, the depth buffer (or a representation of it) is checked to see whether the fragment that's about to be shaded will be visible, and if not, its processing is terminated at that point. This should help reduce the number of pixels that need to be written to the framebuffer.
  • Use a fragment shader that immediately checks your texel's alpha value and discards the fragment before any additional processing, as in:

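varying vec2 texCoord;   // interpolated texture coordinate from the vertex shader
uniform sampler2D tex;   // circle sprite texture

void main()
{
    // texture2D() matches the legacy 'varying' syntax (GLSL 1.20 and earlier);
    // with 'in'/'out' variables (GLSL 1.30+) use texture() instead.
    vec4 texel = texture2D( tex, texCoord );

    // Reject the transparent corner fragments before doing any further work.
    if ( texel.a < 0.01 ) discard;

    // rest of your color computations (placeholder: pass the texel through)
    gl_FragColor = texel;
}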

(You can also use the alpha test in fixed-function fragment processing, but it's impossible to say whether the test will be applied before fragment shading completes.)
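The front-to-back ordering suggested in the first point could be prepared on the CPU along these lines. This is a minimal sketch under stated assumptions: the Sprite struct and function names are hypothetical, and a smaller z is assumed to mean closer to the viewer, which matches the default GL_LESS depth test.

```c
#include <stdlib.h>

/* Hypothetical sprite record; only the fields needed for ordering. */
typedef struct {
    float x, y;   /* centre position */
    float z;      /* depth: smaller means closer to the viewer */
} Sprite;

/* qsort comparator: ascending z, i.e. nearest sprite first. */
static int compare_front_to_back(const void *a, const void *b)
{
    const Sprite *sa = (const Sprite *)a;
    const Sprite *sb = (const Sprite *)b;
    return (sa->z > sb->z) - (sa->z < sb->z);
}

/* Orders the sprites so the nearest one is submitted first, giving the
 * depth test a chance to reject the fragments of sprites behind it. */
void sort_front_to_back(Sprite *sprites, size_t count)
{
    qsort(sprites, count, sizeof(Sprite), compare_front_to_back);
}
```

Note that qsort is not guaranteed to be stable, so sprites sharing the same z may swap places between frames; giving each sprite a distinct z avoids that.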
