Why is this OpenGL ES code slow on iPhone?


Question

I've slightly modified the iPhone SDK's GLSprite example while learning OpenGL ES, and it turns out to be quite slow. It is slow even in the simulator (and worse on the hardware), so I must be doing something wrong, since it's only 400 textured triangles.

const GLfloat spriteVertices[] = {
  0.0f, 0.0f, 
  100.0f, 0.0f,  
  0.0f, 100.0f,
  100.0f, 100.0f
};

const GLshort spriteTexcoords[] = {
  0,0,
  1,0,
  0,1,
  1,1
};

- (void)setupView {
    glViewport(0, 0, backingWidth, backingHeight);
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    glOrthof(0.0f, backingWidth, backingHeight,0.0f, -10.0f, 10.0f);
    glMatrixMode(GL_MODELVIEW);

    glClearColor(0.3f, 0.0f, 0.0f, 1.0f);

    glVertexPointer(2, GL_FLOAT, 0, spriteVertices);
    glEnableClientState(GL_VERTEX_ARRAY);
    glTexCoordPointer(2, GL_SHORT, 0, spriteTexcoords);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);

    // sprite data is preloaded. 512x512 rgba8888   
    glGenTextures(1, &spriteTexture);
    glBindTexture(GL_TEXTURE_2D, spriteTexture);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, spriteData);
    free(spriteData);

    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);

    glEnable(GL_TEXTURE_2D);
    glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA);
    glEnable(GL_BLEND);
} 

- (void)drawView {
  ..
    glClear(GL_COLOR_BUFFER_BIT);
    glLoadIdentity();
    glTranslatef(tx-100, ty-100,10);
    for (int i=0; i<200; i++) { 
        glTranslatef(1, 1, 0);
        glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
    }
  ..
}

drawView is called every time the screen is touched or the finger on the screen is moved, and tx, ty are set to the x, y coordinates of that touch.

I've also tried using GLBuffer, with the translations pre-generated so that there was only one DrawArray call, but it gave the same performance (~4 FPS).

===EDIT===

Meanwhile, I've modified this so that much smaller quads are used (34x20) and there is much less overlap. There are ~400 quads (800 triangles) spread over the whole screen. The texture is a 512x512 RGBA_8888 atlas, and the texture coordinates are floats. The code is very ugly in terms of API efficiency: there are two MatrixMode changes along with two loads and two translations, then a DrawArrays call for a triangle strip (the quad). This now produces ~45 FPS.

Recommended answer

(I know this is very late, but I couldn't resist. I'll post anyway, in case other people come here looking for advice.)

This has nothing to do with the texture size. I don't know why people rated up Nils. He seems to have a fundamental misunderstanding of the OpenGL pipeline. He seems to think that for a given triangle, the entire texture is loaded and mapped onto that triangle. The opposite is true.

Once the triangle has been mapped into the viewport, it is rasterized. For every on-screen pixel that your triangle covers, the fragment stage runs (OpenGL ES 1.1, which you are using, has a fixed-function one rather than a programmable fragment shader). It looks up the texel that most closely maps to the pixel you are drawing (GL_NEAREST), or up to 4 texels that it averages, since you are using the higher-quality GL_LINEAR filter. Still, if the pixel count in your triangle is, say, 100, then the most texture bytes you will have to read is 4 (lookups) * 100 (pixels) * 4 (bytes per color) = 1600 bytes. That is far, far less than what Nils was saying. It's amazing that he can make it sound like he actually knows what he's talking about.
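
To make the arithmetic above concrete, here is a minimal C sketch. The function names are hypothetical helpers written for this illustration and are not part of OpenGL ES. It compares the upper bound on texel bytes sampled for a 100-pixel triangle with the size of the whole 512x512 RGBA8888 texture from the question.

```c
#include <stddef.h>

/* Upper bound on texture bytes sampled while rasterizing one triangle.
 * taps_per_pixel: 1 for GL_NEAREST, 4 for GL_LINEAR (bilinear).
 * bytes_per_texel: 4 for an RGBA8888 texture.
 * Hypothetical helper for illustration only. */
size_t max_texel_bytes_read(size_t pixels, size_t taps_per_pixel,
                            size_t bytes_per_texel)
{
    return pixels * taps_per_pixel * bytes_per_texel;
}

/* Size in bytes of a whole uncompressed texture (no mipmaps). */
size_t texture_bytes(size_t width, size_t height, size_t bytes_per_texel)
{
    return width * height * bytes_per_texel;
}
```

With the numbers from this thread, max_texel_bytes_read(100, 4, 4) gives 1600 bytes, while texture_bytes(512, 512, 4) gives 1048576 bytes, so the per-triangle read is a tiny fraction of the full texture.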

With regard to the tiled architecture: this is common in embedded OpenGL devices to preserve locality of reference. I believe that each tile gets exposed to each drawing operation, quickly culling most of them. Then the tile decides what to draw on itself. This is going to be much slower when you have blending turned on, as you do. Because you are using large triangles that might overlap and blend with other tiles, the GPU has to do a lot of extra work. If, instead of rendering the example square with alpha edges, you were to render an actual shape (instead of a square picture of the shape), then you could turn off blending for this part of the scene, and I bet that would speed things up tremendously.
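
As a rough back-of-the-envelope check on how much blended work the two versions in the question generate, the following C sketch counts fragments per frame. The helper names are hypothetical, and the 320x480 screen size is an assumption (it is not stated in the question). It also assumes every quad is fully on screen, so it is only an estimate.

```c
/* Fragments produced per frame by quad_count quads of w x h pixels,
 * assuming each quad is fully on screen (no clipping).
 * Hypothetical helper for illustration only. */
unsigned long blended_fragments(unsigned long quad_count,
                                unsigned long w, unsigned long h)
{
    return quad_count * w * h;
}

/* Average number of times each screen pixel is written per frame. */
double overdraw_factor(unsigned long fragments,
                       unsigned long screen_w, unsigned long screen_h)
{
    return (double)fragments / (double)(screen_w * screen_h);
}
```

Under these assumptions, the original loop (200 quads of 100x100) produces 2,000,000 blended fragments per frame, roughly 13 times the pixel count of a 320x480 screen, while the edited version (about 400 quads of 34x20) produces 272,000, under twice the screen. That is consistent with blending over large overlapping triangles being the expensive part, though it does not prove it.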

If you want to try it, just turn off blending and see how much things speed up, even if they don't look right: glDisable(GL_BLEND);
