Asynchronous glReadPixels with PBO

Question

I want to use two PBOs to read pixels in an alternating fashion. I expected the PBO path to be much faster, because glReadPixels returns immediately when a PBO is bound, so much of the transfer time can overlap with other work.

Strangely, there seems to be little benefit. Consider code like this:

    // Synchronous path: no PBO bound, so pixels are written to client memory at buf.
    glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, 0);
    Timer t; t.start();
    glReadPixels(0, 0, 1024, 1024, GL_RGBA, GL_UNSIGNED_BYTE, buf);
    t.stop(); std::cout << t.getElapsedTimeInMilliSec() << " ";

    // PBO path: with a pack buffer bound, the last argument is a byte offset into the PBO.
    glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, pbo);
    t.start();
    glReadPixels(0, 0, 1024, 1024, GL_RGBA, GL_UNSIGNED_BYTE, 0);
    t.stop(); std::cout << t.getElapsedTimeInMilliSec() << std::endl;

The results, in milliseconds (first column without PBO, second column with PBO), are:

1.301 1.185
1.294 1.19
1.28 1.191
1.341 1.254
1.327 1.201
1.304 1.19
1.352 1.235

The PBO path is a little faster, but it is far from the immediate return I expected.

My questions are:

  • What factors affect the performance of glReadPixels? Sometimes it costs as much as 10 ms, but here it is 1.3 ms.
  • Why does an "immediate" return cost as much as 1.2 ms? Is that too much, or is it normal?

===========================================================================

After comparing with a demo, I found two factors:

  • GL_BGRA is better than GL_RGBA: 1.3 ms => 1.0 ms (without PBO), 1.2 ms => 0.9 ms (with PBO).
  • Using glutInitDisplayMode(GLUT_RGB | GLUT_ALPHA) instead of GLUT_RGBA: 0.9 ms => 0.01 ms. This is the performance I want. On my system, GLUT_RGBA = GLUT_RGB = 0 and GLUT_ALPHA = 8.
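
A quick back-of-the-envelope check helps interpret these timings. The sketch below assumes a tightly packed 1024x1024 GL_RGBA / GL_UNSIGNED_BYTE image (4 bytes per pixel); the helper names are made up for illustration.

```cpp
#include <cstddef>

// Bytes transferred by one readback of a tightly packed image.
constexpr std::size_t readbackBytes(std::size_t width, std::size_t height,
                                    std::size_t bytesPerPixel) {
    return width * height * bytesPerPixel;
}

// Throughput (bytes per second) implied by a call that took `ms` milliseconds.
constexpr double impliedBytesPerSecond(std::size_t bytes, double ms) {
    return static_cast<double>(bytes) / (ms / 1000.0);
}

// 1024 * 1024 * 4 bytes is exactly 4 MiB.
static_assert(readbackBytes(1024, 1024, 4) == 4194304, "4 MiB per frame");
```

At 1.2 ms the implied rate is about 3.5 GB/s, which plausibly corresponds to a real copy happening inside the call. At 0.01 ms it would be roughly 419 GB/s, which suggests that in this configuration the call only queues the transfer and returns.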

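That raises two more questions:

  • Why is GL_BGRA better than GL_RGBA? Is this specific to my platform, or is it true on all platforms?
  • Why does GLUT_ALPHA matter so much that it has such a large effect on PBO performance?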

Recommended answer

I do not know glutInitDisplayMode by heart, but this typically happens because your internal and external formats do not match. For example, you won't see the asynchronous behaviour when the number of components does not match, because the required conversion still blocks inside glReadPixels.

So the most likely issue is that glutInitDisplayMode(GLUT_RGBA) actually creates a default framebuffer whose internal format is RGB or even BGR. Passing GLUT_ALPHA likely makes it RGBA or BGRA internally, which matches the number of components you are requesting.
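
The flag values reported in the question are consistent with this explanation, as the small sketch below illustrates. The constants are copied from the question ("GLUT_RGBA = GLUT_RGB = 0, GLUT_ALPHA = 8") rather than taken from a GLUT header, and both helper functions are hypothetical names used only for illustration.

```cpp
// Values as reported in the question for the asker's system; they are
// local copies so that this sketch does not depend on a GLUT header.
constexpr unsigned kGlutRgb   = 0;
constexpr unsigned kGlutRgba  = 0;
constexpr unsigned kGlutAlpha = 8;

// True if the display mode bitmask explicitly asks for an alpha channel.
constexpr bool requestsAlpha(unsigned displayMode) {
    return (displayMode & kGlutAlpha) != 0;
}

// Illustrates the answer's hypothesis: a readback can only avoid a format
// conversion if the framebuffer and the requested format have the same
// number of components (for example 4 for RGBA/BGRA, 3 for RGB/BGR).
constexpr bool componentCountsMatch(int framebufferComponents,
                                    int readComponents) {
    return framebufferComponents == readComponents;
}

// GLUT_RGBA is the same bit pattern as GLUT_RGB, so it requests no alpha.
static_assert(!requestsAlpha(kGlutRgba), "GLUT_RGBA alone sets no alpha bit");
static_assert(requestsAlpha(kGlutRgb | kGlutAlpha), "GLUT_ALPHA sets the bit");
```

In a legacy or compatibility-profile context, querying GL_ALPHA_BITS with glGetIntegerv should show whether the default framebuffer actually has an alpha channel.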

edit: I found an NVIDIA document explaining some issues with pixel packing and its effect on performance.

edit2: The performance gain with BGRA is most likely because the internal hardware buffer is stored as BGRA; there is not much more to it than that.
