Asynchronous glReadPixels with PBO
Problem description
I want to use two PBOs to read pixels alternately. I thought the PBO approach would be much faster, because glReadPixels returns immediately when a PBO is bound, so a lot of time could be overlapped.
Strangely, there seems to be little benefit. Consider code like this:
// Path 1: no PBO bound, so glReadPixels copies into client memory (buf).
glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, 0);
Timer t; t.start();
glReadPixels(0, 0, 1024, 1024, GL_RGBA, GL_UNSIGNED_BYTE, buf);
t.stop(); std::cout << t.getElapsedTimeInMilliSec() << " ";
// Path 2: PBO bound, so the last argument is a byte offset into the PBO.
glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, pbo);
t.start();
glReadPixels(0, 0, 1024, 1024, GL_RGBA, GL_UNSIGNED_BYTE, 0);
t.stop(); std::cout << t.getElapsedTimeInMilliSec() << std::endl;
The results (in milliseconds; first column without PBO, second column with PBO) are:
1.301 1.185
1.294 1.19
1.28 1.191
1.341 1.254
1.327 1.201
1.304 1.19
1.352 1.235
The PBO path is a little faster, but this is not the immediate return I expected.
My questions are:
- What factors affect glReadPixels performance? Sometimes it costs up to 10 ms, but here it is 1.3 ms.
- Why does the "immediate" return cost as much as 1.2 ms? Is that too high, or is it normal?
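The alternating two-PBO scheme mentioned at the top of the question can be sketched as follows. This is only the index bookkeeping, under the assumption that one buffer receives the current frame while the other, filled one frame earlier, is mapped. `PboPingPong` and the `pbo` array are hypothetical names; the GL calls appear only in comments because they need a live context.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical bookkeeping for two pixel-pack buffers used alternately.
struct PboPingPong {
    std::size_t frame = 0;

    // Index of the PBO that glReadPixels should target in this frame.
    std::size_t packIndex() const { return frame % 2; }

    // Index of the PBO that was targeted in the previous frame.
    std::size_t mapIndex() const { return (frame + 1) % 2; }

    // False in the very first frame, when nothing has been read back yet.
    bool hasPreviousFrame() const { return frame > 0; }

    void advance() { ++frame; }
};

// Per frame, with `pbo` a hypothetical GLuint[2] and `pp` a PboPingPong:
//   glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, pbo[pp.packIndex()]);
//   glReadPixels(0, 0, 1024, 1024, GL_RGBA, GL_UNSIGNED_BYTE, 0);
//   if (pp.hasPreviousFrame()) {
//       glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, pbo[pp.mapIndex()]);
//       void* data = glMapBufferARB(GL_PIXEL_PACK_BUFFER_ARB, GL_READ_ONLY_ARB);
//       /* consume data */
//       glUnmapBufferARB(GL_PIXEL_PACK_BUFFER_ARB);
//   }
//   glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, 0);
//   pp.advance();
```

Mapping the buffer from the previous frame, rather than the one just targeted, is what gives the transfer a frame of time to complete before the CPU touches the data.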
===========================================================================
After comparing with a demo, I found two factors:
- GL_BGRA is better than GL_RGBA: 1.3 ms => 1.0 ms (without PBO), 1.2 ms => 0.9 ms (with PBO).
- Using glutInitDisplayMode(GLUT_RGB | GLUT_ALPHA) instead of GLUT_RGBA: 0.9 ms => 0.01 ms. This is the performance I want. On my system, GLUT_RGBA = GLUT_RGB = 0 and GLUT_ALPHA = 8.
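That raises two more questions:
- Why is GL_BGRA better than GL_RGBA? Is this specific to my platform, or does it hold on all platforms?
- Why does GLUT_ALPHA matter so much that it drastically affects PBO performance?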
Recommended answer
I do not know glutInitDisplayMode by heart, but this is typically because your internal and external formats do not match. For example, you won't see the asynchronous behaviour when the number of components does not match, because the conversion still blocks glReadPixels.
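To make the component-count mismatch concrete, here is a minimal CPU-side sketch of the kind of per-pixel work a conversion from a 3-component buffer to a 4-component readback implies. `expandRgbToRgba` is a hypothetical helper, not a GL call, and how a particular driver actually performs the conversion is not something the source states.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: expands tightly packed 3-component RGB pixels to
// 4-component RGBA, filling the missing alpha with 255 (i.e. 1.0).
std::vector<std::uint8_t> expandRgbToRgba(const std::vector<std::uint8_t>& rgb) {
    assert(rgb.size() % 3 == 0);  // input must hold whole pixels
    const std::size_t pixelCount = rgb.size() / 3;
    std::vector<std::uint8_t> rgba(pixelCount * 4);
    for (std::size_t i = 0; i < pixelCount; ++i) {
        rgba[4 * i + 0] = rgb[3 * i + 0];  // R
        rgba[4 * i + 1] = rgb[3 * i + 1];  // G
        rgba[4 * i + 2] = rgb[3 * i + 2];  // B
        rgba[4 * i + 3] = 255;             // A, absent in the source
    }
    return rgba;
}
```

For the 1024x1024 read in the question, that is about a million pixels to touch per call, whereas matching formats allow a plain block copy.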
So the most likely issue is that with glutInitDisplayMode(GLUT_RGBA) you actually create a default framebuffer whose internal format is RGB or even BGR. Passing the GLUT_ALPHA flag likely makes it RGBA or BGRA internally, which matches the number of components you are requesting.
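The flag values reported in the question show why GLUT_RGBA by itself does not ask for an alpha channel. The sketch below redefines those values locally under hypothetical names (`kGlutRgb`, `kGlutRgba`, `kGlutAlpha`) so it compiles without GLUT; the numbers are the ones the asker observed on their system and may differ elsewhere.

```cpp
#include <cassert>

// Values as reported in the question for the asker's GLUT headers.
// These are local stand-ins, not the real GLUT macros.
constexpr unsigned int kGlutRgb   = 0;
constexpr unsigned int kGlutRgba  = 0;
constexpr unsigned int kGlutAlpha = 8;

// Hypothetical helper: true if the display-mode mask has the alpha bit set.
constexpr bool requestsAlpha(unsigned int displayMode) {
    return (displayMode & kGlutAlpha) != 0;
}

// With these values, GLUT_RGBA is the same mask as GLUT_RGB.
static_assert(kGlutRgba == kGlutRgb, "RGBA and RGB are the same mask here");
static_assert(!requestsAlpha(kGlutRgba), "RGBA alone does not set the alpha bit");
static_assert(requestsAlpha(kGlutRgb | kGlutAlpha), "OR-ing in ALPHA sets it");
```

Under these values, only the mask that includes GLUT_ALPHA explicitly requests alpha, which is consistent with the explanation above.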
edit: I found an NVIDIA document explaining some issues around pixel packing and its influence on performance.
edit2: The performance gain with BGRA is likely because the internal hardware buffer is stored as BGRA; there's not really much more to it.
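As an illustration of that last point, assuming the buffer really is stored as BGRA, a GL_RGBA read needs the red and blue bytes of every pixel exchanged, while a GL_BGRA read can be a straight copy. The helper below is a hypothetical CPU-side sketch of that swap, not a description of what any particular driver does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: converts tightly packed BGRA pixels to RGBA in place
// by exchanging the first and third byte of every 4-byte pixel.
void swizzleBgraToRgba(std::vector<std::uint8_t>& pixels) {
    assert(pixels.size() % 4 == 0);  // input must hold whole pixels
    for (std::size_t i = 0; i < pixels.size(); i += 4) {
        std::swap(pixels[i], pixels[i + 2]);  // B <-> R; G and A stay put
    }
}
```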