When are VBOs faster than "simple" OpenGL primitives (glBegin())?


Problem Description

After many years of hearing about Vertex Buffer Objects (VBOs), I finally decided to experiment with them (my stuff isn't normally performance critical, obviously...)

I'll describe my experiment below, but to make a long story short, I'm seeing indistinguishable performance between "simple" direct mode (glBegin()/glEnd()), vertex array (CPU side) and VBO (GPU side) rendering modes. I'm trying to understand why this is, and under what conditions I can expect to see the VBOs significantly outshine their primitive (pun intended) ancestors.

Experiment Details

For the experiment, I generated a (static) 3D Gaussian cloud of a large number of points. Each point has vertex & color information associated with it. Then I rotated the camera around the cloud in successive frames in sort of an "orbiting" behavior. Again, the points are static, only the eye moves (via gluLookAt()). The data are generated once prior to any rendering & stored in two arrays for use in the rendering loop.
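Here is a minimal Python sketch of the data setup just described. The function name `make_gaussian_cloud`, its `sigma` and `seed` parameters, and the coloring rule are all hypothetical, because the post doesn't say how colors were assigned. What the sketch shows is the data layout. There are two flat float32 arrays with three components per point, and that is what both the per-vertex calls and the array pointers read from.

```python
import random
from array import array

def make_gaussian_cloud(num_points, sigma=1.0, seed=0):
    """Generate a static 3D Gaussian point cloud (hypothetical helper).

    Returns two flat, tightly packed float32 arrays:
      positions = [x0, y0, z0, x1, y1, z1, ...]
      colors    = [r0, g0, b0, r1, g1, b1, ...]
    """
    rng = random.Random(seed)  # fixed seed: the same cloud on every run
    positions = array("f")
    colors = array("f")
    for _ in range(num_points):
        x, y, z = (rng.gauss(0.0, sigma) for _ in range(3))
        positions.extend((x, y, z))
        # Hypothetical coloring rule: map each coordinate from roughly
        # [-3*sigma, 3*sigma] into [0, 1], clamping the outliers.
        colors.extend(min(1.0, max(0.0, 0.5 + c / (6.0 * sigma))) for c in (x, y, z))
    return positions, colors

positions, colors = make_gaussian_cloud(1000)
print(len(positions), len(colors), positions.itemsize)  # 3000 3000 4
```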

For direct rendering, the entire data set is rendered in a single glBegin()/glEnd() block with a loop containing a single call each to glColor3fv() and glVertex3fv().

For vertex array and VBO rendering, the entire data set is rendered with a single glDrawArrays() call.
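One way to see what differs between the three paths is to count two things per frame: the GL entry points called, and the vertex bytes the driver has to read out of application memory. The sketch below is a hypothetical bookkeeping model in plain Python. It calls no OpenGL, and its function names are made up. The exact call counts assume the usual fixed-function client-state setup and could differ by a call or two.

```python
# Hypothetical bookkeeping model, not real OpenGL. It counts the GL calls
# each path makes per frame and the vertex bytes the driver must pull from
# application (client) memory per frame.

BYTES_PER_POINT = 6 * 4  # 3 position floats + 3 color floats, float32 each

def immediate_mode_frame(num_points):
    # glBegin(GL_POINTS); per point: glColor3fv + glVertex3fv; glEnd()
    calls = 2 + 2 * num_points
    client_bytes = num_points * BYTES_PER_POINT
    return calls, client_bytes

def vertex_array_frame(num_points):
    # glEnableClientState x2, glVertexPointer, glColorPointer,
    # glDrawArrays, glDisableClientState x2
    calls = 7
    # The arrays live in client memory, so they are read again on every draw.
    client_bytes = num_points * BYTES_PER_POINT
    return calls, client_bytes

def vbo_frame(num_points):
    # glBindBuffer, the same 7 calls as above (the pointer calls now take
    # byte offsets into the buffer), then glBindBuffer(GL_ARRAY_BUFFER, 0).
    calls = 9
    # The data was uploaded once with glBufferData(..., GL_STATIC_DRAW) and
    # typically stays in GPU memory, so num_points no longer matters here.
    client_bytes = 0
    return calls, client_bytes

for n in (1_000, 1_000_000):
    for name, frame in (("immediate", immediate_mode_frame),
                        ("vertex array", vertex_array_frame),
                        ("VBO", vbo_frame)):
        calls, nbytes = frame(n)
        print(f"{n:>9} pts  {name:<12} {calls:>9} calls  {nbytes:>10} client bytes")
```

For 1M points, immediate mode makes about two million calls per frame, while either array path makes only a handful. Only the VBO path also avoids re-reading 24 MB of vertex data every frame. Those savings are on the CPU and the bus. Whether they show up as higher FPS depends on whether the CPU or the bus was the limit in the first place.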

Then, I simply run it for a minute or so in a tight loop and measure average FPS with the high performance timer.
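Below is a sketch of that measurement loop. It assumes a hypothetical `render_frame` callback that draws and presents one frame, and it uses `time.perf_counter()` as the high-performance timer.

```python
import time

def measure_average_fps(render_frame, duration_s=60.0):
    """Call render_frame() in a tight loop for about duration_s seconds and
    return the average frames per second.

    render_frame is a hypothetical callback. With real OpenGL it should
    include the buffer swap (or a glFinish()): GL commands are queued
    asynchronously, and without a sync point the loop may time little
    more than command submission.
    """
    frames = 0
    start = time.perf_counter()  # high-resolution, monotonic clock
    while True:
        render_frame()
        frames += 1
        elapsed = time.perf_counter() - start
        if elapsed >= duration_s:
            return frames / elapsed

# Usage with a stand-in "frame" that just sleeps about 2 ms:
print(round(measure_average_fps(lambda: time.sleep(0.002), duration_s=0.5)))
```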

Performance Results

As mentioned above, performance was indistinguishable on both my desktop machine (XP x64, 8GB RAM, 512 MB Quadro 1700) and my laptop (XP32, 4GB RAM, 256 MB Quadro NVS 110). It did, however, scale as expected with the number of points. Obviously, I also disabled vsync.

Specific results from laptop runs (rendering w/GL_POINTS):

glBegin()/glEnd():

  • 1K pts --> 603 FPS
  • 10K pts --> 401 FPS
  • 100K pts --> 97 FPS
  • 1M pts --> 14 FPS

Vertex Arrays (CPU side):

  • 1K pts --> 603 FPS
  • 10K pts --> 402 FPS
  • 100K pts --> 97 FPS
  • 1M pts --> 14 FPS

Vertex Buffer Objects (GPU side):

  • 1K pts --> 604 FPS
  • 10K pts --> 399 FPS
  • 100K pts --> 95 FPS
  • 1M pts --> 14 FPS
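The laptop numbers are easier to read when re-expressed as milliseconds per frame and points per second. This is plain arithmetic on the FPS values listed above. Nothing is re-measured.

```python
# FPS figures copied from the laptop results above (GL_POINTS).
results = {
    "glBegin()/glEnd()": {1_000: 603, 10_000: 401, 100_000: 97, 1_000_000: 14},
    "Vertex Arrays":     {1_000: 603, 10_000: 402, 100_000: 97, 1_000_000: 14},
    "VBO":               {1_000: 604, 10_000: 399, 100_000: 95, 1_000_000: 14},
}

def frame_time_ms(fps):
    """Average time per frame in milliseconds."""
    return 1000.0 / fps

def points_per_second(num_points, fps):
    """Points drawn per second when every frame draws num_points."""
    return num_points * fps

for method, runs in results.items():
    print(method)
    for n, fps in runs.items():
        print(f"  {n:>9} pts: {frame_time_ms(fps):6.2f} ms/frame, "
              f"{points_per_second(n, fps) / 1e6:5.2f} M pts/s")
```

At 1K points every method takes about 1.66 ms per frame. Most of that is per-frame overhead that doesn't depend on the point count, since going from 1K to 10K points adds only about 0.8 ms. At 1M points all three methods land on about 71 ms per frame, or roughly 14 M points/s. If submission cost were the limit, immediate mode (about two million calls per frame) should fall behind a single glDrawArrays() call. It does not, which points to a limit that all three paths share further down the pipeline.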

I rendered the same data with GL_TRIANGLE_STRIP and got similarly indistinguishable results (though slower, as expected, due to the extra rasterization). I can post those numbers too if anybody wants them.

Question(s)

  • What gives?
  • What do I have to do to realize the promised performance gain of VBOs?
  • What am I missing?

Solution

There are many factors involved in optimizing 3D rendering. Usually there are four bottlenecks:

  • CPU (creating vertices, API calls, everything else)
  • Bus (CPU<->GPU transfer)
  • Vertex (vertex shader or fixed-function pipeline execution)
  • Pixel (fill, fragment shader execution and ROPs)
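A common first-order model treats these four stages as a pipeline that works on different frames at once, so the slowest stage sets the frame time. The sketch below encodes that simplification. The stage names follow the list above, and the millisecond costs are invented for illustration. Real stages never overlap perfectly, so this is a rule of thumb rather than a performance model.

```python
def frame_time(stage_ms):
    """Simplified pipeline model: the stages overlap, so the slowest one
    alone determines the time per frame."""
    return max(stage_ms.values())

def bottleneck(stage_ms):
    """Name of the slowest stage."""
    return max(stage_ms, key=stage_ms.get)

# Hypothetical per-frame costs in milliseconds (not measured values).
stages = {"cpu": 20.0, "bus": 10.0, "vertex": 30.0, "pixel": 70.0}
print(bottleneck(stages), frame_time(stages))  # pixel 70.0

# Cheaper CPU work and no per-frame bus traffic (roughly what static VBOs
# buy you) leave the frame time unchanged while a GPU stage dominates.
with_vbo = dict(stages, cpu=stages["cpu"] / 2, bus=0.0)
print(bottleneck(with_vbo), frame_time(with_vbo))  # pixel 70.0
```

Making the CPU and bus stages cheaper does nothing while a GPU stage is the slowest. It only pays off once `bottleneck()` would return `"cpu"` or `"bus"`.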

Your test is giving skewed results because you have plenty of spare CPU (and bus) capacity while you are maxing out vertex or pixel throughput. VBOs are used to lower CPU load (fewer API calls, and DMA transfers that run in parallel with the CPU). Since you are not CPU bound, they don't give you any gain. This is optimization 101.

In a game, for example, CPU time becomes precious because it is needed for other things like AI and physics, not just for issuing tons of API calls. It is easy to see that writing vertex data (3 floats, for example) directly to a memory pointer is much faster than calling a function that writes 3 floats to memory; at the very least you save the cycles for the call.
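The cost of the call-per-vertex style can be put in rough numbers. The sketch below assumes a hypothetical fixed cost per GL call (`NS_PER_CALL`). The real figure depends on the CPU and the driver. It compares the submission time for 1M points against a 60 FPS frame budget.

```python
def submission_ms(num_calls, ns_per_call):
    """CPU time spent just entering GL functions, given an assumed per-call cost."""
    return num_calls * ns_per_call / 1e6

FRAME_BUDGET_MS = 1000.0 / 60  # about 16.7 ms per frame at 60 FPS

# Hypothetical per-call overhead; not a measured value.
NS_PER_CALL = 10.0

n = 1_000_000
immediate = submission_ms(2 + 2 * n, NS_PER_CALL)  # glBegin, 2 calls per point, glEnd
draw_arrays = submission_ms(8, NS_PER_CALL)        # a few setup calls plus glDrawArrays

print(f"immediate: {immediate:.2f} ms ({immediate / FRAME_BUDGET_MS:.0%} of a 60 FPS frame)")
print(f"arrays:    {draw_arrays:.6f} ms")
```

At the assumed 10 ns per call, immediate-mode submission for 1M points costs about 20 ms of CPU time, which is more than a whole 60 FPS frame. In the experiment above, that cost can hide behind a ~71 ms frame that is spent elsewhere in the pipeline. In a game, it comes straight out of the time left for AI and physics.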
