Debugging low FPS in Three.js


Question

I'm working on a Three.js WebGL scene and am seeing 60 FPS when I'm zoomed out so that all observations (~20,000 triangles) are in view, but very low FPS when I'm zoomed in so that only a small subset of the triangles is in view.

I'd like to figure out what's causing this discrepancy. My intuition is that the opposite would be true: I'd assume that when the user is zoomed in, the near and far clipping planes would remove many triangles from the scene, which would increase FPS. I want to figure out why this intuition is wrong in this scene.

How can one identify the full stack of calls used within a three.js program? Ideally I'd like to identify all the function / method calls and the time each one takes to execute, so that I can try to figure out which portion of the shaders I'm working on is killing the FPS when the user is zoomed in.

Recommended answer

GPUs have a few basic places where they spend computing power, and two of them matter here. One is running the vertex shader once per vertex. The other is running the fragment shader once per pixel/fragment.

There are almost always a ton more pixels than vertices. A single 1920x1080 screen is about 2 million pixels, yet it can be covered by a single 3 vertex triangle or by a 4 or 6 vertex quad (2 triangles). That means that to cover the entire screen the vertex shader runs 3 to 6 times but the fragment shader runs about 2 million times!
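To put numbers on that, here is a minimal sketch. The function name `shaderInvocations` is made up for illustration, and it assumes every pixel is covered exactly once (no overdraw, no multisampling).

```javascript
// Rough count of shader invocations for one draw that covers the whole screen.
// Assumes each pixel is covered exactly once (no overdraw, no multisampling).
function shaderInvocations(width, height, vertexCount) {
  return {
    vertex: vertexCount,      // vertex shader runs once per vertex
    fragment: width * height, // fragment shader runs once per covered pixel
  };
}

// A full-screen quad drawn as 2 triangles (6 vertices) on a 1920x1080 screen.
const quad = shaderInvocations(1920, 1080, 6);
console.log(quad.vertex);   // 6
console.log(quad.fragment); // 2073600
```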

Sending too much work to the fragment shader is called being "fill bound". You've maxed out the fill rate (filling triangles with pixels), and that is what you're seeing. In the worst case, on my 2014 MacBook Pro I might only be able to draw 6 or so screens' worth of pixels before I hit the fill rate limit for updating the screen at 60 frames a second.
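As a rough back-of-the-envelope sketch of what that budget means: the helper `maxFullScreenLayers` is hypothetical, and the 1920x1080 resolution is an assumption carried over from the earlier example, not a measured spec of that laptop.

```javascript
// Hypothetical helper: how many full-screen layers fit in a fill-rate budget.
function maxFullScreenLayers(pixelsPerSecond, width, height, fps) {
  return pixelsPerSecond / (width * height * fps);
}

// Fill rate implied by "about 6 screens at 60 fps", ASSUMING a 1920x1080 screen.
const impliedFillRate = 6 * 1920 * 1080 * 60; // 746,496,000 pixels per second

console.log(maxFullScreenLayers(impliedFillRate, 1920, 1080, 60)); // 6
// The same budget at 3840x2160 leaves far less room for overdraw.
console.log(maxFullScreenLayers(impliedFillRate, 3840, 2160, 60)); // 1.5
```

The point is only that the budget is counted in shaded pixels, so a larger canvas or more layers per pixel eats it up quickly.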

There are various solutions to this.

The first is the z-buffer. The GPU will first test the depth buffer to see if it needs to run the fragment shader at all. If the depth test fails, the GPU does not need to run the fragment shader. So, if you sort your opaque objects and draw them front to back, closest first and furthest last, then most of the objects in the distance will fail the depth test when the pixels of their triangles are rendered. Note that this is only possible if your fragment shader does not write to gl_FragDepth and does not use the discard keyword.
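A minimal sketch of front-to-back sorting, assuming each object exposes a `position` with `x`, `y`, `z` (a hypothetical shape, not any particular engine's API). Sorting by distance to the object's center is only an approximation; engines often sort by view-space depth instead.

```javascript
// Squared distance between two points given as {x, y, z}.
function distanceSquared(a, b) {
  const dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
  return dx * dx + dy * dy + dz * dz;
}

// Returns a new array of opaque objects ordered nearest-first.
// The input array is left untouched.
function sortFrontToBack(objects, cameraPosition) {
  return objects
    .slice()
    .sort((a, b) =>
      distanceSquared(a.position, cameraPosition) -
      distanceSquared(b.position, cameraPosition));
}

const camera = { x: 0, y: 0, z: 10 };
const scene = [
  { name: "far cube", position: { x: 0, y: 0, z: -50 } },
  { name: "near sphere", position: { x: 0, y: 0, z: 5 } },
  { name: "mid plane", position: { x: 0, y: 0, z: -5 } },
];
const drawOrder = sortFrontToBack(scene, camera).map((o) => o.name);
console.log(drawOrder); // [ 'near sphere', 'mid plane', 'far cube' ]
```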

This is a method of "avoiding overdraw". Overdraw is any pixel that is drawn more than once. If you draw a cube in the distance and then draw a sphere up close such that it covers the cube, then every pixel that was rendered for the cube gets "overdrawn" by the sphere's pixels. Rendering the cube's pixels was a waste of time.
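The effect of draw order on overdraw can be illustrated with a tiny software model. This is a toy, not how a GPU is implemented: it assumes every layer covers every pixel at one depth and that the depth test runs before the fragment shader.

```javascript
// Toy model of a depth-tested framebuffer.
// Each layer covers every pixel at a single depth (smaller = closer).
// Returns how many times the simulated fragment shader would run.
function countShadedFragments(layerDepths, pixelCount) {
  const depthBuffer = new Array(pixelCount).fill(Infinity);
  let shaded = 0;
  for (const depth of layerDepths) {
    for (let i = 0; i < pixelCount; i++) {
      if (depth < depthBuffer[i]) { // depth test passes
        depthBuffer[i] = depth;
        shaded++;                   // fragment shader would run here
      }
    }
  }
  return shaded;
}

const pixels = 100;
// Furthest layer first: every layer passes the test, so every pixel is shaded 3 times.
const backToFront = countShadedFragments([0.9, 0.5, 0.1], pixels);
// Closest layer first: later layers fail the test, so every pixel is shaded once.
const frontToBack = countShadedFragments([0.1, 0.5, 0.9], pixels);
console.log(backToFront, frontToBack); // 300 100
```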

If your fragment shaders are really complicated and therefore slow to run, some 3D engines will draw a "Z buffer pre-pass". They'll draw all the opaque geometry with the simplest possible vertex and fragment shaders. The vertex shader only needs position. The fragment shader just emits a constant value. They'll even turn off drawing to the color buffer with gl.colorMask(false, false, false, false), or possibly make a depth-only framebuffer if that's supported by the hardware. They use this pass to fill out the depth buffer. When it's finished they render everything again with the expensive shader and the depth test set to LEQUAL (or whatever works for their engine). In this way every pixel will only be shaded once with the expensive shader. Of course it's not free: it still takes GPU time to rasterize the triangles and test every pixel, but it can still be faster than overdraw if the shaders are expensive.
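A sketch of what such a pre-pass might look like in raw WebGL state calls. The function `renderWithDepthPrepass` and its two callbacks are hypothetical; the callbacks stand in for whatever code issues the draw calls for the opaque geometry with the cheap and the expensive shaders. Disabling depth writes in the second pass is one common choice, not a requirement.

```javascript
// Sketch of a depth pre-pass. `gl` is a WebGL rendering context.
// `drawDepthOnly` and `drawShaded` are hypothetical callbacks that draw all
// opaque geometry with the cheap and the expensive shaders respectively.
function renderWithDepthPrepass(gl, drawDepthOnly, drawShaded) {
  gl.enable(gl.DEPTH_TEST);
  // gl.clear honors the write masks, so make sure they are on first.
  gl.colorMask(true, true, true, true);
  gl.depthMask(true);
  gl.clear(gl.COLOR_BUFFER_BIT | gl.DEPTH_BUFFER_BIT);

  // Pass 1: fill the depth buffer only, with color writes turned off.
  gl.colorMask(false, false, false, false);
  gl.depthFunc(gl.LESS);
  drawDepthOnly();

  // Pass 2: shade only the fragments whose depth matches the pre-pass.
  gl.colorMask(true, true, true, true);
  gl.depthMask(false); // depth is already final, no need to write it again
  gl.depthFunc(gl.LEQUAL);
  drawShaded();

  gl.depthMask(true); // restore depth writes for whatever is drawn next
}
```

In an app this would be called once per frame, for example as `renderWithDepthPrepass(gl, drawSceneDepth, drawSceneLit)`, where both function names are placeholders for your own drawing code.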

Another way is to try to figure out which objects are going to be occluded by closer objects and not even submit them to the GPU. There are tons of ways to do this, usually involving bounding spheres and/or bounding boxes. Some potentially visible set (PVS) techniques can also help with occlusion culling. You can even ask the GPU to compute some of this using occlusion queries, though those are only available in WebGL2.
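For the occlusion query route, a rough sketch using the WebGL2 query API follows. The two function names and the `drawProxy` callback are hypothetical; the idea is to draw a cheap stand-in such as a bounding box with all writes disabled and ask whether any of its samples passed the depth test. The result is not available immediately, so it has to be read on a later frame.

```javascript
// Starts an occlusion query around a cheap proxy draw (e.g. a bounding box).
// `gl` is a WebGL2RenderingContext; `drawProxy` is a hypothetical callback.
function beginOcclusionTest(gl, drawProxy) {
  const query = gl.createQuery();
  gl.colorMask(false, false, false, false); // the proxy should not be visible
  gl.depthMask(false);                      // nor alter the depth buffer
  gl.beginQuery(gl.ANY_SAMPLES_PASSED, query);
  drawProxy();
  gl.endQuery(gl.ANY_SAMPLES_PASSED);
  gl.colorMask(true, true, true, true);
  gl.depthMask(true);
  return query;
}

// Call on a later frame. Returns null while the result is still pending,
// otherwise true if any sample of the proxy passed the depth test.
function readOcclusionTest(gl, query) {
  if (!gl.getQueryParameter(query, gl.QUERY_RESULT_AVAILABLE)) return null;
  const visible = !!gl.getQueryParameter(query, gl.QUERY_RESULT);
  gl.deleteQuery(query);
  return visible;
}
```

Because the answer arrives late, apps that use this typically act on the previous frame's result, which can make an object appear one or more frames after it becomes visible.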

The easiest way to see if you're fill bound is to make your canvas tiny, like 2x1 pixels (or just size your browser window really small). If your app starts running fast, it's likely fill bound. If it's still running slow, it could either be geometry bound (the vertex shader is doing too much work) or CPU bound (whatever work you're doing on the CPU is taking too long, whether that's just calling WebGL commands or computing animation, collisions, physics or whatever).
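One way to turn that test into numbers is to measure the frame rate at both canvas sizes and compare. The helpers below are a sketch: the names are made up, and the 1.5x speedup threshold is an arbitrary assumption rather than an established rule.

```javascript
// Average frames per second from a list of frame timestamps in milliseconds,
// for example the values passed to requestAnimationFrame callbacks.
function averageFps(timestampsMs) {
  if (timestampsMs.length < 2) return 0;
  const elapsed = timestampsMs[timestampsMs.length - 1] - timestampsMs[0];
  return ((timestampsMs.length - 1) * 1000) / elapsed;
}

// Heuristic from the tiny-canvas test. The default threshold is an assumption.
function likelyBottleneck(fpsFullSize, fpsTinyCanvas, speedup = 1.5) {
  return fpsTinyCanvas >= fpsFullSize * speedup
    ? "fill bound"
    : "geometry or CPU bound";
}

const fullSize = averageFps([0, 50, 100, 150]);  // 3 frames in 150 ms
const tiny = averageFps([0, 16, 32, 48, 64]);    // 4 frames in 64 ms
console.log(fullSize, tiny);                     // 20 62.5
console.log(likelyBottleneck(fullSize, tiny));   // fill bound
```

In three.js the canvas can be shrunk for the test with `renderer.setSize(2, 1)`. Keep in mind that `requestAnimationFrame` is capped at the display's refresh rate, so the tiny-canvas number will not go much above 60 on a typical screen.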

In your case you are likely fill bound: when all the triangles are small it runs fast (because very few pixels are being drawn), whereas when you're zoomed in and lots of triangles cover the screen it runs slow (because too many pixels are being drawn).

There are no really "simple" solutions. It really just depends on what you're trying to do. Apparently you're using three.js, and I know it can sort transparent objects. I have no idea if it sorts opaque objects. The other techniques listed are, I believe, outside the scope of three.js and more up to your app, which would have to take things in and out of the scene or set their visibility to false, etc.

Note: here is a simple demo to show how little overdraw your GPU can handle. It just draws a bunch of fullscreen quads. By default it likely can't draw that many, especially at fullscreen size, before it can no longer hit 60fps. Turn on front-to-back sorting and it will be able to draw more and still hit 60fps.

Also note that drawing with blending enabled is slower than drawing with blending disabled. This should be clear: without blending the GPU just writes the pixel, whereas with blending the GPU first has to read the destination pixel so that it can do the blending, and therefore it's slower.
