OpenGL ES (WebGL) rendering many small objects


Problem description

I need to render a lot of small objects (2-100 triangles each) that sit in a deep hierarchy, and each object has its own matrix. To render them I precalculate the actual matrix for each object, put the objects in a single list, and issue two calls per object: set the matrix uniform, then gl.drawElements().
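
For concreteness, the draw loop looks roughly like this (a minimal sketch; renderList, uMVMatrixLoc and the object fields are illustrative names, not my actual code):

// One uniform upload plus one draw call per object.
for (var i = 0; i < renderList.length; i++) {
  var obj = renderList[i];
  // worldMatrix is the precalculated hierarchy matrix for this object
  gl.uniformMatrix4fv(uMVMatrixLoc, false, obj.worldMatrix);
  gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, obj.indexBuffer);
  gl.drawElements(gl.TRIANGLES, obj.indexCount, gl.UNSIGNED_SHORT, 0);
}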

Obviously this is not the fastest way to go: with a few thousand objects the performance becomes unacceptable. The only solution I can think of is to batch multiple objects into a single buffer. But that isn't easy, because each object has its own matrix, and to put an object into a shared buffer I have to transform its vertices by that matrix on the CPU. An even worse problem is that the user can move any object at any time, and then I have to recalculate the large vertex data again (because the user may move an object with many nested children).
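
A sketch of what re-baking one object into the shared buffer would involve (assuming column-major 4x4 matrices stored as a Float32Array, as WebGL expects; the object fields are illustrative):

// Re-transform one object's vertices on the CPU after its matrix changes.
function rebakeObject(obj, sharedPositions) {
  var m = obj.worldMatrix;            // column-major mat4
  var src = obj.localPositions;       // xyz triples in object space
  var dst = obj.sharedBufferOffset;   // float offset into the shared array
  for (var i = 0; i < src.length; i += 3) {
    var x = src[i], y = src[i + 1], z = src[i + 2];
    sharedPositions[dst + i]     = m[0] * x + m[4] * y + m[8]  * z + m[12];
    sharedPositions[dst + i + 1] = m[1] * x + m[5] * y + m[9]  * z + m[13];
    sharedPositions[dst + i + 2] = m[2] * x + m[6] * y + m[10] * z + m[14];
  }
  // ...then upload just this range with gl.bufferSubData.
}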

So I'm looking for alternative approaches, and recently found this curious vertex shader in the onshape.com project:

uniform mat4 uMVMatrix;
uniform mat3 uNMatrix;
uniform mat4 uPMatrix;
 
uniform vec3 uSpecular;
uniform float uOpacity;
uniform float uColorAmbientFactor;  //Determines how much of the vertex-specified color to use in the ambient term
uniform float uColorDiffuseFactor;  //Determines how much of the vertex-specified color to use in the diffuse term
 
uniform bool uApplyTranslucentAlphaToAll;
uniform float uTranslucentPassAlpha;
 
attribute vec3 aVertexPosition;
attribute vec3 aVertexNormal;
attribute vec2 aTextureCoordinate;
attribute vec4 aVertexColor;
 
varying vec3 vPosition;
varying lowp vec3 vNormal;
varying mediump vec2 vTextureCoordinate;
varying lowp vec3 vAmbient;
varying lowp vec3 vDiffuse;
varying lowp vec3 vSpecular;
varying lowp float vOpacity;
 
attribute vec4 aOccurrenceId;
 
float unpackOccurrenceId() {
  return aOccurrenceId.g * 65536.0 + aOccurrenceId.b * 256.0 + aOccurrenceId.a;
}
 
float unpackHashedBodyId() {
  return aOccurrenceId.r;
}
 
#define USE_OCCURRENCE_TEXTURE 1
 
#ifdef USE_OCCURRENCE_TEXTURE
 
uniform sampler2D uOccurrenceDataTexture;
uniform float uOccurrenceTexelWidth;
uniform float uOccurrenceTexelHeight;
#define ELEMENTS_PER_OCCURRENCE 2.0
 
void getOccurrenceData(out vec4 occurrenceData[2]) {
  // We will extract the occurrence data from the occurrence texture by converting the occurrence id to texture coordinates
 
  // Convert the packed occurrenceId into a single number
  float occurrenceId = unpackOccurrenceId();
 
  // We first determine the row of the texture by dividing by the overall texture width.  Each occurrence
  // has multiple rgba texture entries, so we need to account for each of those entries when determining the
  // element's offset into the buffer.
  float divided = (ELEMENTS_PER_OCCURRENCE * occurrenceId) * uOccurrenceTexelWidth;
  float row = floor(divided);
  vec2 coordinate;
  // The actual coordinate lies between 0 and 1.  We need to take care that coordinate lies on the texel
  // center by offsetting the coordinate by a half texel.
  coordinate.t = (0.5 + row) * uOccurrenceTexelHeight;
  // Figure out the width of one texel in texture space
  // Since we've already done the texture width division, we can figure out the horizontal coordinate
  // by adding a half-texel width to the remainder
  coordinate.s = (divided - row) + 0.5 * uOccurrenceTexelWidth;
  occurrenceData[0] = texture2D(uOccurrenceDataTexture, coordinate);
  // The second piece of texture data will lie in the adjacent column
  coordinate.s += uOccurrenceTexelWidth;
  occurrenceData[1] = texture2D(uOccurrenceDataTexture, coordinate);
}
 
#else
 
attribute vec4 aOccurrenceData0;
attribute vec4 aOccurrenceData1;
void getOccurrenceData(out vec4 occurrenceData[2]) {
  occurrenceData[0] = aOccurrenceData0;
  occurrenceData[1] = aOccurrenceData1;
}
 
#endif
 
/**
 * Create a model matrix from the given occurrence data.
 *
 * The method for deriving the rotation matrix from the euler angles is based on this publication:
 * http://www.soi.city.ac.uk/~sbbh653/publications/euler.pdf
 */
mat4 createModelTransformationFromOccurrenceData(vec4 occurrenceData[2]) {
  float cx = cos(occurrenceData[0].x);
  float sx = sin(occurrenceData[0].x);
  float cy = cos(occurrenceData[0].y);
  float sy = sin(occurrenceData[0].y);
  float cz = cos(occurrenceData[0].z);
  float sz = sin(occurrenceData[0].z);
 
  mat4 modelMatrix = mat4(1.0);
 
  float scale = occurrenceData[0][3];
 
  modelMatrix[0][0] = (cy * cz) * scale;
  modelMatrix[0][1] = (cy * sz) * scale;
  modelMatrix[0][2] = -sy * scale;
 
  modelMatrix[1][0] = (sx * sy * cz - cx * sz) * scale;
  modelMatrix[1][1] = (sx * sy * sz + cx * cz) * scale;
  modelMatrix[1][2] = (sx * cy) * scale;
 
  modelMatrix[2][0] = (cx * sy * cz + sx * sz) * scale;
  modelMatrix[2][1] = (cx * sy * sz - sx * cz) * scale;
  modelMatrix[2][2] = (cx * cy) * scale;
 
  modelMatrix[3].xyz = occurrenceData[1].xyz;
 
  return modelMatrix;
}
 
 
void main(void) {
  vec4 occurrenceData[2];
  getOccurrenceData(occurrenceData);
  mat4 modelMatrix = createModelTransformationFromOccurrenceData(occurrenceData);
  mat3 normalMatrix = mat3(modelMatrix);
 
  vec4 position = uMVMatrix * modelMatrix * vec4(aVertexPosition, 1.0);
  vPosition = position.xyz;
  vNormal = uNMatrix * normalMatrix * aVertexNormal;
  vTextureCoordinate = aTextureCoordinate;
 
  vAmbient = uColorAmbientFactor * aVertexColor.rgb;
  vDiffuse = uColorDiffuseFactor * aVertexColor.rgb;
  vSpecular = uSpecular;
  vOpacity = uApplyTranslucentAlphaToAll ? (min(uTranslucentPassAlpha, aVertexColor.a)) : aVertexColor.a;
 
  gl_Position = uPMatrix * position;
}

It looks like they encode each object's position and rotation angles as 2 entries in a 4-component float texture, add an attribute that stores where each vertex's transform lives in that texture, and then perform the matrix computation in the vertex shader.
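
If that reading is right, the CPU side would pack one texel of Euler angles + scale and one texel of translation per occurrence, something like this sketch (the texture size and names are my assumptions; float textures need OES_texture_float and NEAREST filtering in WebGL 1):

// Two RGBA float texels per occurrence: [rx, ry, rz, scale] and [tx, ty, tz, 0].
var texWidth = 256, texHeight = 256;
var occurrenceData = new Float32Array(texWidth * texHeight * 4);

function writeOccurrence(id, rot, scale, trans) {
  var base = id * 2 * 4; // ELEMENTS_PER_OCCURRENCE texels, 4 floats each
  occurrenceData.set([rot[0], rot[1], rot[2], scale], base);
  occurrenceData.set([trans[0], trans[1], trans[2], 0], base + 4);
}

gl.bindTexture(gl.TEXTURE_2D, occurrenceTexture);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, texWidth, texHeight, 0,
              gl.RGBA, gl.FLOAT, occurrenceData);
// The shader then gets uOccurrenceTexelWidth = 1/texWidth and
// uOccurrenceTexelHeight = 1/texHeight as uniforms.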

So the question is: is this shader actually an effective solution to my problem, or should I rather use batching or something else?

PS: Maybe an even better approach would be to store a quaternion instead of angles and transform the vertices by it directly?
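
For reference, rotating a vector by a unit quaternion in GLSL is only a few lines (this is the standard identity, not code from the shader above):

// v' = v + 2 * cross(q.xyz, cross(q.xyz, v) + q.w * v), with q = (x, y, z, w)
vec3 rotateByQuaternion(vec3 v, vec4 q) {
  vec3 t = 2.0 * cross(q.xyz, v);
  return v + q.w * t + cross(q.xyz, t);
}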

Answer

I was curious about this too, so I ran a couple of tests with 4 different drawing techniques.

The first is instancing via uniforms, the approach found in most tutorials and books: for each model, set the uniforms, then draw the model.

The second is to store the matrix transform as additional attributes on each vertex and do the transforms on the GPU. Before each draw, call gl.bufferSubData, then draw as many models as possible in each call.
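
In WebGL 1 a mat4 attribute has to be supplied as four vec4 attributes, so the vertex shader for this technique looks roughly like this (my reconstruction, not the exact test code):

uniform mat4 uPMatrix;
uniform mat4 uMVMatrix;

attribute vec3 aVertexPosition;
// The model's world matrix, duplicated on every one of its vertices.
attribute vec4 aWorld0;
attribute vec4 aWorld1;
attribute vec4 aWorld2;
attribute vec4 aWorld3;

void main(void) {
  mat4 world = mat4(aWorld0, aWorld1, aWorld2, aWorld3);
  gl_Position = uPMatrix * uMVMatrix * world * vec4(aVertexPosition, 1.0);
}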

The third approach is to upload multiple matrix transforms as a uniform array to the GPU and add a matrixID attribute on each vertex to select the right matrix on the GPU. This is similar to the first, except that it allows models to be drawn in batches; it is also how skeletal animation is usually implemented. At draw time, for each batch, upload the matrix of the model at batch[index] to the matrix array[index] on the GPU, then draw the batch.
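
A sketch of the batched-uniform vertex shader (BATCH_SIZE and the names are illustrative; the real array size is limited by MAX_VERTEX_UNIFORM_VECTORS):

#define BATCH_SIZE 64
uniform mat4 uPMatrix;
uniform mat4 uMVMatrix;
uniform mat4 uModelMatrices[BATCH_SIZE]; // refilled per batch via uniformMatrix4fv
attribute vec3 aVertexPosition;
attribute float aMatrixId;               // which slot in the array this vertex uses

void main(void) {
  mat4 world = uModelMatrices[int(aMatrixId)];
  gl_Position = uPMatrix * uMVMatrix * world * vec4(aVertexPosition, 1.0);
}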

The final technique is texture lookup. I created a Float32Array of size 4096 * 256 * 4 containing the world matrix of every model (enough for ~256k models). Each model has a modelIndex attribute which it uses to read its matrix from the texture. Then, each frame, gl.texSubImage2D the entire texture and draw as many models as possible in each draw call.
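
Per frame this boils down to one big upload followed by batched draws, roughly like this sketch (matrixTexture, matrixData and batchIndexCount are illustrative; matrixData is the 4096 * 256 * 4 Float32Array, four RGBA texels per mat4):

// Refresh every world matrix in a single call...
gl.bindTexture(gl.TEXTURE_2D, matrixTexture);
gl.texSubImage2D(gl.TEXTURE_2D, 0, 0, 0, 4096, 256,
                 gl.RGBA, gl.FLOAT, matrixData);
// ...then draw as many models per call as the index type allows.
gl.drawElements(gl.TRIANGLES, batchIndexCount, gl.UNSIGNED_SHORT, 0);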

Hardware instancing was not considered, because I assume the requirement is to draw many unique models, even though for my tests I only draw cubes with a different world matrix every frame.

Here are the results (how many objects can be drawn at 60 FPS):

  1. Unique uniforms per model: ~2000
  2. Batched uniforms with matrixId: ~20000
  3. Transform stored on each vertex: ~40000 (after finding a bug in the first implementation)
  4. Texture lookup: ~160000
  5. No drawing at all, just the CPU time to compute the matrices: ~170000

I think it's pretty obvious that uniform instancing is not the way to go: technique 1 fails simply because it makes too many draw calls. Batched uniforms are supposed to fix the draw-call problem, but I found that too much CPU time was spent fetching the matrix data from the right model and uploading it to the GPU; the numerous uniformMatrix4fv calls didn't help either.

The time gl.texSubImage2D takes is significantly less than the time needed to calculate new world matrices for dynamic objects. Duplicating the transform data on each vertex works better than most people might expect, but it wastes a lot of memory bandwidth. The texture-lookup approach is probably the most CPU-friendly of the techniques above, and doing 4 texture lookups seems about as fast as a uniform array lookup (results from testing with larger, more complex objects where I am GPU bound).

Here is a snapshot from one of the tests using the texture lookup approach:

So, in conclusion: store the transform data on each vertex if your models are small, or use the texture-lookup approach if your models are big.

Answers to the questions in the comments:

  1. Fill rate: I am not GPU bound at all. When I tried with large, complex models, uniform instancing actually became the fastest; I suspect batched uniforms and texture lookups carry some GPU overhead that makes them slower.
  2. Storing quaternion + translation: in my case it doesn't matter much, because as you can see texSubImage2D only takes up 9% of CPU time, and cutting that to 4.5% is no big deal. Its effect on the GPU is hard to say: you do fewer texture lookups, but you have to convert the quaternion and translation into a matrix in the shader.
  3. Interleaving: supposedly this technique can give a 5-10% speedup if your app is vertex bound, but I never saw interleaving make a difference in my tests, so I got rid of it entirely.
  4. Memory: all the techniques are basically the same except for duplicating the data on each vertex; the other 3 pass the same amount of data to the GPU. (You could pass translation + quaternion instead of a matrix to the uniforms.)
