Metal rendering really slow - how to speed it up

Question

I have a working Metal application that is extremely slow and needs to run faster. I believe the problem is that I am creating too many MTLCommandBuffer objects.

The reason I am creating so many MTLCommandBuffer objects is that I need to send different uniform values to the pixel shader. I've pasted a snippet of code below to illustrate the problem.

  for (int obj_i = 0 ; obj_i < n ; ++obj_i)
  {
     // I create one render command buffer per object I draw so I can use different uniforms
     id <MTLCommandBuffer> mtlCommandBuffer = [metal_info.g_commandQueue commandBuffer];
     id <MTLRenderCommandEncoder> renderCommand = [mtlCommandBuffer renderCommandEncoderWithDescriptor:<#(MTLRenderPassDescriptor *)#>];

     // glossing over details, but this call has per object specific data
     memcpy([global_uniform_buffer contents], per_object_data, sizeof(per_data_object));

     [renderCommand setVertexBuffer:object_vertices  offset:0 atIndex:0];
     // I am reusing a single buffer for all shader calls
     // this is killing performance
     [renderCommand setVertexBuffer:global_uniform_buffer offset:0 atIndex:1];

     [renderCommand drawIndexedPrimitives:MTLPrimitiveTypeTriangle
                               indexCount:per_object_index_count
                               indexType:MTLIndexTypeUInt32
                             indexBuffer:indicies
                       indexBufferOffset:0];
     [renderCommand endEncoding];
     [mtlCommandBuffer presentDrawable:frameDrawable];
     [mtlCommandBuffer commit];
}  

The above code draws as expected, but it is EXTREMELY slow. I'm guessing that is because there is a better way to force pixel shader evaluation than creating an MTLCommandBuffer per object.

I've considered simply allocating a buffer much larger than is needed for a single shader pass and using offsets to queue up several draw calls in one render command encoder before executing them. This method seems pretty unorthodox, and I want to make sure I'm solving the problem of sending custom data per object in a Metal-friendly way.

What's the fastest way to render with the same pixel/vertex shader multiple times, using custom uniform data for each call?

Recommended answer

Don't reuse the same uniform buffer for every object. Doing that destroys all parallelism between the CPU and GPU and causes regular sync points.

Instead, make a separate uniform buffer for each object you are going to render in the frame. In fact, you should really create two per object and alternate between them each frame, so that the GPU can render the last frame while you prepare the next frame on the CPU.
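
One way to picture the "two per object, alternating each frame" layout is as slots inside a single allocation. Note that this is the offset-based variation raised in the question, not literally separate MTLBuffer objects. The sketch below is plain C (it also compiles unchanged in an Objective-C file) and only does the offset arithmetic. The names `align_up`, `uniform_slot_offset`, and `uniform_buffer_length` are hypothetical, and the 256-byte alignment is a conservative assumption; the real minimum offset alignment depends on the GPU family.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 256 bytes is a conservative buffer-offset alignment.
   Check the requirement for the GPUs you actually target. */
#define UNIFORM_ALIGNMENT 256u

/* Two copies of every object's uniforms, alternated each frame. */
#define FRAMES_IN_FLIGHT 2u

/* Round `size` up to the next multiple of `alignment` (a power of two). */
static size_t align_up(size_t size, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}

/* Byte offset of the uniform slot used by `object_index` during `frame_index`.
   Even frames use the first half of the buffer, odd frames the second half. */
static size_t uniform_slot_offset(size_t frame_index, size_t object_index,
                                  size_t object_count, size_t uniform_size)
{
    size_t stride = align_up(uniform_size, UNIFORM_ALIGNMENT);
    size_t frame_slot = frame_index % FRAMES_IN_FLIGHT;
    return (frame_slot * object_count + object_index) * stride;
}

/* Total number of bytes the shared uniform buffer must hold. */
static size_t uniform_buffer_length(size_t object_count, size_t uniform_size)
{
    return FRAMES_IN_FLIGHT * object_count
         * align_up(uniform_size, UNIFORM_ALIGNMENT);
}
```

With three objects and a 100-byte uniform struct, each slot is padded to 256 bytes, so the buffer needs 1536 bytes, and object 2 in an odd frame lands at offset 1280. Alternating halves only helps if the CPU never writes a slot the GPU is still reading, so the application still has to wait for the older frame to complete before reusing its half.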

After you do that, refactor your loop so that the command buffer and render command encoder work is done once per frame. The loop itself should consist only of copying the uniform data, setting the vertex buffers, and issuing the draw call.
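
As a rough illustration of that refactoring, here is a minimal Objective-C sketch with one command buffer and one render encoder per frame. It is not the asker's actual code: `EncodeFrame`, `PerObjectUniforms`, `kUniformStride`, and all parameter names are hypothetical. It uses offsets into one shared-storage buffer instead of separate MTLBuffer objects, and it assumes a 256-byte stride satisfies the offset alignment of the target GPU.

```objc
#import <Metal/Metal.h>
#include <string.h>

// Hypothetical per-object uniform layout; the question does not show the real one.
typedef struct {
    float modelViewProjection[16];
} PerObjectUniforms;

// Assumed distance in bytes between consecutive uniform slots.
static const NSUInteger kUniformStride = 256;

// Encodes all objects into ONE command buffer and ONE render encoder.
// Assumptions: `uniformBuffer` uses shared storage and holds at least
// 2 * objectCount * kUniformStride bytes, and the caller has already waited
// for the GPU to finish the frame that last used this half of the buffer.
static void EncodeFrame(id<MTLCommandQueue> queue,
                        MTLRenderPassDescriptor *passDescriptor,
                        id<MTLRenderPipelineState> pipeline,
                        id<MTLDrawable> drawable,
                        id<MTLBuffer> uniformBuffer,
                        NSArray<id<MTLBuffer>> *vertexBuffers,
                        id<MTLBuffer> indexBuffer,
                        const NSUInteger *indexCounts,
                        const PerObjectUniforms *perObjectData,
                        NSUInteger frameIndex)
{
    NSUInteger objectCount = vertexBuffers.count;
    // Even frames write the first half of the buffer, odd frames the second.
    NSUInteger frameBase = (frameIndex % 2) * objectCount * kUniformStride;

    id<MTLCommandBuffer> commandBuffer = [queue commandBuffer];
    id<MTLRenderCommandEncoder> encoder =
        [commandBuffer renderCommandEncoderWithDescriptor:passDescriptor];
    [encoder setRenderPipelineState:pipeline];

    for (NSUInteger i = 0; i < objectCount; ++i) {
        // Each object gets its own slot, so earlier draws keep their data.
        NSUInteger offset = frameBase + i * kUniformStride;
        memcpy((uint8_t *)uniformBuffer.contents + offset,
               &perObjectData[i], sizeof(PerObjectUniforms));

        [encoder setVertexBuffer:vertexBuffers[i] offset:0 atIndex:0];
        [encoder setVertexBuffer:uniformBuffer offset:offset atIndex:1];
        [encoder drawIndexedPrimitives:MTLPrimitiveTypeTriangle
                            indexCount:indexCounts[i]
                             indexType:MTLIndexTypeUInt32
                           indexBuffer:indexBuffer
                     indexBufferOffset:0];
    }

    // Finish encoding, present, and commit once per frame, not once per object.
    [encoder endEncoding];
    [commandBuffer presentDrawable:drawable];
    [commandBuffer commit];
}
```

Because every draw reads from its own offset, the CPU never overwrites data that an earlier draw in the same frame still needs, which is what made a separate command buffer per object look necessary in the first place.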
