Parallel compute shaders execution in Vulkan?

Question

I have several compute shaders (let's call them compute1, compute2, and so on) that have several input bindings (defined in shader code as layout (...) readonly buffer) and several output bindings (defined as layout (...) writeonly buffer). I bind buffers with data to their descriptor sets and then try to execute these shaders in parallel.

What I've tried:

  1. vkQueueSubmit() with VkSubmitInfo.pCommandBuffers holding several primary command buffers (one per compute shader);
  2. vkQueueSubmit() with VkSubmitInfo.pCommandBuffers holding one primary command buffer that was recorded using vkCmdExecuteCommands() with pCommandBuffers holding several secondary command buffers (one per compute shader);
  3. Separate vkQueueSubmit() + vkQueueWaitIdle() calls from different std::thread objects (one per compute shader): each command buffer is allocated from a separate VkCommandPool and submitted to its own VkQueue with its own VkFence; the main thread waits using threads[0].join(); threads[1].join(); and so on;
  4. Separate vkQueueSubmit() calls from different detached std::thread objects (one per compute shader): each command buffer is allocated from a separate VkCommandPool and submitted to its own VkQueue with its own VkFence; the main thread waits using vkWaitForFences() with pFences holding the fences that were used in vkQueueSubmit() and with waitAll set to true.

What I get:

In all cases the total time is almost the same (the difference is less than 1%) as when calling vkQueueSubmit() + vkQueueWaitIdle() for compute1, then for compute2, and so on.

I want to bind the same buffers as inputs for several shaders, but judging by the timings, the result is the same if each shader is executed with its own VkBuffer + VkDeviceMemory objects.

My question is:

Is it possible to somehow execute several compute shaders simultaneously, or does command buffer parallelism work for graphics shaders only?

Update: The test application was compiled using LunarG Vulkan SDK 1.1.73.0 and run on Windows 10 with an NVIDIA GeForce GTX 960.

Recommended answer

This depends on the hardware you are executing your application on. Hardware exposes queues that process submitted commands. Each queue, as the name suggests, executes commands in order, one after another. So if you submit multiple command buffers to a single queue, they will be executed in the order of their submission. Internally, the GPU can try to parallelize execution of some parts of the submitted commands (for example, separate parts of the graphics pipeline can be processed at the same time). But in general, a single queue processes commands sequentially, and it doesn't matter whether you are submitting graphics or compute commands.

In order to execute multiple command buffers in parallel, you need to submit them to separate queues. But the hardware must support multiple queues: it must have separate, physical queues in order to be able to process them concurrently.
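Before submitting to separate queues, it helps to check which queue families the device actually offers. The following is a minimal sketch of that selection step, not a definitive implementation. To stay compilable without the Vulkan SDK it uses a hypothetical `QueueFamilyInfo` struct and the constants `kGraphicsBit` and `kComputeBit`, which only mirror the `queueFlags` and `queueCount` fields of `VkQueueFamilyProperties` and the values of `VK_QUEUE_GRAPHICS_BIT` and `VK_QUEUE_COMPUTE_BIT`. In a real application the list would come from `vkGetPhysicalDeviceQueueFamilyProperties()`. The function name `computeFamilies` is also an assumption.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the two VkQueueFamilyProperties fields used here,
// so that the sketch compiles without the Vulkan SDK.
struct QueueFamilyInfo {
    std::uint32_t queueFlags;  // capability bitmask, as in VkQueueFamilyProperties::queueFlags
    std::uint32_t queueCount;  // number of queues the family exposes
};

// Same numeric values as VK_QUEUE_GRAPHICS_BIT and VK_QUEUE_COMPUTE_BIT.
constexpr std::uint32_t kGraphicsBit = 0x1;
constexpr std::uint32_t kComputeBit  = 0x2;

// Returns the indices of all queue families that support compute.
// Families without graphics support (often called "dedicated" or "async"
// compute families) are listed first, families that also do graphics last.
std::vector<std::uint32_t> computeFamilies(const std::vector<QueueFamilyInfo>& families) {
    std::vector<std::uint32_t> dedicated;
    std::vector<std::uint32_t> shared;
    for (std::uint32_t i = 0; i < families.size(); ++i) {
        const QueueFamilyInfo& family = families[i];
        // Skip families that expose no queues or cannot run compute work.
        if (family.queueCount == 0 || (family.queueFlags & kComputeBit) == 0) {
            continue;
        }
        if (family.queueFlags & kGraphicsBit) {
            shared.push_back(i);
        } else {
            dedicated.push_back(i);
        }
    }
    dedicated.insert(dedicated.end(), shared.begin(), shared.end());
    return dedicated;
}
```

The indices returned first are the more likely candidates for work that overlaps with the main graphics queue. Even so, a separate family or a `queueCount` greater than one does not by itself guarantee concurrent execution, so the timings still have to be measured on the target device.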

But, more importantly, I've read that some graphics hardware vendors simulate multiple queues in their graphics drivers. In other words, they expose multiple queues in Vulkan, but internally these are processed by a single physical queue. I think that's what is happening in your case, and the results of your experiments seem to confirm this (though I can't be sure, of course).
