Parallel compute shaders execution in Vulkan?
Question
I have several compute shaders (let's call them `compute1`, `compute2` and so on) that have several input bindings (defined in shader code as `layout (...) readonly buffer`) and several output bindings (defined as `layout (...) writeonly buffer`). I'm binding buffers with data to their descriptor sets and then trying to execute these shaders in parallel.
Things that I've tried:
- `vkQueueSubmit()` with `VkSubmitInfo.pCommandBuffers` holding several primary command buffers (one per compute shader);
- `vkQueueSubmit()` with `VkSubmitInfo.pCommandBuffers` holding one primary command buffer that was recorded using `vkCmdExecuteCommands()` with `pCommandBuffers` holding several secondary command buffers (one per compute shader);
- Separate `vkQueueSubmit()` + `vkQueueWaitIdle()` calls from different `std::thread` objects (one per compute shader): each command buffer is allocated in a separate `VkCommandPool` and is submitted to its own `VkQueue` with its own `VkFence`; the main thread waits using `threads[0].join(); threads[1].join();` and so on;
- Separate `vkQueueSubmit()` calls from different detached `std::thread` objects (one per compute shader): each command buffer is allocated in a separate `VkCommandPool` and is submitted to its own `VkQueue` with its own `VkFence`; the main thread waits using `vkWaitForFences()` with `pFences` holding the fences that were used in `vkQueueSubmit()` and with `waitAll` set to `true`.
What I've got:
In all cases the resulting time is almost the same (the difference is less than 1%) as when calling `vkQueueSubmit()` + `vkQueueWaitIdle()` for `compute1`, then for `compute2`, and so on.
I want to bind the same buffers as inputs for several shaders, but judging by the timings the result is the same if each shader is executed with its own `VkBuffer` + `VkDeviceMemory` objects.
My question is:
Is it possible to somehow execute several compute shaders simultaneously, or does command buffer parallelism work for graphics shaders only?
Update: The test application was compiled using LunarG Vulkan SDK 1.1.73.0 and run on Windows 10 with an NVIDIA GeForce GTX 960.
Recommended answer
This depends on the hardware you are running your application on. The hardware exposes queues, which process submitted commands. Each queue, as the name suggests, executes commands in order, one after another. So if you submit multiple command buffers to a single queue, they will be executed in the order of their submission. Internally, the GPU can try to parallelize the execution of some parts of the submitted commands (for example, separate stages of the graphics pipeline can be processed at the same time). But in general, a single queue processes commands sequentially, and it doesn't matter whether you are submitting graphics or compute commands.
In order to execute multiple command buffers in parallel, you need to submit them to separate queues. But the hardware must support multiple queues: it must have separate physical queues in order to be able to process them concurrently.
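As a rough illustration of how an application might look for separate queues, the sketch below scans a list of queue families for one that supports compute but not graphics, and counts the compute-capable queues overall. To keep it compilable without the Vulkan SDK, `QueueFamilyInfo` is a hypothetical stand-in for the `queueFlags` and `queueCount` fields of `VkQueueFamilyProperties` (which real code would obtain from `vkGetPhysicalDeviceQueueFamilyProperties()`), and the helper names are made up for this example. Finding such a family is only a hint: Vulkan does not promise that work on different queues actually runs concurrently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Same numeric values as the Vulkan VkQueueFlagBits constants named in the comments.
constexpr std::uint32_t kQueueGraphicsBit = 0x1; // VK_QUEUE_GRAPHICS_BIT
constexpr std::uint32_t kQueueComputeBit  = 0x2; // VK_QUEUE_COMPUTE_BIT

// Hypothetical stand-in for the two VkQueueFamilyProperties fields used here.
struct QueueFamilyInfo {
    std::uint32_t queueFlags; // bitmask of supported operations
    std::uint32_t queueCount; // number of queues in this family
};

// Returns the index of the first family that supports compute but not
// graphics, or -1 if the device exposes no such family.
int findComputeOnlyFamily(const std::vector<QueueFamilyInfo>& families) {
    for (std::size_t i = 0; i < families.size(); ++i) {
        const QueueFamilyInfo& f = families[i];
        const bool compute  = (f.queueFlags & kQueueComputeBit) != 0;
        const bool graphics = (f.queueFlags & kQueueGraphicsBit) != 0;
        if (compute && !graphics && f.queueCount > 0) {
            return static_cast<int>(i);
        }
    }
    return -1;
}

// Sums the queues of every family that can run compute work.
std::uint32_t countComputeQueues(const std::vector<QueueFamilyInfo>& families) {
    std::uint32_t total = 0;
    for (const QueueFamilyInfo& f : families) {
        if ((f.queueFlags & kQueueComputeBit) != 0) {
            total += f.queueCount;
        }
    }
    return total;
}
```

If `findComputeOnlyFamily()` returns -1, every compute-capable queue lives in a graphics family, so submitting to several queues of that family is the only option left to try.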
More importantly, I've read that some graphics hardware vendors simulate multiple queues in their graphics drivers. In other words, they expose multiple queues in Vulkan, but internally these are processed by a single physical queue. I think that is what is happening in your case, and the results of your experiments would confirm this (though I can't be sure, of course).
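One way to check this kind of suspicion is to compare the wall-clock time of running the workloads one after another against running them from separate threads, which is essentially the experiment described in the question. The sketch below is a generic harness under that assumption: `Task` is a hypothetical callable that would wrap one `vkQueueSubmit()` plus a wait on its fence, and `overlapRatio()` and the other helper names are invented for this example. A ratio close to 1 suggests the work is being serialized somewhere; a ratio near the number of tasks suggests real overlap.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical unit of work, e.g. "submit one command buffer and wait for its fence".
using Task = std::function<void()>;

// Measures how long a callable takes, in seconds, using a monotonic clock.
double secondsFor(const std::function<void()>& body) {
    const auto start = std::chrono::steady_clock::now();
    body();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Runs every task on the calling thread, one after another.
double runSequentially(const std::vector<Task>& tasks) {
    return secondsFor([&] {
        for (const Task& t : tasks) t();
    });
}

// Runs every task on its own thread and waits for all of them to finish.
double runConcurrently(const std::vector<Task>& tasks) {
    return secondsFor([&] {
        std::vector<std::thread> threads;
        threads.reserve(tasks.size());
        for (const Task& t : tasks) threads.emplace_back(t);
        for (std::thread& th : threads) th.join();
    });
}

// Sequential time divided by concurrent time: about 1 means no overlap.
double overlapRatio(const std::vector<Task>& tasks) {
    assert(!tasks.empty());
    const double sequential = runSequentially(tasks);
    const double concurrent = runConcurrently(tasks);
    return sequential / concurrent;
}
```

A single measurement is noisy, so in practice it would make sense to repeat the comparison several times and to use dispatches long enough that submission overhead does not dominate.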