How to use Vulkan Timestamp Queries?


Question

This is simplified pseudocode showing how I'm trying to measure a GPU workload:

for(N) vkCmdDrawIndexed();

vkCmdWriteTimestamp(VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT);
vkCmdWriteTimestamp(VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT);

submit();
vkDeviceWaitIdle();
vkGetQueryPoolResults();

Notes:

  • N is 224 in my case.
  • I have to wait for the device to go idle. Without that, I keep getting a validation error telling me the data is not ready, even though I have multiple query pools in flight.
  • For the first timestamp, I expect the query value to be written as soon as all previous commands have reached a preprocessing step. I was pretty sure that all 224 commands are preprocessed at almost the same time, but reality shows that this is not true.
  • For the second timestamp, I expect the query value to be written after all previous commands have finished. I.e. the time difference between these two query values should give me the time it takes the GPU to do all the work for a single frame.
  • I'm taking into account VkPhysicalDeviceLimits::timestampPeriod (1 on my machine) and VkQueueFamilyProperties::timestampValidBits (64 on my machine).
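For reference, the conversion mentioned in the last note can be sketched as below. This is only a minimal illustration, not the code from the application: the helper name timestamp_delta_ms and its parameters are hypothetical, and it assumes the two raw 64-bit values have already been read back from the query pool. timestampPeriod is the number of nanoseconds per timestamp tick, and timestampValidBits tells how many low bits of each value are meaningful.

```cpp
#include <cstdint>

// Hypothetical helper (not part of Vulkan): convert two raw timestamp query
// values into elapsed milliseconds.
//   timestamp_period: nanoseconds per tick (VkPhysicalDeviceLimits::timestampPeriod)
//   valid_bits:       meaningful low bits (VkQueueFamilyProperties::timestampValidBits)
double timestamp_delta_ms(std::uint64_t begin, std::uint64_t end,
                          float timestamp_period, std::uint32_t valid_bits) {
    // Build a mask for the valid bits; shifting by 64 would be undefined.
    const std::uint64_t mask =
        (valid_bits >= 64) ? ~0ULL : ((1ULL << valid_bits) - 1ULL);
    // Unsigned subtraction plus masking also copes with a single counter wrap.
    const std::uint64_t ticks = (end - begin) & mask;
    // ticks * ns-per-tick gives nanoseconds; 1e-6 converts ns to ms.
    return static_cast<double>(ticks) * static_cast<double>(timestamp_period) * 1e-6;
}
```

With timestampPeriod equal to 1 and 64 valid bits, a raw difference of 1024 ticks comes out as roughly 0.001024 ms and 2048 ticks as roughly 0.002048 ms.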

I created a big dataset that visibly takes about 2 seconds (~2000 ms) to render a single frame. But the calculated time only ever takes one of two values, either 0.001024ms or 0.002048ms, so the frame-by-frame output can look like this:

0.001024ms
0.001024ms
0.002048ms
0.001024ms
0.002048ms
0.002048ms
...

I don't know about you, but I find these values VERY suspicious, and I have no explanation for them. Maybe by the time the last draw command reaches the command processor all the work is already done, but why the hell 1024 and 2048??

I tried modifying the code to move the first timestamp above the draw calls, i.e.:

vkCmdWriteTimestamp(VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT);

for(N) vkCmdDrawIndexed();
    
vkCmdWriteTimestamp(VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT);

Now, when the preprocessor hits the timestamp command, it writes the query value immediately, because there is no previous work and nothing to wait for (remember, the device is idle). This time I get different values that are closer to the truth:

20.9336ms
20.9736ms
21.036ms
21.0196ms
20.9572ms
21.3586ms
...

This is better, but still far from the expected ~2000 ms.

What's going on? What happens inside the device when I write timestamps, and how do I get correct values?

Recommended answer

While commands in Vulkan can be executed out of order (within certain restrictions), you should not broadly expect commands to be executed out of order. This is especially true of timer queries which, if they were executed out of order, would be unreliable in terms of their meaning.

Given that, your code is saying, "do a bunch of work. Then query the time it takes for the start of the pipe to be ready to execute new commands, then query the time it takes for the end of the pipe to be reached." Well, the start of the pipe might only be ready to execute new commands once most of the work is done.

Basically, what you think is happening is this:

top        work work work work work work | timer
stage1                                    work work work work work work 
stage2                                        work work work work work work 
bottom                                            work work work work work work | timer

But there's nothing that requires GPUs to execute this way. What is almost certainly actually happening is:

time->
top        work work work work work work | timer
stage1         work work work work work work 
stage2             work work work work work work 
bottom                 work work work work work work | timer

So your two timers are only capturing a fraction of the actual work.

What you want is this:

top        timer | work work work work work work
stage1                 work work work work work work 
stage2                     work work work work work work 
bottom                         work work work work work work | timer

This queries the time from start to finish for the entire set of work.

So put the first query before the work whose time you want to measure.
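To make the difference between the two placements concrete, here is a toy model of the diagrams above. It is only a sketch under strong assumptions: a perfectly pipelined device in which every command spends exactly one time unit in each stage, with hypothetical names (simulate, measured, TimerPair). Real GPUs do not behave this simply, so the numbers are illustrative only.

```cpp
#include <cstdint>

// Toy model only: command i enters the top stage at time i and leaves the
// bottom stage at time i + stages. Not a description of any real GPU.
struct TimerPair {
    std::int64_t top;     // time the top-of-pipe timestamp is written
    std::int64_t bottom;  // time the bottom-of-pipe timestamp is written
};

// commands_before_top: number of draws recorded before the first timestamp.
// The second timestamp is always recorded after all total_commands draws.
TimerPair simulate(std::int64_t total_commands, std::int64_t stages,
                   std::int64_t commands_before_top) {
    // The top-of-pipe timestamp fires once the commands ahead of it were issued.
    const std::int64_t top = commands_before_top;
    // The bottom-of-pipe timestamp fires when the last draw has left the pipe.
    const std::int64_t bottom =
        (total_commands > 0) ? (total_commands - 1 + stages) : top;
    return {top, bottom};
}

// Elapsed time that the pair of timestamps would report in this model.
std::int64_t measured(std::int64_t total_commands, std::int64_t stages,
                      std::int64_t commands_before_top) {
    const TimerPair t = simulate(total_commands, stages, commands_before_top);
    return t.bottom - t.top;
}
```

In this model, with 224 draws and 4 stages, measured(224, 4, 224) (first timestamp after the draws) reports 3 time units, while measured(224, 4, 0) (first timestamp before the draws) reports 227, which covers the whole set of work.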
