Time measuring in PyOpenCL


Question

I am running a kernel with PyOpenCL on an FPGA and on a GPU. To measure how long it takes to execute, I use:

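from time import time

# event.profile only works if the queue was created with profiling enabled, e.g.
# (assuming `import pyopencl as cl` and a context `ctx`):
# queue = cl.CommandQueue(ctx, properties=cl.command_queue_properties.PROFILING_ENABLE)
t1 = time()
event = mykernel(queue, (c_width, c_height), (block_size, block_size), d_c_buf, d_a_buf, d_b_buf, a_width, b_width)
event.wait()  # block until the kernel has finished
t2 = time()

compute_time = t2 - t1  # host wall-clock time, in seconds
compute_time_e = (event.profile.end - event.profile.start) * 1e-9  # device time; profile values are in nanoseconds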

This gives me the execution time from the point of view of the host (compute_time) and of the device (compute_time_e). The problem is that these values are very different:

compute (host-timed) [s]: 0.0009386539459228516
compute (event-timed) [s]:  9.4528e-05

Does anyone know what could cause this difference? And more importantly, which one is more accurate?

Thanks.

Recommended answer

Both of those numbers look right to me. If I am reading this correctly, the host is measuring about 10x the device time, which is not unusual for a small kernel, because the host measurement includes transfer latency. Your host time includes communication with the device across the PCB, whereas your device time measures only the on-chip operation.
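To see where the host-side time goes, one option is to look at the other timestamps on the event. The sketch below assumes the event exposes `profile.queued`, `profile.submit`, `profile.start` and `profile.end` in nanoseconds, as PyOpenCL events do when the queue has profiling enabled. The helper name `profile_breakdown` is hypothetical. So that it runs without a device, it uses a stand-in event whose timestamps are made up for illustration; only the start-to-end interval is chosen to match the question.

```python
from types import SimpleNamespace


def profile_breakdown(event):
    """Split an event's lifetime into three intervals, in seconds.

    Hypothetical helper: expects event.profile to carry queued, submit,
    start and end timestamps in nanoseconds.
    """
    p = event.profile
    ns = 1e-9
    return {
        "queued_to_submit": (p.submit - p.queued) * ns,  # waiting in the host-side queue
        "submit_to_start": (p.start - p.submit) * ns,    # submitted, not yet running
        "start_to_end": (p.end - p.start) * ns,          # actual kernel execution
    }


# Stand-in for a real event; these timestamps are illustrative, not measured.
fake_event = SimpleNamespace(
    profile=SimpleNamespace(queued=0, submit=200_000, start=750_000, end=844_528)
)

breakdown = profile_breakdown(fake_event)
for name, seconds in breakdown.items():
    print(f"{name}: {seconds * 1e3:.3f} ms")
```

With a real event, pass the object returned by the kernel call after `event.wait()`. These timestamps come from the device clock, so they will not account for all of the host-measured time.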

I think your program's timing breaks down like this:

  • Kernel Execution Time: 0.1ms // event-timed
  • Transfer Time: 0.8ms // (host-timed - event-timed)
  • Total Time: 0.9ms // host-timed
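The rounded figures in this breakdown follow from the two measurements in the question. As a small sketch of the arithmetic, the variable names below are chosen for illustration, and the whole difference is treated as overhead, which is the answer's reading rather than something measured separately:

```python
host_timed = 0.0009386539459228516  # seconds, measured with time() on the host
event_timed = 9.4528e-05            # seconds, from event.profile on the device

# Everything the host saw that the device did not count as kernel execution.
overhead = host_timed - event_timed
ratio = host_timed / event_timed

print(f"kernel:   {event_timed * 1e3:.2f} ms")
print(f"overhead: {overhead * 1e3:.2f} ms")
print(f"total:    {host_timed * 1e3:.2f} ms")
print(f"host/device ratio: {ratio:.1f}x")
```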

If you are curious about this, try running a kernel that takes much longer on the device. You should start to see the two numbers match much more closely as the fixed transfer time becomes a smaller share of the total.

For example:

  • Kernel Execution Time: 900ms
  • Transfer Time: 0.8ms
  • Total Time: 900.8ms
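The same point can be shown numerically. This sketch assumes the overhead stays fixed at the rounded 0.8 ms from the breakdown above, whatever the kernel length. That is a simplification, and the kernel durations are hypothetical.

```python
FIXED_OVERHEAD_MS = 0.8  # assumed constant; rounded value from the breakdown


def host_to_device_ratio(kernel_ms, overhead_ms=FIXED_OVERHEAD_MS):
    """Ratio of host-timed to event-timed duration for a given kernel length."""
    return (kernel_ms + overhead_ms) / kernel_ms


# Hypothetical kernel durations in milliseconds, from tiny to long-running.
ratios = {k: host_to_device_ratio(k) for k in (0.1, 1.0, 10.0, 100.0, 900.0)}

for kernel_ms, ratio in ratios.items():
    print(f"kernel {kernel_ms:7.1f} ms -> host/device ratio {ratio:.4f}")
```

As the kernel gets longer, the ratio falls towards 1, so the host-timed and event-timed values converge.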
