CUDA: cudaEvent_t and cudaThreadSynchronize usage

Question

I am a bit confused about the usage of cudaEvent_t. Currently, I use clock() like this to measure the duration of a kernel call:

cudaThreadSynchronize();
clock_t begin = clock();

fooKernel<<< x, y >>>( z, w );

cudaThreadSynchronize();
clock_t end = clock();

// Print time difference: ( end - begin )

Looking for a higher-resolution timer, I am considering using cudaEvent_t. Do I need to call cudaThreadSynchronize() before I record the time with cudaEventRecord(), or is that redundant?

I ask because there is another call, cudaEventSynchronize(), which seems to wait until the event has been recorded. If the recording is delayed, won't the calculated time difference include some extra time after the kernel has finished executing?

Recommended answer

There are actually even more synchronization functions, such as cudaStreamSynchronize. The programming guide describes in detail what each of them does. Using events as timers basically comes down to this:

// create events
cudaEvent_t event1, event2;
cudaEventCreate(&event1);
cudaEventCreate(&event2);

// record events around the kernel launch
cudaEventRecord(event1, 0);   // 0 is the default stream
kernel<<<grid,block>>>(...);  // also uses the default stream
cudaEventRecord(event2, 0);

// synchronize
cudaEventSynchronize(event1); // optional
cudaEventSynchronize(event2); // wait until event2 has actually been recorded

// calculate the elapsed time in milliseconds
float dt_ms;
cudaEventElapsedTime(&dt_ms, event1, event2);

// release the events once they are no longer needed
cudaEventDestroy(event1);
cudaEventDestroy(event2);

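It's important to synchronize on event2 because you want to make sure everything has executed before calculating the time. Because both events and the kernel are on the same stream, where order is preserved, event1 and the kernel have also completed once event2 has. The events are timestamped on the GPU when the stream reaches them, so no extra cudaThreadSynchronize() is needed before cudaEventRecord().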
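
To make the pattern above concrete, here is a minimal sketch of a complete timing helper. The kernel `scaleKernel`, the macro `CUDA_CHECK`, and the function `timeScaleKernelMs` are hypothetical names introduced only for illustration; they are not part of the original answer. The helper synchronizes only on the second event, as described above.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical error-check helper (an assumption, not from the original answer).
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Hypothetical kernel standing in for the kernel being timed.
__global__ void scaleKernel(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Launches scaleKernel on n elements and returns the GPU time in
// milliseconds measured between the two events.
float timeScaleKernelMs(int n) {
    float *d_data = nullptr;
    CUDA_CHECK(cudaMalloc(&d_data, n * sizeof(float)));
    CUDA_CHECK(cudaMemset(d_data, 0, n * sizeof(float)));

    cudaEvent_t start, stop;
    CUDA_CHECK(cudaEventCreate(&start));
    CUDA_CHECK(cudaEventCreate(&stop));

    int block = 256;
    int grid = (n + block - 1) / block;

    // Both events and the kernel go into the default stream, in order.
    CUDA_CHECK(cudaEventRecord(start, 0));
    scaleKernel<<<grid, block>>>(d_data, n, 2.0f);
    CUDA_CHECK(cudaGetLastError());  // catch launch-configuration errors
    CUDA_CHECK(cudaEventRecord(stop, 0));

    // Waiting on stop is enough: start and the kernel precede it in the stream.
    CUDA_CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CUDA_CHECK(cudaEventElapsedTime(&ms, start, stop));

    CUDA_CHECK(cudaEventDestroy(start));
    CUDA_CHECK(cudaEventDestroy(stop));
    CUDA_CHECK(cudaFree(d_data));
    return ms;
}
```

Calling `timeScaleKernelMs(1 << 20)` from `main` returns the elapsed time in milliseconds. The first launch in a process typically includes one-time initialization cost, so it may be worth timing a second call as well.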

You could call cudaStreamSynchronize or even cudaThreadSynchronize instead, but both are overkill in this case.
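
For comparison, here is a sketch of the stream-level alternative. It waits for all work queued in the stream rather than for one specific event, which is why it is broader than needed for timing a single kernel. The names `busyKernel` and `timeWithStreamSyncMs` are hypothetical, the use of a separately created stream is an assumption made for illustration, and error handling is omitted for brevity. Note also that cudaThreadSynchronize is deprecated in current CUDA releases in favor of cudaDeviceSynchronize, which waits for all work on the whole device.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the kernel being timed.
__global__ void busyKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 2.0f + 1.0f;
}

// Times busyKernel on a non-default stream, using cudaStreamSynchronize
// instead of cudaEventSynchronize. Error handling is omitted for brevity.
float timeWithStreamSyncMs(int n) {
    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    int block = 256;
    int grid = (n + block - 1) / block;

    // Events and kernel are all enqueued in the same stream, in order.
    cudaEventRecord(start, stream);
    busyKernel<<<grid, block, 0, stream>>>(d_data, n);
    cudaEventRecord(stop, stream);

    // Waits for everything queued in this stream, including both events.
    cudaStreamSynchronize(stream);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaStreamDestroy(stream);
    cudaFree(d_data);
    return ms;
}
```

If other work had been queued in the same stream after the second event, cudaStreamSynchronize would wait for that work too, whereas cudaEventSynchronize on the second event would return as soon as the event itself had been recorded.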
