CUDA: cudaEvent_t and cudaThreadSynchronize usage


Question

I am a bit confused about how to use cudaEvent_t. Currently I time a kernel call with clock(), like this:

cudaThreadSynchronize();
clock_t begin = clock();

fooKernel<<< x, y >>>( z, w );

cudaThreadSynchronize();
clock_t end = clock();

// Print time difference: ( end - begin )

Looking for a higher-resolution timer, I am considering cudaEvent_t. Do I need to call cudaThreadSynchronize() before I record the time with cudaEventRecord(), or is that redundant?

I ask because there is another call, cudaEventSynchronize(), which seems to wait until the event has been recorded. If the recording is delayed, won't the computed time difference include some extra time after the kernel has finished executing?

Recommended answer

There are actually even more synchronization functions, such as cudaStreamSynchronize. The programming guide describes in detail what each of them does. Using events as timers basically comes down to this:

//create events
cudaEvent_t event1, event2;
cudaEventCreate(&event1);
cudaEventCreate(&event2);

//record events around kernel launch
cudaEventRecord(event1, 0); //where 0 is the default stream
kernel<<<grid,block>>>(...); //also using the default stream
cudaEventRecord(event2, 0);

//synchronize
cudaEventSynchronize(event1); //optional
cudaEventSynchronize(event2); //wait for the event to be executed!

//calculate time
float dt_ms;
cudaEventElapsedTime(&dt_ms, event1, event2);

It is important to synchronize on event2 because you want to be sure everything has executed before you calculate the time. Both events and the kernel are on the same stream, so order is preserved: once event2 has completed, event1 and the kernel have completed too. The timestamps are taken on the GPU when the stream reaches each event, so the host-side wait in cudaEventSynchronize does not add to the measured interval.
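To make the outline above concrete, here is a minimal self-contained sketch of the same pattern with error checking and cleanup added. The kernel scaleKernel, the helper timeScaleKernelMs, and the CUDA_CHECK macro are hypothetical names introduced only for illustration; they are not part of the original question or answer.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Placeholder kernel (an assumption, standing in for the asker's fooKernel):
// multiplies every element of an array by a constant factor.
__global__ void scaleKernel(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Launches scaleKernel on n elements and returns the GPU time in
// milliseconds measured between the two recorded events.
float timeScaleKernelMs(int n)
{
    float *d_data = nullptr;
    CUDA_CHECK(cudaMalloc(&d_data, n * sizeof(float)));
    CUDA_CHECK(cudaMemset(d_data, 0, n * sizeof(float)));

    cudaEvent_t start, stop;
    CUDA_CHECK(cudaEventCreate(&start));
    CUDA_CHECK(cudaEventCreate(&stop));

    const int block = 256;
    const int grid = (n + block - 1) / block;

    // Record events around the launch, all on the default stream (0).
    CUDA_CHECK(cudaEventRecord(start, 0));
    scaleKernel<<<grid, block>>>(d_data, n, 2.0f);
    CUDA_CHECK(cudaGetLastError());   // catches launch configuration errors
    CUDA_CHECK(cudaEventRecord(stop, 0));

    // Wait only for the second event; stream ordering implies that the
    // first event and the kernel have completed as well.
    CUDA_CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CUDA_CHECK(cudaEventElapsedTime(&ms, start, stop));

    CUDA_CHECK(cudaEventDestroy(start));
    CUDA_CHECK(cudaEventDestroy(stop));
    CUDA_CHECK(cudaFree(d_data));
    return ms;
}

int main()
{
    std::printf("kernel time: %.3f ms\n", timeScaleKernelMs(1 << 20));
    return 0;
}
```

Compiled with nvcc, this prints a single timing line. In practice the first launch often includes one-time startup cost, so it may be worth timing a second call if a steady-state number is wanted.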

You could call cudaStreamSynchronize or even cudaThreadSynchronize instead, but both are overkill in this case.
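For comparison, the broader synchronization might look like the sketch below. It gives the same measurement but waits for everything queued on the default stream instead of just the second event. The names emptyKernel and timeWithStreamSyncMs are hypothetical, and error checking is omitted for brevity. Note that cudaThreadSynchronize has been deprecated in later CUDA releases in favor of cudaDeviceSynchronize, which is what the comment refers to.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel (an assumption): does no work, only gives the
// events something to bracket.
__global__ void emptyKernel() {}

// Times emptyKernel using a stream-wide wait instead of cudaEventSynchronize.
// Returns the elapsed milliseconds, or a negative value if the query failed.
float timeWithStreamSyncMs()
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    emptyKernel<<<1, 1>>>();
    cudaEventRecord(stop, 0);

    // Waits for all work queued on the default stream, not just `stop`.
    // cudaDeviceSynchronize() would also work here; it waits for every
    // stream on the device, which is broader still.
    cudaStreamSynchronize(0);

    float ms = -1.0f;  // stays negative if cudaEventElapsedTime fails
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main()
{
    std::printf("elapsed: %.3f ms\n", timeWithStreamSyncMs());
    return 0;
}
```

With a single stream and nothing else queued, the two approaches behave the same; the difference only matters when other work shares the stream or device.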
