CUDA: cudaEvent_t and cudaThreadSynchronize usage


Problem description

I am a bit confused about the usage of cudaEvent_t. Currently, I am using the clock() call like this to find the duration of a kernel call:

cudaThreadSynchronize();
clock_t begin = clock();

fooKernel<<< x, y >>>( z, w );

cudaThreadSynchronize();
clock_t end = clock();

// Print time difference: ( end - begin )

Looking for a higher-resolution timer, I am considering using cudaEvent_t. Do I need to call cudaThreadSynchronize() before I note down the time using cudaEventRecord(), or is that redundant?

I am asking because there is another call, cudaEventSynchronize(), which seems to wait until the event has been recorded. If the recording is delayed, won't the calculated time difference include some extra time after the kernel has finished executing?

Solution

Actually, there are even more synchronization functions (for example cudaStreamSynchronize). The programming guide has a detailed description of what each of them does. Using events as timers basically comes down to this:

//create events
cudaEvent_t event1, event2;
cudaEventCreate(&event1);
cudaEventCreate(&event2);

//record events around kernel launch
cudaEventRecord(event1, 0); //where 0 is the default stream
kernel<<<grid,block>>>(...); //also using the default stream
cudaEventRecord(event2, 0);

//synchronize
cudaEventSynchronize(event1); //optional
cudaEventSynchronize(event2); //wait for the event to be executed!

//calculate time
float dt_ms;
cudaEventElapsedTime(&dt_ms, event1, event2);

It's important to synchronize on event2 because you want to make sure everything has executed before calculating the time. Since both events and the kernel are on the same stream, their order is preserved, so once event2 has completed, event1 and the kernel have completed too.
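As a fuller illustration of the pattern above, here is a minimal self-contained sketch of event-based timing. The kernel `scaleKernel`, the helper `timeScaleKernelMs`, the `CUDA_CHECK` macro and the problem size are assumptions made up for this example and are not part of the original answer. Note that no synchronization call is placed before `cudaEventRecord`: the events are enqueued in the same stream as the kernel, so they should be timestamped in stream order on the GPU.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                          \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",      \
                         cudaGetErrorString(err_),                \
                         __FILE__, __LINE__);                     \
            std::exit(EXIT_FAILURE);                              \
        }                                                         \
    } while (0)

// Hypothetical stand-in for the kernel being timed.
__global__ void scaleKernel(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Launches scaleKernel on n elements and returns the elapsed GPU time
// in milliseconds between the two recorded events.
float timeScaleKernelMs(int n) {
    float *d_data = nullptr;
    CUDA_CHECK(cudaMalloc(&d_data, n * sizeof(float)));
    CUDA_CHECK(cudaMemset(d_data, 0, n * sizeof(float)));

    cudaEvent_t start, stop;
    CUDA_CHECK(cudaEventCreate(&start));
    CUDA_CHECK(cudaEventCreate(&stop));

    const int block = 256;
    const int grid = (n + block - 1) / block;

    // Record events around the launch, all on the default stream (0).
    CUDA_CHECK(cudaEventRecord(start, 0));
    scaleKernel<<<grid, block>>>(d_data, n, 2.0f);
    CUDA_CHECK(cudaGetLastError());  // catches launch configuration errors
    CUDA_CHECK(cudaEventRecord(stop, 0));

    // Block the host until the stop event has actually completed.
    CUDA_CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CUDA_CHECK(cudaEventElapsedTime(&ms, start, stop));

    CUDA_CHECK(cudaEventDestroy(start));
    CUDA_CHECK(cudaEventDestroy(stop));
    CUDA_CHECK(cudaFree(d_data));
    return ms;
}

int main() {
    float ms = timeScaleKernelMs(1 << 20);
    std::printf("kernel time: %.3f ms\n", ms);
    return 0;
}
```

The reported value depends on the device, so no particular output should be expected. For a short kernel like this one, repeating the launch several times between the two events and dividing by the repeat count may give a steadier figure.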

You could call cudaStreamSynchronize or even cudaThreadSynchronize instead, but both are overkill in this case.
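To make the alternatives concrete, the sketch below times a kernel on a non-default stream and waits with `cudaStreamSynchronize`, with the other two options left as comments. Note that `cudaThreadSynchronize` is deprecated in current CUDA releases; `cudaDeviceSynchronize` is its replacement and is what the comment uses. The kernel `addOneKernel` and the helper `timeOnStreamMs` are hypothetical names for this example, and error checking is reduced to the final call to keep it short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical placeholder kernel; any kernel could be timed the same way.
__global__ void addOneKernel(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Times addOneKernel on its own stream; returns milliseconds,
// or -1.0f if the elapsed time could not be read.
float timeOnStreamMs(int n) {
    int *d_data = nullptr;
    cudaStream_t stream;
    cudaEvent_t start, stop;

    cudaMalloc(&d_data, n * sizeof(int));
    cudaMemset(d_data, 0, n * sizeof(int));
    cudaStreamCreate(&stream);
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Events and kernel share one stream, so their order is preserved.
    cudaEventRecord(start, stream);
    addOneKernel<<<(n + 255) / 256, 256, 0, stream>>>(d_data, n);
    cudaEventRecord(stop, stream);

    // Any ONE of the following is enough before reading the elapsed time:
    cudaStreamSynchronize(stream);   // waits for all work queued on this stream
    // cudaEventSynchronize(stop);   // waits only until `stop` has completed
    // cudaDeviceSynchronize();      // waits for all work on the whole device

    float ms = 0.0f;
    cudaError_t err = cudaEventElapsedTime(&ms, start, stop);
    if (err != cudaSuccess) {
        // For example cudaErrorNotReady if the events have not completed yet.
        std::fprintf(stderr, "elapsed time unavailable: %s\n",
                     cudaGetErrorString(err));
        ms = -1.0f;
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaStreamDestroy(stream);
    cudaFree(d_data);
    return ms;
}

int main() {
    std::printf("kernel time: %.3f ms\n", timeOnStreamMs(1 << 20));
    return 0;
}
```

All three calls yield the same measurement here, since the timestamps come from the events themselves. They differ only in how much unrelated work the host waits for, which is why the broader calls are more than a single timed kernel needs.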
