CUDA: cudaEvent_t and cudaThreadSynchronize usage
Problem description
I am a bit confused about the usage of cudaEvent_t. Currently, I am using the clock() call like this to find the duration of a kernel call:
cudaThreadSynchronize();
clock_t begin = clock();
fooKernel<<< x, y >>>( z, w );
cudaThreadSynchronize();
clock_t end = clock();
// Print time difference: ( end - begin )
Looking for a higher-resolution timer, I am considering using cudaEvent_t. Do I need to call cudaThreadSynchronize() before I note down the time using cudaEventRecord(), or is it redundant?
The reason I am asking is that there is another call, cudaEventSynchronize(), which seems to wait until the event is recorded. If the recording is delayed, won't the calculated time difference include some extra time after the kernel has finished execution?
Actually, there are even more synchronization functions (for example cudaStreamSynchronize). The programming guide has a detailed description of what each of them does. Using events as timers basically comes down to this:
//create events
cudaEvent_t event1, event2;
cudaEventCreate(&event1);
cudaEventCreate(&event2);
//record events around kernel launch
cudaEventRecord(event1, 0); //where 0 is the default stream
kernel<<<grid,block>>>(...); //also using the default stream
cudaEventRecord(event2, 0);
//synchronize
cudaEventSynchronize(event1); //optional
cudaEventSynchronize(event2); //wait for the event to be executed!
//calculate time
float dt_ms;
cudaEventElapsedTime(&dt_ms, event1, event2);
It's important to synchronize on event2 because you want to make sure everything has executed before calculating the time. Since both events and the kernel are on the same stream (where order is preserved), once event2 has completed, event1 and the kernel have completed too.
You could call cudaStreamSynchronize or even cudaThreadSynchronize instead, but both are overkill in this case.
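For reference, here is a minimal, self-contained sketch of the same event-timing pattern as a complete program. The kernel scaleKernel, the helper timeScaleKernel, and the CUDA_CHECK macro are hypothetical names introduced only for illustration; they stand in for fooKernel and are not part of the original post. Note also that cudaThreadSynchronize has since been deprecated in favor of cudaDeviceSynchronize, so the sketch does not use it.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro (not from the original post): abort on any CUDA error.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Placeholder kernel standing in for fooKernel: scales each element in place.
__global__ void scaleKernel(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Times one launch of scaleKernel with CUDA events; returns milliseconds.
float timeScaleKernel(float *d_data, int n)
{
    cudaEvent_t start, stop;
    CUDA_CHECK(cudaEventCreate(&start));
    CUDA_CHECK(cudaEventCreate(&stop));

    const int block = 256;
    const int grid = (n + block - 1) / block;

    // Both events and the kernel are enqueued on the default stream (0),
    // so they execute in order; no host-side sync is needed before recording.
    CUDA_CHECK(cudaEventRecord(start, 0));
    scaleKernel<<<grid, block>>>(d_data, n, 2.0f);
    CUDA_CHECK(cudaGetLastError());  // catch launch-configuration errors
    CUDA_CHECK(cudaEventRecord(stop, 0));

    // Block the host until the stop event (and everything before it) is done.
    CUDA_CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CUDA_CHECK(cudaEventElapsedTime(&ms, start, stop));

    CUDA_CHECK(cudaEventDestroy(start));
    CUDA_CHECK(cudaEventDestroy(stop));
    return ms;
}

int main()
{
    const int n = 1 << 20;
    float *d_data = nullptr;
    CUDA_CHECK(cudaMalloc(&d_data, n * sizeof(float)));
    CUDA_CHECK(cudaMemset(d_data, 0, n * sizeof(float)));

    float ms = timeScaleKernel(d_data, n);
    std::printf("kernel time: %.3f ms\n", ms);

    CUDA_CHECK(cudaFree(d_data));
    return 0;
}
```

The timestamps are taken on the GPU when each event is reached in the stream, so a late return from cudaEventSynchronize on the host should not inflate the measured interval. The first launch in a process often includes one-time initialization cost, so it may be worth timing a second launch if the number looks high.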