Strategies for timing CUDA Kernels: Pros and Cons?

Views: 250 Posted: 2017/3/4 12:25:12 cuda gpgpu nvidia code-timing


Problem description

When timing CUDA kernels, the following doesn't work because a kernel launch is asynchronous: control returns to the CPU program immediately, before the kernel has finished executing:

start timer
kernel<<<g,b>>>();
end timer

I've seen three basic ways of (successfully) timing CUDA kernels:

(1) Two CUDA eventRecords.

float responseTime; //result will be in milliseconds
cudaEvent_t start; cudaEventCreate(&start); cudaEventRecord(start); cudaEventSynchronize(start);
cudaEvent_t stop;  cudaEventCreate(&stop);
kernel<<<g,b>>>();
cudaEventRecord(stop); cudaEventSynchronize(stop);
cudaEventElapsedTime(&responseTime, start, stop); //responseTime = elapsed time

(2) One CUDA eventRecord.

double start = read_timer(); //helper function on CPU, in milliseconds (double: a float cannot resolve epoch milliseconds)
cudaEvent_t stop;  cudaEventCreate(&stop);
kernel<<<g,b>>>();
cudaEventRecord(stop); cudaEventSynchronize(stop);
double responseTime = read_timer() - start;

(3) deviceSynchronize instead of eventRecord. (Probably only useful when programming in a single stream.)

double start = read_timer(); //helper function on CPU, in milliseconds (double: a float cannot resolve epoch milliseconds)
kernel<<<g,b>>>();
cudaDeviceSynchronize();
double responseTime = read_timer() - start;
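
Strategies (2) and (3) subtract two read_timer() readings, each of which is a count of milliseconds since the Unix epoch, on the order of 1.5e12. A value that large only keeps millisecond resolution in a double. The sketch below uses two hypothetical helpers (float_spacing_at and elapsed_via_float, neither of which is from the original post) to show what happens if such a reading is narrowed to a float along the way:

```cpp
#include <cmath>

// Hypothetical helper: distance from x to the next representable float above it.
// For values around 1.5e12 this is 2^17 = 131072, i.e. about 131 seconds
// when x is a count of milliseconds.
float float_spacing_at(float x) {
    return std::nextafter(x, INFINITY) - x;
}

// Hypothetical helper: the elapsed time computed after both timestamps
// (in milliseconds) have been narrowed to float. A few milliseconds of real
// elapsed time is far below the float spacing at this magnitude, so the
// result cannot be trusted.
double elapsed_via_float(double start_ms, double end_ms) {
    float s = static_cast<float>(start_ms);
    float e = static_cast<float>(end_ms);
    return static_cast<double>(e) - static_cast<double>(s);
}
```

For example, with timestamps taken 5 ms apart around the posting date of this question (1488630312000.0 and 1488630312005.0), the plain double subtraction gives exactly 5, while elapsed_via_float does not. Keeping both the start reading and the difference in double avoids the problem.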

I experimentally verified that these three strategies produce the same timing result.

Questions:

What are the tradeoffs of these strategies? Any hidden details here?
Aside from timing many kernels in multiple streams, are there any advantages to using two event records and the cudaEventElapsedTime() function?

You can probably use your imagination to figure out what read_timer() does. Nevertheless, it can't hurt to provide an example implementation:

double read_timer(){
    struct timeval start;
    gettimeofday( &start, NULL ); //you need to include <sys/time.h>
    return (double)((start.tv_sec) + 1.0e-6 * (start.tv_usec))*1000; //milliseconds
}
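
One detail worth noting about the helper above: gettimeofday() reads the wall clock, which can be stepped or slewed (for example by NTP) while a measurement is in progress. A minimal sketch of an alternative, assuming a C++11 host compiler is available, is to read a monotonic clock instead. The name read_timer_monotonic is hypothetical and not part of the original post:

```cpp
#include <chrono>

// Hypothetical alternative to read_timer(): milliseconds from a monotonic clock.
// The clock's origin is arbitrary, so only the difference between two calls
// is meaningful; the absolute value is not a calendar time.
double read_timer_monotonic() {
    using clock = std::chrono::steady_clock;
    const std::chrono::duration<double, std::milli> since_origin =
        clock::now().time_since_epoch();
    return since_origin.count();
}
```

It is used the same way as read_timer() in strategies (2) and (3): take one reading before the kernel launch, one after the synchronization call, and subtract them as doubles.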
