Strategies for timing CUDA kernels: pros and cons?

Posted: 2022/1/10 16:03:01 | Tags: cuda gpgpu nvidia benchmarking


Question

When timing CUDA kernels, the following does not work, because a kernel launch is asynchronous: control returns to the CPU immediately, so the kernel does not block the host program while it executes:

start timer
kernel<<<g,b>>>();
end timer

I have seen three basic ways of (successfully) timing CUDA kernels:

(1) Two CUDA event records.

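float responseTime; // elapsed time, in milliseconds
cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);
cudaEventRecord(start);       // enqueue the start marker in the default stream
cudaEventSynchronize(start);  // wait until the start event has been reached
kernel<<<g,b>>>();
cudaEventRecord(stop);        // enqueue the stop marker after the kernel
cudaEventSynchronize(stop);   // block the host until the stop event has completed
cudaEventElapsedTime(&responseTime, start, stop); // responseTime = time between the two events
cudaEventDestroy(start);
cudaEventDestroy(stop);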

(2) One CUDA event record.

double start = read_timer(); // helper function on CPU, in milliseconds (double: float cannot resolve epoch-based milliseconds)
cudaEvent_t stop;  cudaEventCreate(&stop);
kernel<<<g,b>>>();
cudaEventRecord(stop); cudaEventSynchronize(stop); // block the host until the kernel has finished
double responseTime = read_timer() - start;

(3) cudaDeviceSynchronize() instead of an event record. (Probably only useful when programming in a single stream.)

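double start = read_timer(); // helper function on CPU, in milliseconds (double: float cannot resolve epoch-based milliseconds)
kernel<<<g,b>>>();
cudaDeviceSynchronize(); // block the host until all previously issued device work has finished
double responseTime = read_timer() - start;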

I experimentally verified that these three strategies produce the same timing result.

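Questions:

What are the tradeoffs of these strategies? Are there any hidden details here?
Aside from timing many kernels in multiple streams, are there any advantages to using two event records and the cudaEventElapsedTime() function?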

You can probably use your imagination to figure out what read_timer() does. Nevertheless, it can't hurt to provide an example implementation:

#include <sys/time.h> // gettimeofday, struct timeval

// Returns the current wall-clock time in milliseconds since the Unix epoch.
double read_timer(){
    struct timeval start;
    gettimeofday( &start, NULL );
    return start.tv_sec * 1000.0 + start.tv_usec * 1.0e-3; //milliseconds
}
