定时CUDA运算 [英] Timing CUDA operations

查看:177
本文介绍了定时CUDA运算的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要时间CUDA内核执行。最佳实践指南说,我们可以使用事件或类似()在Windows 时钟标准计时功能。我的问题是使用这两个功能给了我一个完全不同的结果。
事实上,由事件给出的结果似乎与在实践的实际速度是巨大的。

I need to time a CUDA kernel execution. The Best Practices Guide says that we can use either events or standard timing functions like clock() in Windows. My problem is that using these two functions gives me a totally different result. In fact, the result given by events seems to be huge compared to the actual speed in practice.

我确实需要这一切的,是先在一个较小的数据集运行它的一个简化版本能够为predict一个计算的运行时间。不幸的是,这个基准测试的结果是完全不现实的,是不是太乐观(时钟())或waaaay太悲观(事件)。

What I actually need all this for is to be able to predict the running time of a computation by first running a reduced version of it on a smaller data set. Unfortunately, the results of this benchmark are totally unrealistic, being either too optimistic (clock()) or waaaay too pessimistic (events).

推荐答案

您可以做线沿线的东西:

You could do something along the lines of :

#include<sys/time.h>

struct timeval t1, t2;

gettimeofday(&t1, 0);

kernel_call<<<dimGrid, dimBlock, 0>>>();

HANDLE_ERROR(cudaThreadSynchronize();)

gettimeofday(&t2, 0);

double time = (1000000.0*(t2.tv_sec-t1.tv_sec) + t2.tv_usec-t1.tv_usec)/1000000.0;

printf("Time to generate:  %3.1f ms \n", time);

float time;
cudaEvent_t start, stop;

HANDLE_ERROR( cudaEventCreate(&start) );
HANDLE_ERROR( cudaEventCreate(&stop) );
HANDLE_ERROR( cudaEventRecord(start, 0) );

kernel_call<<<dimGrid, dimBlock, 0>>>();

HANDLE_ERROR( cudaEventRecord(stop, 0) );
HANDLE_ERROR( cudaEventSynchronize(stop) );
HANDLE_ERROR( cudaEventElapsedTime(&time, start, stop) );

printf("Time to generate:  %3.1f ms \n", time);

这篇关于定时CUDA运算的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆