CUDA: CUtil timer - confusion on elapsed time

Question

When profiling my program, I saw that at some point there is a gap of up to 100 ms. I timed every operation individually, but none of them took that long on its own. Then I noticed that wherever I place a cudaThreadSynchronize call, the first call takes 100 ms. So I wrote the example below: when cudaThreadSynchronize is called on the first line, the elapsed time reported at the end is under 1 ms, but when it is not called, the elapsed time is about 110 ms on average.

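#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <cutil.h>  // CUtil helper header from the old CUDA SDK (cutCreateTimer, CUDA_SAFE_CALL)

int main(int argc, char **argv)
{
    cudaThreadSynchronize(); // Comment this out and the elapsed time becomes ~110 ms

    unsigned int timer;
    cutCreateTimer(&timer);
    cutStartTimer(timer);

    float *data;
    CUDA_SAFE_CALL(cudaMalloc(&data, sizeof(float) * 1024));

    cutStopTimer(timer);
    printf("CUT Elapsed: %.3f\n", cutGetTimerValue(timer));

    cutDeleteTimer(timer);
    CUDA_SAFE_CALL(cudaFree(data)); // release the test allocation

    return EXIT_SUCCESS;
}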

I think the cudaThreadSynchronize() call at the start handles the initialization of the CUDA library. Is this the correct way to fully initialize CUDA, so that initialization does not affect the timing of other operations? Is calling cudaThreadSynchronize at the start enough, and is it correct, or is there a proper way to do this?

Recommended answer

In order to use CUDA, a 'CUDA context' must first be created on the GPU, and this takes around 70-100 ms. In your example, cudaThreadSynchronize() is what creates the context. A context is created only once for your application, so only the first CUDA call pays this cost. When doing timing analysis, I also do a dummy memory copy up front to create the context (just as you did above with cudaThreadSynchronize()).
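A minimal sketch of the same warm-up idea with the current CUDA runtime API, assuming a toolkit where CUtil is no longer available (it was a helper from the old SDK samples) and where cudaThreadSynchronize() is deprecated in favor of cudaDeviceSynchronize(). Instead of a dummy memory copy it uses cudaFree on a null pointer, a common idiom that frees nothing but, as a runtime call, triggers context creation; host wall-clock time is measured with std::chrono rather than the CUtil timer. The helper names warm_up_cuda, elapsed_ms and time_small_malloc_ms are hypothetical, chosen only for this illustration.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: force CUDA context creation now, so later timed calls
// do not include the one-time setup cost. cudaFree(nullptr) frees nothing,
// but as a runtime call it triggers lazy initialization if not done yet.
static bool warm_up_cuda()
{
    cudaError_t err = cudaFree(nullptr);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA initialization failed: %s\n",
                     cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Hypothetical helper: host wall-clock time of one call to f, in milliseconds.
template <typename F>
static double elapsed_ms(F &&f)
{
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical helper: times the same allocation as the question (1024 floats).
// Returns -1.0 if cudaMalloc fails (for example, no usable GPU).
static double time_small_malloc_ms()
{
    float *data = nullptr;
    cudaError_t err = cudaSuccess;
    const double ms = elapsed_ms([&] {
        err = cudaMalloc(&data, sizeof(float) * 1024);
    });
    if (err != cudaSuccess) {
        return -1.0;
    }
    cudaFree(data); // release the test allocation
    return ms;
}
```

Calling warm_up_cuda() once near the top of main, before time_small_malloc_ms(), should report only the cost of the allocation itself; calling time_small_malloc_ms() first would be expected to fold context creation into the measurement, as in the question's 110 ms case. Exact timings depend on the GPU, driver and toolkit version.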
