CUDA: CUtil timer - confusion on elapsed time


Question

While profiling my program, I noticed a delay of up to 100 ms at one point. I checked every operation, but no single operation accounted for it. Then I noticed that wherever I place a cudaThreadSynchronize() call, the first such call takes 100 ms. To investigate, I wrote the example below. When cudaThreadSynchronize() is called on the first line, the elapsed time reported at the end is less than 1 ms. If it is not called, the elapsed time is 110 ms on average.

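#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <cutil.h>  // CUtil helper library from the CUDA SDK samples

int main(int argc, char **argv)
{
    // Comment this call out and the elapsed time below becomes about 110 ms.
    cudaThreadSynchronize();

    // Time only the allocation with a CUtil timer.
    unsigned int timer;
    cutCreateTimer(&timer);
    cutStartTimer(timer);

    float *data;
    CUDA_SAFE_CALL(cudaMalloc(&data, sizeof(float) * 1024));

    cutStopTimer(timer);
    printf("CUT Elapsed: %.3f\n", cutGetTimerValue(timer));

    cutDeleteTimer(timer);
    cudaFree(data);  // release the buffer allocated above

    return EXIT_SUCCESS;
}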



I think the cudaThreadSynchronize() call at the start handles the initialization of the CUDA library. Is this the correct way to fully initialize CUDA so that initialization does not distort the timing of other operations? Is calling cudaThreadSynchronize() at the start sufficient and correct, or is there a better way?

Recommended answer

To use CUDA, a CUDA context must first be created on the GPU, which takes around 70-100 ms. In your example, the cudaThreadSynchronize() call is what creates the context. The context is created only once per application, so the cost is paid by whichever CUDA call happens to come first. When doing timing analysis, I also do a dummy memory copy beforehand to create the context, just as you have done above with cudaThreadSynchronize().
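
To make the effect visible, here is a minimal sketch that times the first runtime call separately from the allocation. It rests on a few assumptions that are not in the original post: it uses cudaFree(0) as the warm-up call (a common idiom for forcing context creation, playing the same role as the cudaThreadSynchronize() call or the dummy memory copy mentioned above), it uses std::chrono rather than the CUtil timer so that it does not depend on the SDK helper library, and CHECK_CUDA and elapsed_ms are hypothetical helper names introduced here for illustration. Note also that cudaThreadSynchronize() is deprecated in current CUDA releases in favor of cudaDeviceSynchronize().

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a readable message if a runtime call fails.
#define CHECK_CUDA(call)                                           \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            std::fprintf(stderr, "%s failed: %s\n", #call,         \
                         cudaGetErrorString(err_));                \
            std::exit(EXIT_FAILURE);                               \
        }                                                          \
    } while (0)

// Hypothetical helper: host wall-clock time of a callable, in milliseconds.
template <typename F>
double elapsed_ms(F&& f)
{
    auto t0 = std::chrono::steady_clock::now();
    f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main()
{
    // Warm-up (assumed idiom): the first runtime call in the process
    // pays the one-time cost of creating the CUDA context.
    double init_ms = elapsed_ms([] { CHECK_CUDA(cudaFree(0)); });

    // With the context already in place, this measures only the allocation.
    float* data = nullptr;
    double malloc_ms = elapsed_ms([&] {
        CHECK_CUDA(cudaMalloc(&data, sizeof(float) * 1024));
    });

    std::printf("first call (context creation): %.3f ms\n", init_ms);
    std::printf("cudaMalloc after warm-up:      %.3f ms\n", malloc_ms);

    CHECK_CUDA(cudaFree(data));
    return EXIT_SUCCESS;
}
```

If the warm-up line is removed, the context creation cost should move into the cudaMalloc measurement instead, which matches the behavior described in the question. The exact numbers depend on the GPU, driver, and toolkit version.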
