Is the warmup code necessary when measuring CUDA kernel running time?

Question

On page 85 of Professional CUDA C Programming:

int main()
{
    ......
    // run a warmup kernel to remove overhead
    double iStart, iElaps;  // seconds() returns elapsed wall-clock time in seconds as a double
    cudaDeviceSynchronize();
    iStart = seconds();
    warmingup<<<grid, block>>>(d_C);
    cudaDeviceSynchronize();
    iElaps = seconds() - iStart;
    printf("warmup <<< %4u %4u >>> elapsed %f sec \n", grid.x, block.x, iElaps);

    // run kernel 1
    iStart = seconds();
    mathKernel1<<<grid, block>>>(d_C);
    cudaDeviceSynchronize();
    iElaps = seconds() - iStart;
    printf("mathKernel1 <<< %4u %4u >>> elapsed %f sec \n", grid.x, block.x, iElaps);

    // run kernel 2
    iStart = seconds();
    mathKernel2<<<grid, block>>>(d_C);
    cudaDeviceSynchronize();
    iElaps = seconds() - iStart;
    printf("mathKernel2 <<< %4u %4u >>> elapsed %f sec \n", grid.x, block.x, iElaps);

    // run kernel 3
    iStart = seconds();
    mathKernel3<<<grid, block>>>(d_C);
    cudaDeviceSynchronize();
    iElaps = seconds() - iStart;
    printf("mathKernel3 <<< %4u %4u >>> elapsed %f sec \n", grid.x, block.x, iElaps);
    ......
}

As you can see, a warmup kernel is run before the running times of the different kernels are measured.

From the question "GPU cards warming up?", I learned that the reason is:


If they are non-display cards, it might well be the driver shutting itself down after a period of inactivity. So what you are seeing on the first run might well be initialization overhead that only happens once.

So if my GPU card has not been inactive for a long time (e.g., I have just used it to run some programs), there should be no need to run any warmup code. Is my understanding right?

Recommended answer

Besides the GPU being in a power-saving state, there are a number of other reasons why the first launch of a kernel can be slower than subsequent runs:


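  • just-in-time compilation (of PTX, when the binary contains no native code for the GPU's architecture)
  • transfer of the kernel code to GPU memory
  • cache content (caches are still cold on the first run)
  • ...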

For these reasons, if you are interested in the sustained speed that consecutive kernel launches achieve, it is always good practice to perform at least one "warmup run" before the timed kernel run.
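
A minimal sketch of this practice, assuming a recent CUDA toolkit: the kernel busyKernel, the helper timeKernelMs, the CHECK macro and the problem size are all hypothetical. It performs untimed warmup launches, then times many launches with CUDA events (cudaEventRecord / cudaEventElapsedTime) and reports the average per launch, which also makes the measurement less sensitive to timer resolution than timing a single launch. Calling cudaFree(0) is a common idiom for forcing CUDA context creation before any timing starts.

```cuda
// Sketch: warm up, then time many launches with CUDA events.
// busyKernel, timeKernelMs, CHECK and the sizes are hypothetical.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                    cudaGetErrorString(err_));                     \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

// Trivial illustrative kernel: each thread writes one element.
__global__ void busyKernel(float *c, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) c[tid] = (tid % 2 == 0) ? 100.0f : 200.0f;
}

// Average time per launch in ms over `reps` (> 0) timed launches,
// preceded by `warmups` untimed launches.
float timeKernelMs(float *d_c, int n, int warmups, int reps)
{
    dim3 block(256);
    dim3 grid((n + block.x - 1) / block.x);

    // Untimed launches absorb one-time costs (module loading,
    // possible JIT compilation, cold caches).
    for (int i = 0; i < warmups; ++i)
        busyKernel<<<grid, block>>>(d_c, n);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    // Events are recorded in the default stream around the timed launches.
    CHECK(cudaEventRecord(start));
    for (int i = 0; i < reps; ++i)
        busyKernel<<<grid, block>>>(d_c, n);
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));
    CHECK(cudaGetLastError());

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    return ms / reps;
}

int main()
{
    const int n = 1 << 20;
    float *d_c = nullptr;

    CHECK(cudaFree(0));  // force context creation before timing
    CHECK(cudaMalloc(&d_c, n * sizeof(float)));

    float first = timeKernelMs(d_c, n, 0, 1);    // very first launch, no warmup
    float warm  = timeKernelMs(d_c, n, 1, 100);  // after one warmup launch
    printf("first launch: %.3f ms, warmed-up average: %.3f ms\n", first, warm);

    CHECK(cudaFree(d_c));
    return 0;
}
```

Comparing the two printed numbers on your own machine shows how much of the first launch is one-time overhead; the size of that gap depends on the GPU, driver and toolkit.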

If, however, you have a specific application and use case in mind, it always makes sense to benchmark that application under the relevant conditions. Be prepared, though, for much larger run-to-run variation in such a less controlled measurement.
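
One way to cope with that variation, sketched under assumptions: repeat the end-to-end measurement several times and report the spread (min, median, max) instead of a single number. Here runApplicationStep, scaleKernel, summarize and Summary are hypothetical stand-ins for your real workload. No warmup is done on purpose, so the first run, with its one-time costs, stays in the distribution as it would in the real application. Host-side std::chrono timing is used, and the step synchronizes with the device so the GPU work is included in each sample.

```cuda
// Sketch: repeat an end-to-end measurement and report its spread.
// runApplicationStep, scaleKernel, summarize and Summary are hypothetical.
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void scaleKernel(float *x, int n, float s)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= s;
}

// Stand-in for one application step: a launch followed by a full
// synchronization, so the host-side timer covers the GPU work.
void runApplicationStep(float *d_x, int n)
{
    scaleKernel<<<(n + 255) / 256, 256>>>(d_x, n, 1.0001f);
    cudaDeviceSynchronize();
}

struct Summary { double minMs, medianMs, maxMs; };

// Min, median and max of a non-empty set of samples (in ms).
Summary summarize(std::vector<double> samples)
{
    std::sort(samples.begin(), samples.end());
    size_t m = samples.size() / 2;
    double median = (samples.size() % 2 == 1)
                        ? samples[m]
                        : 0.5 * (samples[m - 1] + samples[m]);
    return {samples.front(), median, samples.back()};
}

int main()
{
    const int n = 1 << 20, runs = 20;
    float *d_x = nullptr;
    if (cudaMalloc(&d_x, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemset(d_x, 0, n * sizeof(float));

    std::vector<double> samples;
    for (int r = 0; r < runs; ++r) {
        auto t0 = std::chrono::steady_clock::now();
        runApplicationStep(d_x, n);
        auto t1 = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(t1 - t0).count());
    }

    Summary s = summarize(samples);
    printf("min %.3f ms, median %.3f ms, max %.3f ms over %d runs\n",
           s.minMs, s.medianMs, s.maxMs, runs);
    cudaFree(d_x);
    return 0;
}
```

The median is less sensitive than the mean to a single slow outlier such as the first run, while the max keeps that outlier visible.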
