Any particular function to initialize GPU other than the first cudaMalloc call?


Question

The first cudaMalloc call is slow (around 0.2 sec) because of some initialization work on the GPU. Is there any function that does only the initialization, so that I can time it separately? cudaSetDevice seems to reduce the time to 0.15 sec, but it still does not eliminate all of the init overhead.

Recommended answer

Calling

cudaFree(0);

is the canonical way to force lazy context establishment in the CUDA runtime. You can't reduce the overhead, which is a function of driver, runtime and operating system latencies. But the call above will let you control how and when those overheads occur during program execution.
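As a minimal sketch of the idea above, the program below pays the context set-up cost in an explicit cudaFree(0) call and then times the first cudaMalloc on its own. The helper name time_ms and the 1 MiB buffer size are assumptions made for illustration only; they are not part of the original answer.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: wall-clock time of a callable, in milliseconds.
template <typename F>
double time_ms(F&& f) {
    auto t0 = std::chrono::steady_clock::now();
    f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    // Force context establishment up front so its cost is measured separately.
    cudaError_t init_status = cudaSuccess;
    double init_ms = time_ms([&] { init_status = cudaFree(0); });
    if (init_status != cudaSuccess) {
        std::fprintf(stderr, "init failed: %s\n", cudaGetErrorString(init_status));
        return 1;
    }

    // The first real allocation should no longer include context set-up.
    void* buf = nullptr;
    cudaError_t alloc_status = cudaSuccess;
    double alloc_ms = time_ms([&] { alloc_status = cudaMalloc(&buf, 1 << 20); });
    if (alloc_status != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(alloc_status));
        return 1;
    }

    std::printf("context init:     %.3f ms\n", init_ms);
    std::printf("first cudaMalloc: %.3f ms\n", alloc_ms);

    cudaFree(buf);
    return 0;
}
```

The total start-up cost is unchanged; the sketch only moves it to a point in the program that you choose, so it no longer appears in the cudaMalloc measurement.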

EDIT in 2015: the heuristics of context initialisation in the runtime API have subtly changed over time, so that cudaSetDevice now establishes a context. The cudaFree() call is therefore not explicitly required to initialise a context; you can use cudaSetDevice instead. Also note that some set-up time will still be incurred at the first kernel launch, whereas before this wasn't the case. For kernel timing, it is best to include a warm-up call before launching the kernel you will time, to remove this set-up latency. It appears that the various profiling tools have enough granularity built in to avoid this without any extra API calls or kernel calls.
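The warm-up advice can be sketched roughly as follows: establish the context with cudaSetDevice, launch the kernel once untimed, then time a second launch with CUDA events. The kernel scale, the helpers blocks_for and time_launch, and the problem size are all hypothetical names and values chosen for this illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel used only for illustration.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Number of blocks needed to cover n elements (rounds up).
int blocks_for(int n, int threads) {
    return (n + threads - 1) / threads;
}

// Times one launch of scale() with CUDA events; returns ms, or -1 on error.
float time_launch(float* d_data, int n) {
    const int threads = 256;
    cudaEvent_t start, stop;
    if (cudaEventCreate(&start) != cudaSuccess) return -1.0f;
    if (cudaEventCreate(&stop) != cudaSuccess) {
        cudaEventDestroy(start);
        return -1.0f;
    }

    cudaEventRecord(start);
    scale<<<blocks_for(n, threads), threads>>>(d_data, n, 2.0f);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = -1.0f;
    if (cudaGetLastError() == cudaSuccess) {
        cudaEventElapsedTime(&ms, start, stop);
    }
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main() {
    const int n = 1 << 20;

    // Establish the context explicitly, as described in the 2015 edit.
    if (cudaSetDevice(0) != cudaSuccess) return 1;

    float* d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemset(d_data, 0, n * sizeof(float));

    // Warm-up launch: absorbs the one-time set-up cost of the first launch.
    scale<<<blocks_for(n, 256), 256>>>(d_data, n, 1.0f);
    cudaDeviceSynchronize();

    // Timed launch: measured after the warm-up has completed.
    float ms = time_launch(d_data, n);
    std::printf("timed launch: %.3f ms\n", ms);

    cudaFree(d_data);
    return ms < 0.0f ? 1 : 0;
}
```

Removing the warm-up launch and comparing the reported time is one way to see how much of the first measurement is set-up latency on a given system.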
