How can I override the CUDA kernel execution time limit on Windows with a secondary GPU?


Question

NVIDIA's website explains the time-out problem:

Q: What is the maximum kernel execution time? On Windows, individual GPU program launches have a maximum run time of around 5 seconds. Exceeding this time limit usually will cause a launch failure reported through the CUDA driver or the CUDA runtime, but in some cases can hang the entire machine, requiring a hard reset. This is caused by the Windows "watchdog" timer that causes programs using the primary graphics adapter to time out if they run longer than the maximum allowed time.

For this reason it is recommended that CUDA is run on a GPU that is NOT attached to a display and does not have the Windows desktop extended onto it. In this case, the system must contain at least one NVIDIA GPU that serves as the primary graphics adapter.

Source: https://developer.nvidia.com/cuda-faq

So it seems that NVIDIA believes, or at least strongly implies, that having multiple (NVIDIA) GPUs with the proper configuration can prevent this from happening?

But how? So far I have tried many approaches, but the annoying time-out still occurs on a GK110 GPU that is: (1) plugged into the secondary PCIe x16 slot; (2) not connected to any monitor; (3) set as a dedicated PhysX card in the driver control panel (as recommended by others). The time-out is still there.

Solution

If your GK110 is a Tesla K20c GPU, then you should switch the device from WDDM mode to TCC mode. This can be done with the nvidia-smi.exe tool that is installed with the driver. Use the Windows search function to find this file (nvidia-smi.exe), then use the command-line help (`nvidia-smi --help`) to discover the commands needed to switch a GPU from WDDM to TCC mode.

Once you have done this, the Windows watchdog mechanism will no longer pay attention to your GK110 device.
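
As a rough sketch of the WDDM-to-TCC switch described above, the Python snippet below builds the nvidia-smi invocation. It assumes the `-i` (GPU index) and `-dm` (driver model, 0 for WDDM and 1 for TCC) options that `nvidia-smi --help` lists on Windows. The GPU index 1 and the helper names are hypothetical, so check the help output of your own driver version first.

```python
import subprocess
from typing import List


def tcc_switch_command(gpu_index: int, exe: str = "nvidia-smi") -> List[str]:
    """Build the nvidia-smi command that requests TCC mode for one GPU."""
    # -i selects the GPU by index; -dm 1 requests the TCC driver model
    # (0 would request WDDM). Both flags are assumed from `nvidia-smi --help`.
    return [exe, "-i", str(gpu_index), "-dm", "1"]


def switch_to_tcc(gpu_index: int) -> None:
    """Run the command; expects an elevated (administrator) prompt."""
    subprocess.run(tcc_switch_command(gpu_index), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    # Index 1 is a hypothetical index for the secondary GPU.
    print(" ".join(tcc_switch_command(1)))
```

The driver model change typically takes effect only after a reboot, and the GPU in TCC mode can no longer drive a display.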

If, on the other hand, it is a GeForce GPU, there is no way to switch it to TCC mode. Your only option is to modify the registry settings, which is somewhat difficult. Your mileage may vary, as the exact structure of the registry keys varies by OS.
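
For the registry route, one possible sketch is shown below. It assumes the TDR (Timeout Detection and Recovery) value `TdrDelay` that Microsoft documents under the `GraphicsDrivers` key for WDDM drivers; the answer does not name a specific key, so treat the key path, the 60-second value, and the helper name as assumptions to verify for your OS version.

```python
from typing import List

# Assumed location of the TDR settings (documented by Microsoft for WDDM).
TDR_KEY = r"HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"


def tdr_delay_command(seconds: int) -> List[str]:
    """Build a `reg add` command that sets the TdrDelay timeout in seconds."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    # /t REG_DWORD sets the value type, /d the data, /f overwrites silently.
    return ["reg", "add", TDR_KEY, "/v", "TdrDelay",
            "/t", "REG_DWORD", "/d", str(seconds), "/f"]


if __name__ == "__main__":
    # 60 seconds is an illustrative value, not a recommendation.
    print(" ".join(tdr_delay_command(60)))
```

The printed command would need to be run from an elevated prompt, followed by a reboot. Back up the registry before changing it.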

If a GPU is in WDDM mode, it is subject to the watchdog timer.
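
To see which GPUs are in WDDM mode and therefore subject to the watchdog, one option is to parse nvidia-smi's CSV query output. The sketch below assumes the `driver_model.current` query field is available in your nvidia-smi build (it is Windows-only); the sample text and GPU names are purely illustrative, not real output.

```python
from typing import Dict, List

# Command whose output the parser below expects (assumed query field names).
QUERY = ["nvidia-smi",
         "--query-gpu=index,name,driver_model.current",
         "--format=csv,noheader"]


def wddm_gpus(csv_text: str) -> List[Dict[str, str]]:
    """Return the GPUs whose current driver model is reported as WDDM."""
    gpus = []
    for line in csv_text.splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        index, model = parts[0], parts[-1]
        name = ", ".join(parts[1:-1])  # tolerate commas inside the name
        if model.upper() == "WDDM":
            gpus.append({"index": index, "name": name})
    return gpus


if __name__ == "__main__":
    # Illustrative sample only; real output comes from running QUERY.
    sample = "0, GeForce GTX 680, WDDM\n1, Tesla K20c, TCC\n"
    print(wddm_gpus(sample))
```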
