CUDA apps time out & fail after several seconds - how to work around this?

Problem description

I've noticed that CUDA applications tend to have a rough maximum run time of 5-15 seconds before they fail and exit. I realize that ideally a CUDA application wouldn't run that long, but assuming CUDA is the right choice and the amount of sequential work per thread means it must run that long, is there any way to extend this time limit or to get around it?

Solution

I'm not a CUDA expert; I've been developing with the AMD Stream SDK, which AFAIK is roughly comparable.

You can disable the Windows watchdog timer, but that is strongly discouraged, for reasons that should be obvious. To disable it, in regedit go to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Watchdog\Display\DisableBugCheck, create a REG_DWORD, and set it to 1. You may also need to do something in the NVIDIA control panel. Look for some reference to "VPU Recovery" in the CUDA docs.

Ideally, you should be able to split your kernel's work into multiple passes over your data, so that each pass runs within the time limit.

Alternatively, you can divide the problem domain up so that each command computes fewer output pixels. I.e., instead of computing 1,000,000 output pixels in one fell swoop, issue 10 commands to the GPU to compute 100,000 each.
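Here is a minimal CUDA sketch of that kind of split, under some assumptions: `compute_chunk`, `num_chunks`, and `compute_in_chunks` are hypothetical names, and the per-element arithmetic is only a placeholder for the real work. Each launch covers one slice of the output buffer, and the host waits for it to finish before issuing the next one, so no single launch has to run for long.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical per-element kernel: each thread computes one output value.
// `offset` shifts this launch's threads onto its slice of the output.
__global__ void compute_chunk(float* out, size_t offset, size_t count) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < count) {
        size_t idx = offset + i;
        out[idx] = 2.0f * (float)idx;  // placeholder for the real, expensive work
    }
}

// Number of launches needed to cover `total` elements, `chunk` at a time.
inline size_t num_chunks(size_t total, size_t chunk) {
    return (total + chunk - 1) / chunk;
}

// Covers d_out[0..total) with several short launches instead of one long one.
// d_out must point to device memory holding at least `total` floats.
cudaError_t compute_in_chunks(float* d_out, size_t total, size_t chunk) {
    assert(chunk > 0);
    const unsigned threads = 256;
    for (size_t offset = 0; offset < total; offset += chunk) {
        size_t count = std::min(chunk, total - offset);
        unsigned blocks = (unsigned)((count + threads - 1) / threads);
        compute_chunk<<<blocks, threads>>>(d_out, offset, count);
        cudaError_t err = cudaGetLastError();  // catches bad launch configurations
        if (err != cudaSuccess) return err;
        err = cudaDeviceSynchronize();         // wait for this slice to finish
        if (err != cudaSuccess) return err;    // e.g. cudaErrorLaunchTimeout
    }
    return cudaSuccess;
}
```

With the numbers from the paragraph above, `compute_in_chunks(d_out, 1000000, 100000)` issues 10 launches of 100,000 elements each. If a slice still takes too long, the watchdog should surface as `cudaErrorLaunchTimeout` from the synchronize call, which is a hint to use a smaller chunk.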

The basic unit that has to fit within the time slice is not your entire application, but the execution of a single command buffer. In the AMD Stream SDK, a long sequence of operations can be broken up into multiple time slices by explicitly flushing the command queue with a CtxFlush() call. Perhaps CUDA has something similar?

You should not have to read all of your data back and forth across the PCIe bus on every time slice; you can leave your textures, etc. in GPU local memory. You just need some command buffers to complete occasionally, to prove to the OS that you're not stuck in an infinite loop.
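On the CUDA side, a rough sketch of this pattern might look like the following. The names `step_once` and `run_passes` are hypothetical, and the `+= 1.0f` update stands in for one slice of the real sequential work. Treating `cudaDeviceSynchronize()` as the counterpart of `CtxFlush()` is an assumption; it simply makes the host wait until the queued launch has completed. The buffer is copied to the device once, stays there across all passes, and is copied back once at the end.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstddef>

// Hypothetical pass: advances every element by one step, in place.
__global__ void step_once(float* state, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) state[i] += 1.0f;  // placeholder for one slice of sequential work
}

// Runs `passes` short launches over a buffer that stays in device memory.
// There is one host-to-device copy before the loop and one copy back after it.
cudaError_t run_passes(float* h_state, size_t n, int passes) {
    assert(n > 0 && passes >= 0);
    float* d_state = nullptr;
    const size_t bytes = n * sizeof(float);
    cudaError_t err = cudaMalloc(&d_state, bytes);
    if (err != cudaSuccess) return err;
    err = cudaMemcpy(d_state, h_state, bytes, cudaMemcpyHostToDevice);

    const unsigned threads = 256;
    const unsigned blocks = (unsigned)((n + threads - 1) / threads);
    for (int p = 0; p < passes && err == cudaSuccess; ++p) {
        step_once<<<blocks, threads>>>(d_state, n);
        err = cudaGetLastError();
        // Let this launch complete before queuing the next one, so that no
        // single submission has to cover the whole computation.
        if (err == cudaSuccess) err = cudaDeviceSynchronize();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(h_state, d_state, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_state);
    return err;
}
```

Synchronizing after every pass is the simplest choice but not necessarily the fastest; synchronizing every few passes may be enough, as long as each batch of work completes well inside the time limit.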

Finally, GPUs are fast, so if your application is not able to do useful work in that 5 or 10 seconds, I'd take that as a sign that something is wrong.

[EDIT Mar 2010 to update:] The registry key above is out-of-date. I think that was the key for Windows XP 64-bit. There are new registry keys for Vista and Windows 7. You can find them here: http://www.microsoft.com/whdc/device/display/wddm_timeout.mspx or here: http://msdn.microsoft.com/en-us/library/ee817001.aspx

[EDIT Apr 2015 to update:] This is getting really out of date. The easiest way to disable TDR for CUDA programming, assuming you have the NVIDIA Nsight tools installed, is to open the Nsight Monitor, click on "Nsight Monitor options", and under "General" set "WDDM TDR enabled" to false. This will change the registry setting for you. Close and reboot. Any change to the TDR registry setting won't take effect until you reboot.
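To check from inside a program whether a device is subject to a kernel run-time limit at all, the CUDA runtime exposes the `cudaDevAttrKernelExecTimeout` device attribute. The sketch below wraps that query; the helper name `kernel_timeout_enabled` and its return convention are assumptions made for illustration, and this has not been checked against every driver mode.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Returns 1 if the device reports a run-time limit on kernels, 0 if it does
// not, and -1 if the query itself failed (for example, no CUDA device).
int kernel_timeout_enabled(int device) {
    int enabled = 0;
    cudaError_t err =
        cudaDeviceGetAttribute(&enabled, cudaDevAttrKernelExecTimeout, device);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "query failed: %s\n", cudaGetErrorString(err));
        return -1;
    }
    return enabled ? 1 : 0;
}
```

Calling `kernel_timeout_enabled(0)` after the reboot is a quick way to see what the runtime reports for device 0. A GPU that is not driving a display will often report 0 even with default settings.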
