How can I flush GPU memory using CUDA (physical reset is unavailable)


Question

My CUDA program crashed during execution, before memory was flushed. As a result, device memory remained occupied.

I'm running on a GTX 580, for which nvidia-smi --gpu-reset is not supported.

Placing cudaDeviceReset() at the beginning of the program only affects the current context created by the process; it does not flush memory allocated before it.

I'm accessing the Fedora server with that GPU remotely, so a physical reset is quite complicated.

So the question is: is there any way to flush the device memory in this situation?

Recommended answer

Although it should be unnecessary to do this in anything other than exceptional circumstances, the recommended way to do this on Linux hosts is to unload the nvidia driver by running

$ rmmod nvidia 

with suitable root privileges and then reloading it with

$ modprobe nvidia

If the machine is running X11, you will need to stop it manually beforehand and restart it afterwards. The driver initialisation process should eliminate any prior state on the device.
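
The unload/reload sequence described above could be wrapped in a small script. This is only a sketch: the `run` helper, the `DRY_RUN` variable and the `reload_nvidia_driver` function name are hypothetical, introduced here for illustration. By default it only prints the commands it would run. Stopping and restarting X11 is left as a manual step, because how that is done depends on the system.

```shell
#!/bin/sh
# Sketch of the driver unload/reload sequence; not a definitive procedure.
# DRY_RUN (hypothetical variable) defaults to 1, so nothing is executed
# unless it is explicitly set to 0.
DRY_RUN="${DRY_RUN:-1}"

# run (hypothetical helper): print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

reload_nvidia_driver() {
    # If X11 is running, stop it manually before calling this function
    # and restart it afterwards.
    # Unload the driver (needs root); reload it only if the unload succeeded.
    run rmmod nvidia && run modprobe nvidia
}

reload_nvidia_driver
```

To actually perform the reload, the script would be run as root with `DRY_RUN=0`. The `&&` means the driver is only reloaded if `rmmod` succeeded, so a failed unload is not hidden by the following command.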

This answer has been assembled from comments and posted as a community wiki, to get this question off the unanswered list for the CUDA tag.
