GPU RAM occupied but no PIDs
Problem description
The nvidia-smi output below shows 3.77GB in use on GPU0, but no processes are listed for GPU0:
(base) ~/.../fast-autoaugment$ nvidia-smi
Fri Dec 20 13:48:12 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.50 Driver Version: 430.50 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 TITAN Xp Off | 00000000:03:00.0 Off | N/A |
| 23% 34C P8 9W / 250W | 3771MiB / 12196MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 TITAN Xp Off | 00000000:84:00.0 On | N/A |
| 38% 62C P8 24W / 250W | 2295MiB / 12188MiB | 8% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 1 1910 G /usr/lib/xorg/Xorg 105MiB |
| 1 2027 G /usr/bin/gnome-shell 51MiB |
| 1 3086 G /usr/lib/xorg/Xorg 1270MiB |
| 1 3237 G /usr/bin/gnome-shell 412MiB |
| 1 30593 G /proc/self/exe 286MiB |
| 1 31849 G ...quest-channel-token=4371017438329004833 164MiB |
+-----------------------------------------------------------------------------+
Similarly, nvtop shows the same GPU RAM utilization, but the processes it lists show TYPE=Compute, and attempting to kill those PIDs produces an error:
(base) ~/.../fast-autoaugment$ kill 27761
bash: kill: (27761) - No such process
How can I reclaim the GPU RAM occupied by these apparently ghost processes?
Recommended answer
Use the following command to get insight into the ghost processes occupying GPU RAM:
sudo fuser -v /dev/nvidia*
In my case, the output was:
(base) ~/.../fast-autoaugment$ sudo fuser -v /dev/nvidia*
USER PID ACCESS COMMAND
/dev/nvidia0: shitals 517 F.... nvtop
root 1910 F...m Xorg
gdm 2027 F.... gnome-shell
root 3086 F...m Xorg
shitals 3237 F.... gnome-shell
shitals 27808 F...m python
shitals 27809 F...m python
shitals 27813 F...m python
shitals 27814 F...m python
shitals 28091 F...m python
shitals 28092 F...m python
shitals 28096 F...m python
This reveals processes that neither nvidia-smi nor nvtop shows. After I killed all of the python processes, the GPU RAM was freed.
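If there are many ghost processes, killing them one by one is tedious. The sketch below is a hypothetical helper (not part of fuser or any NVIDIA tool) that parses the `sudo fuser -v /dev/nvidia*` output shown above and collects the PIDs of the python processes so they can be killed in one pass; the sample text and the `ghost_python_pids` name are illustrative assumptions.

```python
import os
import signal

def ghost_python_pids(fuser_output: str) -> list[int]:
    """Hypothetical helper: extract PIDs of python processes from
    the verbose output of `sudo fuser -v /dev/nvidia*`."""
    pids = set()
    for line in fuser_output.splitlines():
        fields = line.split()
        # Each data row ends in: ... PID ACCESS COMMAND
        if len(fields) >= 4 and fields[-1] == "python":
            pids.add(int(fields[-3]))
    return sorted(pids)

if __name__ == "__main__":
    # Illustrative sample mimicking the fuser output above.
    sample = (
        "USER PID ACCESS COMMAND\n"
        "/dev/nvidia0: shitals 27808 F...m python\n"
        " shitals 27809 F...m python\n"
        " root 1910 F...m Xorg"
    )
    for pid in ghost_python_pids(sample):
        # To actually kill the process (needs sufficient privileges):
        # os.kill(pid, signal.SIGKILL)
        print(pid)
```

Note that this sends SIGKILL only when you uncomment the os.kill line; on a live system you would feed it the real fuser output and run it with the privileges needed to signal those processes.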
Another thing to try is resetting the GPU with:
sudo nvidia-smi --gpu-reset -i 0