CUDA Error: out of memory - Python interpreter utilizes all GPU memory

Problem description

Even after rebooting the machine, python3 uses more than 95% of the GPU memory. Note that the memory consumption persists even when no training scripts are running, and I have never used Keras/TensorFlow in the system environment, only in a venv or a Docker container.

UPDATE: The last thing I ran was a neural-network test script with the following configuration:

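tensorflow==1.14.0
Keras==2.0.3

import tensorflow as tf
from keras import backend as K

tf.autograph.set_verbosity(1)
session_conf = tf.ConfigProto(intra_op_parallelism_threads=8, inter_op_parallelism_threads=8)
tf.set_random_seed(1)
# Grow GPU memory on demand instead of reserving almost all of it up front
session_conf.gpu_options.allow_growth = True
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
# Make Keras run on this session (and therefore use this GPU configuration)
K.set_session(sess)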

$ nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.26       Driver Version: 440.26       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   53C    P3    N/A /  N/A |   3981MiB /  4042MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      4105      G   /usr/lib/xorg/Xorg                           145MiB |
|    0      4762      C   /usr/bin/python3                            3631MiB |
|    0      4764      G   /usr/bin/gnome-shell                          88MiB |
|    0      5344      G   ...quest-channel-token=8947774662807822104    61MiB |
|    0      6470      G   ...Charm-P/ch-0/191.6605.12/jre64/bin/java     5MiB |
|    0      7200      C   python                                        45MiB |
+-----------------------------------------------------------------------------+


After rebooting in recovery mode, I tried running nvidia-smi -r, but it didn't solve the issue.

Recommended answer

By default, TensorFlow allocates GPU memory for the lifetime of the process, not the lifetime of the session object, so the memory can stay allocated long after the object itself is gone. That is why the memory lingers after you stop the program. Setting gpu_options.allow_growth = True helps in many cases because TensorFlow then starts with a small allocation and grows it on demand, but it will still take as much GPU memory as the running process requires, and memory it has grown into is not released until the process exits.
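
Because the memory belongs to a process rather than to a session, it helps to check which compute processes (Type C in the nvidia-smi table) are still alive and how much each one holds. Below is a minimal sketch of that check; the helper name compute_processes is hypothetical, and the column layout it expects is assumed from the single driver 440.26 output shown in the question, so it may need adjusting for other driver versions.

```python
import re

# One row of the nvidia-smi "Processes" table, e.g.
# "|    0      4762      C   /usr/bin/python3      3631MiB |"
# Layout assumed from the driver 440.26 output shown in the question.
ROW = re.compile(r"^\|\s+(\d+)\s+(\d+)\s+(C\+G|C|G)\s+(.+?)\s+(\d+)MiB\s*\|$")


def compute_processes(smi_text):
    """Return (pid, process_name, used_mib) for every compute (C or C+G) process."""
    result = []
    for line in smi_text.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue  # header, border, or GPU summary line
        _gpu, pid, ptype, name, mib = match.groups()
        if "C" in ptype:  # skip pure graphics (G) processes such as Xorg
            result.append((int(pid), name, int(mib)))
    return result


table = """\
|    0      4105      G   /usr/lib/xorg/Xorg                           145MiB |
|    0      4762      C   /usr/bin/python3                            3631MiB |
|    0      4764      G   /usr/bin/gnome-shell                          88MiB |
|    0      7200      C   python                                        45MiB |
"""

for pid, name, mib in compute_processes(table):
    print(pid, name, mib)
```

For the rows above this reports PID 4762 (/usr/bin/python3, 3631 MiB) and PID 7200 (python, 45 MiB). A process that still appears in this list is still running, and its GPU memory is only returned to the driver once it exits. For scripting, nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv gives the same information in a more stable, machine-readable form than the pretty-printed table.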

To prevent tf.Session from using all of your GPU memory, you can replace gpu_options.allow_growth = True with a fixed memory fraction, which caps the total amount of GPU memory the process may allocate. For example, to use 50% (a reasonable choice, since your program seems able to consume a lot of memory at runtime):

session_conf.gpu_options.per_process_gpu_memory_fraction = 0.5

This should keep you from hitting the upper limit by capping TensorFlow at roughly 2 GB, since your GPU appears to have about 4 GB (4042 MiB in total according to nvidia-smi).
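
The cap is simple arithmetic: per_process_gpu_memory_fraction is a fraction of the GPU's total memory, so 0.5 of the 4042 MiB reported by nvidia-smi is about 2021 MiB. The plain-Python sketch below (helper names memory_cap_mib and fraction_for_budget are hypothetical, and no TensorFlow is needed) computes the cap for a given fraction and, in the other direction, the fraction that corresponds to a chosen memory budget. It is only an approximation; the amount TensorFlow can actually use may differ slightly.

```python
def memory_cap_mib(total_mib, fraction):
    """Approximate GPU memory cap (MiB) for a given per-process fraction."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return total_mib * fraction


def fraction_for_budget(total_mib, budget_mib):
    """Fraction to use as per_process_gpu_memory_fraction for a MiB budget."""
    if not 0 < budget_mib <= total_mib:
        raise ValueError("budget must be positive and at most the total memory")
    return budget_mib / total_mib


total = 4042  # total memory from the nvidia-smi "Memory-Usage" column in the question
print(memory_cap_mib(total, 0.5))  # 2021.0
# Hypothetical 3000 MiB budget, e.g. to leave room for Xorg and gnome-shell
print(round(fraction_for_budget(total, 3000), 3))  # 0.742
```

Working from a budget in MiB rather than guessing a fraction makes it easier to leave headroom for the graphics processes (Xorg, gnome-shell, and others) that share the same 4 GB card.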
