How to restrict tensorflow GPU memory usage?
Question
I used tensorflow-gpu 1.13.1 on Ubuntu 18.04 with CUDA 10.0 and an Nvidia GeForce RTX 2070 (driver version 415.27).
The code below was used to manage TensorFlow memory usage. I have about 8 GB of GPU memory, so TensorFlow must not allocate more than 1 GB of it. But when I check memory usage with the nvidia-smi command, I see that it uses ~1.5 GB despite the fact that I restricted the amount with GPUOptions.
memory_config = tf.ConfigProto(gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.12))
memory_config.gpu_options.allow_growth = False
with tf.Session(graph=graph, config=memory_config) as sess:
    output_dict = sess.run(tensor_dict,
                           feed_dict={image_tensor: np.expand_dims(image, 0)})
Why does this happen? And how can I avoid it, or at least calculate the memory needs for each session? I need strict limits for every process, because I run several parallel instances with different sessions, so I need to be sure there will be no resource race.
BTW, I have tried setting memory_config.gpu_options.allow_growth to False, but it has no effect: TensorFlow still allocates memory the same way regardless of this flag's value. That also seems strange.
Accepted answer
Solution
Try with gpu_options.allow_growth = True to see how much default memory is consumed during tf.Session creation. That memory will always be allocated, regardless of the configured values.
Based on your result, it should be somewhere below 500 MB. So if you want each process to truly have 1 GB of memory each, calculate the fraction as:
per_process_gpu_memory_fraction = (1 GB - default memory) / total_memory
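As a quick sanity check, that arithmetic can be sketched as follows. The figures used here are illustrative assumptions taken from the answer's own example (a 345 MB default overhead on a 16,276 MB card), not universal constants; measure your own overhead with allow_growth=True first.

```python
# Sketch: compute the per-process fraction that yields a true 1 GB budget.
# Both figures below are assumptions from this answer's example GPU.
DEFAULT_OVERHEAD_MB = 345   # measured once with allow_growth=True (assumption)
TOTAL_MEMORY_MB = 16276     # total GPU memory reported by nvidia-smi (assumption)
TARGET_MB = 1024            # the true per-process budget we want

fraction = (TARGET_MB - DEFAULT_OVERHEAD_MB) / TOTAL_MEMORY_MB
print(round(fraction, 4))   # → 0.0417
```

The resulting value is what you would pass as per_process_gpu_memory_fraction, so that the fraction plus the fixed overhead sums to the intended 1 GB.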
Cause
When you create a tf.Session, a TensorFlow device is created on the GPU regardless of your configuration, and this device requires some minimum amount of memory.
import tensorflow as tf

conf = tf.ConfigProto()
conf.gpu_options.allow_growth = True
session = tf.Session(config=conf)
Given allow_growth=True, there should be no GPU allocation. In reality, however, it yields:
2019-04-05 18:44:43.460479: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15127 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:03:00.0, compute capability: 6.0)
which occupies a small fraction of memory (in my past experience, the amount differs between GPU models). NOTE: setting allow_growth occupies almost the same memory as setting per_process_gpu_memory_fraction=0.00001, but the latter won't be able to create a session properly.
In this case, it is 345MB:
That is the offset you are experiencing. Let's take a look at the case of per_process_gpu_memory_fraction:
conf = tf.ConfigProto()
conf.gpu_options.per_process_gpu_memory_fraction = 0.1
session = tf.Session(config=conf)
Since the GPU has 16,276 MB of memory, setting per_process_gpu_memory_fraction = 0.1 probably makes you think only about 1,627 MB will be allocated. But the truth is:
1,971 MB is allocated, which however coincides with the sum of the default memory (345 MB) and the expected memory (1,627 MB).
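The observed number can be checked with a line of arithmetic, again using the answer's example figures (345 MB overhead, 16,276 MB total) as assumptions:

```python
# Sketch: observed allocation ≈ fraction * total memory + default overhead.
TOTAL_MEMORY_MB = 16276     # example GPU from the answer (assumption)
DEFAULT_OVERHEAD_MB = 345   # measured with allow_growth=True (assumption)
fraction = 0.1              # per_process_gpu_memory_fraction

expected = fraction * TOTAL_MEMORY_MB + DEFAULT_OVERHEAD_MB
print(int(expected))        # → 1972, close to the observed 1,971 MB
```

So when sizing several parallel processes, budget for the fixed per-session overhead on top of each configured fraction, or the total will exceed what the fractions alone suggest.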