How to restrict TensorFlow GPU memory usage?


Problem description

I am using tensorflow-gpu 1.13.1 on Ubuntu 18.04 with CUDA 10.0 on an Nvidia GeForce RTX 2070 (driver version: 415.27).

The code below was used to manage TensorFlow's memory usage. I have about 8 GB of GPU memory, so TensorFlow should not allocate more than 1 GB of it. However, when I check memory usage with the nvidia-smi command, I see that it uses ~1.5 GB, even though I restricted the amount with GPUOptions.

import numpy as np
import tensorflow as tf

# Limit this process to ~12% of the GPU memory and disable on-demand growth.
memory_config = tf.ConfigProto(gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.12))
memory_config.gpu_options.allow_growth = False

# graph, tensor_dict, image_tensor and image come from the (omitted) model-loading code.
with tf.Session(graph=graph, config=memory_config) as sess:
    output_dict = sess.run(tensor_dict,
                           feed_dict={image_tensor: np.expand_dims(image, 0)})

Why does this happen? How can I avoid it, or at least calculate the memory needs of each session? I need strict per-process limits, because I run several parallel instances with different sessions, so I have to be sure there will be no race for resources.

By the way, I have tried setting memory_config.gpu_options.allow_growth to False, but it has no effect: TensorFlow still allocates memory the same way, regardless of this flag's value. That also seems strange.

Recommended answer

Solution

Try with gpu_options.allow_growth = True to see how much default memory is consumed during tf.Session creation. That memory will always be allocated, regardless of the configured values.

Based on your result, it should be somewhere below 500 MB. So if you want each process to truly have 1 GB of memory, calculate:

per_process_gpu_memory_fraction = (1 GB - default memory) / total GPU memory
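
For example, assuming a default overhead of roughly 345 MB (the value measured below on a Tesla P100; the exact amount differs by GPU model) and the questioner's 8 GB card, the fraction could be computed as in this rough sketch with assumed numbers:

# Back-of-the-envelope sketch with assumed numbers: cap each process at ~1 GB in total.
default_mb = 345            # per-session overhead measured below; varies by GPU model
total_mb = 8 * 1024         # total memory of an 8 GB card
target_mb = 1024            # desired total footprint per process

fraction = (target_mb - default_mb) / total_mb
print(round(fraction, 3))   # ~0.083 -> value to pass as per_process_gpu_memory_fraction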

Reason

When you create a tf.Session, a TensorFlow device is created on the GPU regardless of your configuration, and this device requires a minimum amount of memory.

import tensorflow as tf

# Allocate GPU memory on demand instead of reserving a fixed fraction up front.
conf = tf.ConfigProto()
conf.gpu_options.allow_growth = True
session = tf.Session(config=conf)

Given allow_growth=True, there should be no GPU allocation at all. In reality, however, it yields:

2019-04-05 18:44:43.460479: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15127 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:03:00.0, compute capability: 6.0)

This occupies a small fraction of memory (in my experience, the amount differs by GPU model). Note: setting allow_growth occupies almost the same amount of memory as setting per_process_gpu_memory_fraction=0.00001, but the latter will not be able to create the session properly.

In this case it is 345 MB.

That is the offset you are experiencing. Now let's look at per_process_gpu_memory_fraction:

# Request 10% of the total GPU memory for this process.
conf = tf.ConfigProto()
conf.gpu_options.per_process_gpu_memory_fraction = 0.1
session = tf.Session(config=conf)

Since the GPU has 16,276 MB of memory, setting per_process_gpu_memory_fraction = 0.1 might make you think that only about 1,627 MB will be allocated. But in fact:

1,971 MB is allocated, which, however, matches the sum of the default memory (345 MB) and the expected memory (1,627 MB).
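
A quick arithmetic check of those figures:

# Sanity check: observed usage ~= default overhead + requested fraction of total memory.
total_mb = 16276        # Tesla P100 total memory from the log above
default_mb = 345        # per-session overhead measured with allow_growth=True
fraction = 0.1

requested_mb = fraction * total_mb             # ~1,627.6 MB
expected_usage_mb = default_mb + requested_mb  # ~1,972.6 MB, close to the observed 1,971 MB
print(requested_mb, expected_usage_mb)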
