Tensorflow memory leak when building graph in a loop

Question

I noticed this when my grid search for selecting hyper-parameters of a Tensorflow (version 1.12.0) model crashed due to an explosion in memory consumption.

Notice that, unlike a similar-looking question here, I do close the graph and session (using context managers), and I am not adding nodes to the graph in the loop.

I suspected that maybe tensorflow maintains global variables that do not get cleared between iterations, so I called globals() before and after each iteration, but did not observe any difference in the set of global variables.
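
To make that check concrete, here is a minimal illustrative sketch (not the exact code from the question) that compares the set of global names and counts the ops in a per-iteration graph:

import tensorflow as tf

names_before = set(globals().keys())
with tf.Graph().as_default() as g:
    _ = tf.constant(0.0)  # stand-in for the model-building code of one iteration
    print("ops in this iteration's graph:", len(g.get_operations()))
names_after = set(globals().keys())

# Only the names defined by this snippet itself show up in the difference.
print("new global names:", names_after - names_before)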

I made a small example that reproduces the problem. I train a simple MNIST classifier in a loop and plot the memory consumed by the process:

import matplotlib.pyplot as plt
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import psutil
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
process = psutil.Process(os.getpid())

N_REPS = 100
N_ITER = 10
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
x_test, y_test = mnist.test.images, mnist.test.labels

# Runs experiment several times.
mem = []
for i in range(N_REPS):
    with tf.Graph().as_default():
        net = tf.contrib.layers.fully_connected(x_test, 200)
        logits = tf.contrib.layers.fully_connected(net, 10, activation_fn=None)
        loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_test, logits=logits))
        train_op = tf.train.AdamOptimizer(learning_rate=0.0001).minimize(loss)
        init = tf.global_variables_initializer()
        with tf.Session() as sess:
            # training loop.
            sess.run(init)
            for _ in range(N_ITER):
                sess.run(train_op)
    mem.append(process.memory_info().rss)
plt.plot(range(N_REPS), mem)
plt.show()

The resulting plot (image not reproduced here) shows the memory used by the process growing with each repetition.

In my actual project, process memory starts at a few hundred MB (depending on dataset size) and climbs to 64 GB, until my system runs out of memory. There are things I tried that slow down the increase, such as using placeholders and feed_dicts instead of relying on convert_to_tensor, but the constant increase is still there, only slower.
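
For reference, a minimal sketch of the placeholder/feed_dict variant mentioned above, assuming the same imports, N_ITER, and MNIST arrays (x_test, y_test) as in the example:

# Illustrative sketch; assumes the imports, N_ITER, x_test and y_test from the example above.
x_ph = tf.placeholder(tf.float32, shape=[None, 784])
y_ph = tf.placeholder(tf.float32, shape=[None, 10])

net = tf.contrib.layers.fully_connected(x_ph, 200)
logits = tf.contrib.layers.fully_connected(net, 10, activation_fn=None)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_ph, logits=logits))
train_op = tf.train.AdamOptimizer(learning_rate=0.0001).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(N_ITER):
        # Data is fed at run time instead of being embedded in the graph as constants.
        sess.run(train_op, feed_dict={x_ph: x_test, y_ph: y_test})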

Answer

Try taking the loop inside the session. Don't create the graph and session for every iteration. Every time the graph is created and the variables are initialized, you are not redefining the old graph but creating a new one, which leads to the memory leak. I was facing a similar issue and was able to solve it by taking the loop inside the session.
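
A sketch of that restructuring applied to the example from the question (build the graph and the session once; only re-run the variable initializer between repetitions):

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import psutil
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

process = psutil.Process(os.getpid())
N_REPS = 100
N_ITER = 10
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
x_test, y_test = mnist.test.images, mnist.test.labels

mem = []
with tf.Graph().as_default():
    # The graph is built exactly once.
    net = tf.contrib.layers.fully_connected(x_test, 200)
    logits = tf.contrib.layers.fully_connected(net, 10, activation_fn=None)
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_test, logits=logits))
    train_op = tf.train.AdamOptimizer(learning_rate=0.0001).minimize(loss)
    init = tf.global_variables_initializer()

    with tf.Session() as sess:
        for i in range(N_REPS):
            sess.run(init)  # reset variables instead of rebuilding the graph
            for _ in range(N_ITER):
                sess.run(train_op)
            mem.append(process.memory_info().rss)

Because the initializer covers every global variable, sess.run(init) also resets the Adam slot variables, so each repetition still starts from scratch without constructing a new graph.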

From How to not program Tensorflow:

  • Be conscious of when you’re creating ops, and only create the ones you need. Try to keep op creation distinct from op execution.
  • Especially if you’re just working with the default graph and running interactively in a regular REPL or a notebook, you can end up with a lot of abandoned ops in your graph. Every time you re-run a notebook cell that defines any graph ops, you aren’t just redefining ops—you’re creating new ones.
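
A quick illustrative sketch of that effect: counting the ops in the default graph while the same definition is executed repeatedly, as happens when a notebook cell is re-run:

import tensorflow as tf

for _ in range(3):
    x = tf.constant(1.0, name='x')  # simulates re-running a notebook cell
    print(len(tf.get_default_graph().get_operations()))
# Prints 1, 2, 3: each re-definition adds a new op (named x, x_1, x_2, ...).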
