Clearing Tensorflow GPU memory after model execution


Problem description

I've trained 3 models and am now running code that loads each of the 3 checkpoints in sequence and runs predictions using them. I'm using the GPU.

When the first model is loaded, it pre-allocates the entire GPU memory (which I want, for working through the first batch of data). But it doesn't release the memory when it's finished. When the second model is loaded, even using both tf.reset_default_graph() and with tf.Graph().as_default(), the GPU memory is still fully consumed by the first model, and the second model is then starved of memory.
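
A minimal sketch of the reset pattern described above (the checkpoint-loading code is not shown in the question, so the session body is a placeholder):

import tensorflow as tf

# Resetting the default graph between models, as described in the question.
# The graph is cleared, but TensorFlow's GPU allocator lives in a
# process-level singleton, so the memory pre-allocated for the first
# model is not returned to the driver.
tf.reset_default_graph()
with tf.Graph().as_default():
    with tf.Session() as sess:
        pass  # load the second checkpoint and run predictions here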

Is there a way to resolve this, other than using Python subprocesses or multiprocessing to work around the problem (the only solution I've found via Google searches)?

Recommended answer

A GitHub issue from June 2016 (https://github.com/tensorflow/tensorflow/issues/1727) indicates the following problem:

currently the Allocator in the GPUDevice belongs to the ProcessState, which is essentially a global singleton. The first session using GPU initializes it, and frees itself when the process shuts down.

Thus the only workaround would be to use processes and shut them down after the computation.

Example code:

import tensorflow as tf
import multiprocessing
import numpy as np

def run_tensorflow():

    n_input = 10000
    n_classes = 1000

    # Create model
    def multilayer_perceptron(x, weight):
        # Single fully connected layer (no bias or activation)
        layer_1 = tf.matmul(x, weight)
        return layer_1

    # Store layers weight & bias
    weights = tf.Variable(tf.random_normal([n_input, n_classes]))


    x = tf.placeholder("float", [None, n_input])
    y = tf.placeholder("float", [None, n_classes])
    pred = multilayer_perceptron(x, weights)

    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
    optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)

    init = tf.global_variables_initializer()

    with tf.Session() as sess:
        sess.run(init)

        # train on random data just to exercise the GPU
        for i in range(100):
            batch_x = np.random.rand(10, 10000)
            batch_y = np.random.rand(10, 1000)
            sess.run([optimizer, cost], feed_dict={x: batch_x, y: batch_y})

    print "finished doing stuff with tensorflow!"


if __name__ == "__main__":

    # option 1: execute code with extra process
    p = multiprocessing.Process(target=run_tensorflow)
    p.start()
    p.join()

    # pause so GPU memory usage can be inspected (e.g. with nvidia-smi);
    # at this point the memory has been freed, because the child process exited
    input()

    # option 2: just execute the function
    run_tensorflow()

    # pause again; this time the GPU memory is NOT freed, because the
    # allocator still lives in the current process
    input()

So if you call run_tensorflow() within a process you created and then shut that process down (option 1), the memory is freed. If you just run run_tensorflow() in the current process (option 2), the memory is not freed after the function call.
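
Applied to the question's scenario, each of the 3 checkpoints would be handled in its own process, so the GPU memory is released between models. A minimal sketch, assuming a hypothetical predict_with_checkpoint() and placeholder checkpoint paths:

import multiprocessing

def predict_with_checkpoint(checkpoint_path):
    # hypothetical: build the graph, restore checkpoint_path with
    # tf.train.Saver, and run this model's predictions
    pass

if __name__ == "__main__":
    for ckpt in ["model1.ckpt", "model2.ckpt", "model3.ckpt"]:
        p = multiprocessing.Process(target=predict_with_checkpoint, args=(ckpt,))
        p.start()
        p.join()  # GPU memory is released when the child process exits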
