Tensorflow: ran out of memory trying to allocate 3.90GiB. The caller indicates that this is not a failure


Problem description

There is a message that I don't understand:

Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.90GiB. 
The caller indicates that this is not a failure, 
but may mean that there could be performance gains if more memory is available.

What does this message mean?

I have read the source code, but I could not understand it. The GPU has 6 GB of memory, while the memory usage reported by my tfprof analysis is about 14 GB, which exceeds the GPU's memory size. Does this message mean that TensorFlow falls back to allocating CPU memory, or that it uses some clever algorithm to manage the GPU's memory?

I am using TensorFlow 1.2.

The GPU information is as follows:

  • name: GeForce GTX TITAN Z
  • major: 3 minor: 5 memoryClockRate (GHz): 0.8755
  • Total memory: 5.94GiB
  • Free memory: 5.87GiB

My code:

#!/usr/bin/python3.4

import tensorflow as tf
import tensorflow.examples.tutorials.mnist.input_data as input_data
import os
os.environ["CUDA_VISIBLE_DEVICES"] = '0' 


mnist = input_data.read_data_sets("MNIST_data", one_hot=True)
sess = tf.InteractiveSession(config=tf.ConfigProto(log_device_placement=True))
#sess = tf.InteractiveSession()
def weight_variable(shape):
    init = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(init)

def bias_variable(shape):
    init = tf.constant(0.1, shape=shape)
    return tf.Variable(init)

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])
x_image = tf.reshape(x, [-1, 28, 28, 1])


W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)


W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)


W_f1 = weight_variable([7*7*64, 1024])
b_f1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_f1) + b_f1)


keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

W_f2 = weight_variable([1024, 10])
b_f2 = bias_variable([10])
y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_f2) + b_f2)

cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y_conv), reduction_indices=[1]))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))


test_images = tf.placeholder(tf.float32, [None, 784])
test_labels = tf.placeholder(tf.float32, [None, 10])


tf.global_variables_initializer().run()

run_metadata = tf.RunMetadata()


for i in range(100):
    batch = mnist.train.next_batch(10000)
    if (i % 10 == 0):
        train_accuracy = accuracy.eval(feed_dict={x: batch[0], y_: batch[1], keep_prob : 1.0})
        print("step %d, training accuracy %g" % (i, train_accuracy))
    sess.run(train_step, feed_dict={x: batch[0], y_: batch[1], keep_prob : 0.5}, options=tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE), run_metadata=run_metadata)

tf.contrib.tfprof.model_analyzer.print_model_analysis(
    tf.get_default_graph(),
    run_meta=run_metadata,
    tfprof_options=tf.contrib.tfprof.model_analyzer.PRINT_ALL_TIMING_MEMORY)

test_images = mnist.test.images[0:300, :]
test_labels = mnist.test.labels[0:300, :]
print("test accuracy %g" % accuracy.eval({x: test_images, y_: test_labels, keep_prob: 1.0}))

The warning:

2017-08-10 21:37:44.589635: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.90GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2017-08-10 21:37:46.208897: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.61GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.

The result of tfprof:

==================Model Analysis Report======================
_TFProfRoot (0B/14854.97MB, 0us/7.00ms)

Recommended answer

You're using the GPU and your batch size is 10000, which is a lot for 10 classes! Use a smaller batch size, such as 10 to 20, and increase the loop range to 10e3 or even 10e4 iterations instead. This problem is well known. If you absolutely want to use a batch size of 10000, tell TensorFlow to use the CPU with:

tf.device('/cpu:0')
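
For illustration only, here is a minimal sketch of both suggestions applied to the script above. The variable names (mnist, sess, train_step, x, y_, keep_prob) are the ones from the question's code, and the exact batch size and step count are just example values:

# Suggestion 1: shrink the batch and run more training steps instead.
batch_size = 20            # instead of 10000; fits comfortably in ~6 GiB of GPU memory
for i in range(1000):      # more iterations compensate for the smaller batches
    batch = mnist.train.next_batch(batch_size)
    sess.run(train_step, feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

# Suggestion 2: if the 10000-sample batch must stay, build the graph inside a CPU
# device scope so its tensors are placed in host memory rather than on the GPU.
with tf.device('/cpu:0'):
    x = tf.placeholder(tf.float32, [None, 784])
    y_ = tf.placeholder(tf.float32, [None, 10])
    # ... the rest of the graph construction from the script above goes here ...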
