OOM when allocating tensor

Problem Description

How do I solve the problem of ResourceExhaustedError: OOM when allocating tensor?

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[10000,32,28,28]

I have included almost all of the code.
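The imports and data loading are not shown; presumably the standard TF 1.x MNIST helper is used, roughly like this:

import tensorflow as tf
# assumed, not shown in the original post: load MNIST with the TF 1.x helper
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)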

learning_rate = 0.0001
epochs = 10
batch_size = 50

# declare the training data placeholders
# input x - for 28 x 28 pixels = 784 - this is the flattened image data that is drawn from
# mnist.train.nextbatch()
x = tf.placeholder(tf.float32, [None, 784])
# dynamically reshape the input
x_shaped = tf.reshape(x, [-1, 28, 28, 1])
# now declare the output data placeholder - 10 digits
y = tf.placeholder(tf.float32, [None, 10])
def create_new_conv_layer(input_data, num_input_channels, num_filters, filter_shape, pool_shape, name):
    # setup the filter input shape for tf.nn.conv_2d
    conv_filt_shape = [filter_shape[0], filter_shape[1], num_input_channels,
                      num_filters]

    # initialise weights and bias for the filter
    weights = tf.Variable(tf.truncated_normal(conv_filt_shape, stddev=0.03),
                                      name=name+'_W')
    bias = tf.Variable(tf.truncated_normal([num_filters]), name=name+'_b')

    # setup the convolutional layer operation
    out_layer = tf.nn.conv2d(input_data, weights, [1, 1, 1, 1], padding='SAME')

    # add the bias
    out_layer += bias

    # apply a ReLU non-linear activation
    out_layer = tf.nn.relu(out_layer)

    # now perform max pooling
    ksize = [1, 2, 2, 1]
    strides = [1, 2, 2, 1]
    out_layer = tf.nn.max_pool(out_layer, ksize=ksize, strides=strides,
                               padding='SAME')

    return out_layer
# create some convolutional layers
layer1 = create_new_conv_layer(x_shaped, 1, 32, [5, 5], [2, 2], name='layer1')
layer2 = create_new_conv_layer(layer1, 32, 64, [5, 5], [2, 2], name='layer2')

flattened = tf.reshape(layer2, [-1, 7 * 7 * 64])

# setup some weights and bias values for this layer, then activate with ReLU
wd1 = tf.Variable(tf.truncated_normal([7 * 7 * 64, 1000], stddev=0.03), name='wd1')
bd1 = tf.Variable(tf.truncated_normal([1000], stddev=0.01), name='bd1')
dense_layer1 = tf.matmul(flattened, wd1) + bd1
dense_layer1 = tf.nn.relu(dense_layer1)

# another layer with softmax activations
wd2 = tf.Variable(tf.truncated_normal([1000, 10], stddev=0.03), name='wd2')
bd2 = tf.Variable(tf.truncated_normal([10], stddev=0.01), name='bd2')
dense_layer2 = tf.matmul(dense_layer1, wd2) + bd2
y_ = tf.nn.softmax(dense_layer2)
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=dense_layer2, labels=y))


# add an optimiser
optimiser = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cross_entropy)

# define an accuracy assessment operation
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# setup the initialisation operator
init_op = tf.global_variables_initializer() 



with tf.Session() as sess:
    # initialise the variables
    sess.run(init_op)
    total_batch = int(len(mnist.train.labels) / batch_size)
    for epoch in range(epochs):
        avg_cost = 0
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size=batch_size)
            _, c = sess.run([optimiser, cross_entropy], feed_dict={x: batch_x, y: batch_y})
            avg_cost += c / total_batch
        test_acc = sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels})
        print("Epoch:", (epoch + 1), "cost =", "{:.3f}".format(avg_cost), "test accuracy: {:.3f}".format(test_acc))

    print("\nTraining complete!")
    print(sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels}))

The lines referenced in the error are the create_new_conv_layer function and the sess.run call in the training loop.

More errors I copied from the debugger output are listed below (there were more lines, but I think these are the main ones and the others are caused by them):

tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[10000,32,28,28] [[Node: Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Reshape, layer1_W/read)]]

The second time I ran it, it issued the following error. I have both a CPU and a GPU, as can be seen in the output below. I can understand that some of the errors related to CPU features might be because my TensorFlow wasn't compiled to use them. I installed CUDA 8, cuDNN 6, Python 3.5 and TensorFlow 1.3.0 on Windows 10.

2017-10-03 03:53:58.944371: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-10-03 03:53:58.945563: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-03 03:53:59.230761: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:955] Found device 0 with properties: name: Quadro K620 major: 5 minor: 0 memoryClockRate (GHz) 1.124 pciBusID 0000:01:00.0 Total memory: 2.00GiB Free memory: 1.66GiB
2017-10-03 03:53:59.231109: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:976] DMA: 0
2017-10-03 03:53:59.231229: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:986] 0: Y
2017-10-03 03:53:59.231363: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Quadro K620, pci bus id: 0000:01:00.0)
2017-10-03 03:54:01.511141: E C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\stream_executor\cuda\cuda_dnn.cc:371] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2017-10-03 03:54:01.511372: E C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\stream_executor\cuda\cuda_dnn.cc:375] error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
2017-10-03 03:54:01.511862: E C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\stream_executor\cuda\cuda_dnn.cc:338] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2017-10-03 03:54:01.512074: F C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\kernels\conv_ops.cc:672] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo(), &algorithms)

Recommended Answer

The process failed with out-of-memory (OOM) because you pushed the whole test set for evaluation at once (see this question). It's easy to see that 10000 * 32 * 28 * 28 * 4 is almost 1Gb, while your GPU has only 1.66Gb available in total and most of it is already taken by the network itself.
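As a quick sanity check of that figure (this arithmetic is only illustrative; a float32 element takes 4 bytes):

# size of the [10000, 32, 28, 28] float32 activation tensor from the error message
n_bytes = 10000 * 32 * 28 * 28 * 4   # 4 bytes per float32 element
print(n_bytes / 1024 ** 3)           # ~0.93 GiB, i.e. close to 1 GB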

The solution is to feed the neural network in batches not only for training but for testing as well. The resulting accuracy will be the average across all batches. Moreover, you don't need to do this after every epoch: are you really interested in the test results of all the intermediate networks?
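A minimal sketch of what batched evaluation could look like, reusing the accuracy op, placeholders, and mnist object from the question (the test batch size of 1000 is just an illustrative choice):

# evaluate the test set in chunks instead of feeding all 10,000 images at once
test_batch_size = 1000
num_test_batches = int(len(mnist.test.labels) / test_batch_size)
test_acc = 0.0
for i in range(num_test_batches):
    start = i * test_batch_size
    end = start + test_batch_size
    test_acc += sess.run(accuracy,
                         feed_dict={x: mnist.test.images[start:end],
                                    y: mnist.test.labels[start:end]})
# all batches have equal size here, so the plain mean is exact
test_acc /= num_test_batches
print("test accuracy: {:.3f}".format(test_acc))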

Your second error message is very likely a result of the previous failure, because the cuDNN driver doesn't seem to work anymore. I'd suggest restarting your machine.
