OOM when allocating tensor

Problem Description

How do I solve the problem of ResourceExhaustedError: OOM when allocating tensor?

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[10000,32,28,28]

I have included nearly all of the code:

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# load the MNIST data set (one-hot labels; the data directory is illustrative)
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

learning_rate = 0.0001
epochs = 10
batch_size = 50

# declare the training data placeholders
# input x - for 28 x 28 pixels = 784 - this is the flattened image data that is drawn from
# mnist.train.nextbatch()
x = tf.placeholder(tf.float32, [None, 784])
# dynamically reshape the input
x_shaped = tf.reshape(x, [-1, 28, 28, 1])
# now declare the output data placeholder - 10 digits
y = tf.placeholder(tf.float32, [None, 10])
def create_new_conv_layer(input_data, num_input_channels, num_filters, filter_shape, pool_shape, name):
    # setup the filter input shape for tf.nn.conv_2d
    conv_filt_shape = [filter_shape[0], filter_shape[1], num_input_channels,
                      num_filters]

    # initialise weights and bias for the filter
    weights = tf.Variable(tf.truncated_normal(conv_filt_shape, stddev=0.03),
                                      name=name+'_W')
    bias = tf.Variable(tf.truncated_normal([num_filters]), name=name+'_b')

    # setup the convolutional layer operation
    out_layer = tf.nn.conv2d(input_data, weights, [1, 1, 1, 1], padding='SAME')

    # add the bias
    out_layer += bias

    # apply a ReLU non-linear activation
    out_layer = tf.nn.relu(out_layer)

    # now perform max pooling
    ksize = [1, pool_shape[0], pool_shape[1], 1]
    strides = [1, pool_shape[0], pool_shape[1], 1]
    out_layer = tf.nn.max_pool(out_layer, ksize=ksize, strides=strides,
                               padding='SAME')

    return out_layer
# create some convolutional layers
layer1 = create_new_conv_layer(x_shaped, 1, 32, [5, 5], [2, 2], name='layer1')
layer2 = create_new_conv_layer(layer1, 32, 64, [5, 5], [2, 2], name='layer2')

flattened = tf.reshape(layer2, [-1, 7 * 7 * 64])

# setup some weights and bias values for this layer, then activate with ReLU
wd1 = tf.Variable(tf.truncated_normal([7 * 7 * 64, 1000], stddev=0.03), name='wd1')
bd1 = tf.Variable(tf.truncated_normal([1000], stddev=0.01), name='bd1')
dense_layer1 = tf.matmul(flattened, wd1) + bd1
dense_layer1 = tf.nn.relu(dense_layer1)

# another layer with softmax activations
wd2 = tf.Variable(tf.truncated_normal([1000, 10], stddev=0.03), name='wd2')
bd2 = tf.Variable(tf.truncated_normal([10], stddev=0.01), name='bd2')
dense_layer2 = tf.matmul(dense_layer1, wd2) + bd2
y_ = tf.nn.softmax(dense_layer2)
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=dense_layer2, labels=y))


# add an optimiser
optimiser = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cross_entropy)

# define an accuracy assessment operation
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# setup the initialisation operator
init_op = tf.global_variables_initializer() 



with tf.Session() as sess:
    # initialise the variables
    sess.run(init_op)
    total_batch = int(len(mnist.train.labels) / batch_size)
    for epoch in range(epochs):
        avg_cost = 0
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size=batch_size)
            _, c = sess.run([optimiser, cross_entropy],
                            feed_dict={x: batch_x, y: batch_y})
            avg_cost += c / total_batch
        test_acc = sess.run(accuracy,
                            feed_dict={x: mnist.test.images, y: mnist.test.labels})
        print("Epoch:", (epoch + 1), "cost =", "{:.3f}".format(avg_cost),
              "test accuracy: {:.3f}".format(test_acc))

    print("\nTraining complete!")
    print(sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels}))

The lines referenced in the error are the create_new_conv_layer function and the sess.run call in the training loop.

More errors I copied from the debugger output are listed below (there were more lines, but I think these are the main ones and the others are caused by them):

tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[10000,32,28,28] [[Node: Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Reshape, layer1_W/read)]]

The second time I ran it, it issued the following error. I have both a CPU and a GPU, as can be seen in the output below. I understand that some of the CPU-related warnings are probably because my TensorFlow wasn't compiled to use those features. I installed CUDA 8 and cuDNN 6, Python 3.5, and TensorFlow 1.3.0 on Windows 10.

2017-10-03 03:53:58.944371: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-10-03 03:53:58.945563: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-03 03:53:59.230761: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:955] Found device 0 with properties:
name: Quadro K620 major: 5 minor: 0 memoryClockRate (GHz) 1.124
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.66GiB
2017-10-03 03:53:59.231109: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:976] DMA: 0
2017-10-03 03:53:59.231229: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:986] 0: Y
2017-10-03 03:53:59.231363: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Quadro K620, pci bus id: 0000:01:00.0)
2017-10-03 03:54:01.511141: E C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\stream_executor\cuda\cuda_dnn.cc:371] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2017-10-03 03:54:01.511372: E C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\stream_executor\cuda\cuda_dnn.cc:375] error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
2017-10-03 03:54:01.511862: E C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\stream_executor\cuda\cuda_dnn.cc:338] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2017-10-03 03:54:01.512074: F C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\kernels\conv_ops.cc:672] Check failed: stream->parent()->GetConvolveAlgorithms(conv_parameters.ShouldIncludeWinogradNonfusedAlgo(), &algorithms)

Recommended Answer

The process failed with out-of-memory (OOM) because you pushed the whole test set through for evaluation at once (see this question). It's easy to see that 10000 * 32 * 28 * 28 * 4 bytes is almost 1 GB, while your GPU has only 1.66 GB available in total, and most of that is already taken by the network itself.
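As a quick sanity check of that number (pure arithmetic, not part of the original answer), the first conv layer's output for the full test set is 10000 images x 32 filters x a 28 x 28 feature map, in float32:

# bytes needed to hold the first conv layer's activations for all 10000 test images
print(10000 * 32 * 28 * 28 * 4)          # 1003520000 bytes
print(10000 * 32 * 28 * 28 * 4 / 2**30)  # ~0.93 GiB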

The solution is to feed the neural network in batches not only for training, but for testing as well; the resulting accuracy is then the average across all batches (a sketch follows below). Moreover, you don't need to do this after every epoch: are you really interested in the test results of every intermediate network?
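For example, a minimal sketch of batched test evaluation, to run inside the training session in place of the single full-test-set sess.run call. The eval_batch_size name and the value 500 are illustrative choices, not from the original code:

# Evaluate accuracy over the test set in chunks instead of all at once.
# Any eval_batch_size that fits in GPU memory works; a value that divides
# 10000 evenly keeps the averaged accuracy exact.
eval_batch_size = 500
num_test_batches = int(len(mnist.test.labels) / eval_batch_size)

test_acc = 0.0
for j in range(num_test_batches):
    start = j * eval_batch_size
    end = start + eval_batch_size
    test_acc += sess.run(accuracy,
                         feed_dict={x: mnist.test.images[start:end],
                                    y: mnist.test.labels[start:end]})
test_acc /= num_test_batches  # average accuracy across all test batches
print("test accuracy: {:.3f}".format(test_acc))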

Your second error message is very likely a result of the previous failure, because the cuDNN driver doesn't seem to work anymore. I'd suggest restarting your machine.
