Encountering a malloc "incorrect checksum for freed object" error in TensorFlow


Question

I'm running a 3D-conv neural network in TensorFlow, and it worked fine on my Windows computer with 64GB of RAM. But when I switched over to my MacBook Pro with 16GB of RAM, I got the following error:

Python(1292,0x700006cca000) malloc: *** error for object 0x10113fe00: incorrect checksum for freed object - object was probably modified after being freed.
*** set a breakpoint in malloc_error_break to debug

I'm using a ConvLayer class like this:

class ConvLayer(object):

    def __init__(self, layer_input, in_channel, out_channel, weight_dims, conv_strides,
                pool_ksize, pool_strides, name_suffix):
        """
        :param layer_input: TensorFlow tensor that is the input to this layer
        :param in_channel: Number of channels going into the layer
        :param out_channel: Number of channels out
        :param weight_dims: A length-3 list with the filter size (depth, height, width)
        :param conv_strides: A length-5 list of strides for the convolution
        :param pool_ksize: A length-5 list with the pooling window size
        :param pool_strides: A length-5 list of strides for the pooling
        :param name_suffix: Suffix to append to the variable names
        """
        self.input = layer_input
        self.W_conv = weight_variable(weight_dims + [in_channel, out_channel],
                                    name="W" + name_suffix)
        self.b_conv = bias_variable([out_channel])
        self.h_conv = tf.nn.relu(conv3d(self.input, self.W_conv, strides=conv_strides) + self.b_conv)
        self.h_pool = max_pool3d_2x2(self.h_conv, ksize=pool_ksize, strides=pool_strides)

The fully connected layer is defined as:

class FCLayer(object):

    def __init__(self, layer_input, weight_dimensions, name_suffix):
        self.input = layer_input
        self.W_fc = weight_variable(shape=weight_dimensions, name="W" + name_suffix)
        self.b_fc = bias_variable(shape=[weight_dimensions[1]])
        self.activation = tf.matmul(self.input, self.W_fc) + self.b_fc

Then, my actual network is defined as:

class NNetwork(object):

    def __init__(self, color):
        self.x = tf.placeholder(tf.float32, shape=[None, 8, 8, 4])
        self.y = tf.placeholder(tf.float32, shape=[None, 1])
        self.x_image = tf.reshape(self.x, [-1, 4, 8, 8, 1])

        self.layer1 = ConvLayer(layer_input=self.x_image, in_channel=1, out_channel=16,
                                weight_dims=[4, 4, 4], conv_strides=[1, 4, 2, 2, 1],
                                pool_ksize=[1, 1, 2, 2, 1], pool_strides=[1, 1, 2, 2, 1],
                                name_suffix="conv1_" + color)
        self.layer2 = ConvLayer(layer_input=self.layer1.h_pool, in_channel=16, out_channel=32,
                                weight_dims=[1, 2, 2], conv_strides=[1, 1, 2, 2, 1],
                                pool_ksize=[1, 1, 1, 1, 1], pool_strides=[1, 1, 1, 1, 1],
                                name_suffix="conv2_" + color)
        self.layer2flattened = tf.reshape(self.layer2.h_pool, [-1, 128])
        self.layer3 = FCLayer(self.layer2flattened, [128, 256], "_fc1_" + color)
        self.layer4 = FCLayer(tf.nn.relu(self.layer3.activation), [256, 1], "_fc2_" + color)

        self.y_hat = self.layer4.activation
        self.loss = tf.reduce_mean(tf.square(self.y_hat - self.y))
        self.optimizer = tf.train.AdamOptimizer(1e-4).minimize(self.loss)

I have never seen an error like this while using TensorFlow and am lost on how to fix it. If I take out a convolutional layer and feed the first one into the fully connected layer, it works just fine, so I suspect it is entirely a memory issue, but in that case I would have expected an overflow error.

If anyone wants to run the network, here is some sample code that should work:

import numpy as np
import tensorflow as tf

with tf.Session() as sess:

    n_network = NNetwork("purple")
    init_op = tf.initialize_all_variables()
    sess.run(init_op)
    test_input = np.zeros(shape=[1,8,8,4])
    runNetwork = sess.run(n_network.y_hat, feed_dict={n_network.x: test_input})

Any help would be much appreciated!

Recommended answer

I tried running it again after updating everything and changing my interpreter in PyCharm, and it worked.

It turned out I had unknowingly been running an older version of TensorFlow, because my interpreter had been switched to the system default instead of my Anaconda environment.
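For anyone hitting something similar, a quick way to check which interpreter and which TensorFlow build a script actually picks up is sketched below. This is only an illustration, not part of the original fix: the helper name `describe_environment` is made up here, it assumes Python 3.8+ (for `importlib.metadata`), and it assumes the package was installed under the distribution name `tensorflow` (some installs use a different name, such as `tensorflow-gpu`).

```python
import sys
from importlib import metadata


def describe_environment(package="tensorflow"):
    """Return the running interpreter's path and the installed version of `package`.

    Hypothetical helper for illustration. The version is None when the
    package is not installed for the interpreter running this script.
    """
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None
    return {
        "executable": sys.executable,  # path of the interpreter actually running
        "prefix": sys.prefix,          # environment root, e.g. a conda env directory
        "package": package,
        "version": version,
    }


if __name__ == "__main__":
    info = describe_environment()
    print("Interpreter:", info["executable"])
    print("Environment:", info["prefix"])
    print("tensorflow version:", info["version"] or "not installed here")
```

Running this from inside the IDE and again from a terminal with the Anaconda environment activated should show whether the two are using the same interpreter and the same TensorFlow version.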
