Encountering a malloc "incorrect checksum for freed object" error in TensorFlow
Problem description
I'm running a 3D convolutional neural network in TensorFlow, and it worked fine on my Windows computer with 64 GB of RAM. But when I switch over to my MacBook Pro with 16 GB of RAM, I get the following error:
Python(1292,0x700006cca000) malloc: *** error for object 0x10113fe00: incorrect checksum for freed object - object was probably modified after being freed.
*** set a breakpoint in malloc_error_break to debug
I'm using a ConvLayer class like this:
class ConvLayer(object):
    def __init__(self, layer_input, in_channel, out_channel, weight_dims, conv_strides,
                 pool_ksize, pool_strides, name_suffix):
        """
        :param layer_input: TensorFlow tensor that is the input to this layer
        :param in_channel: Number of channels going into the layer
        :param out_channel: Number of channels out
        :param weight_dims: A length-3 list with the filter size (depth, height, width)
        :param conv_strides: A length-5 list of strides for the convolution
        :param pool_ksize: A length-5 list with the pooling window size
        :param pool_strides: A length-5 list of strides for the pooling
        :param name_suffix: Suffix to append to the variable names
        """
        self.input = layer_input
        # Filter shape: weight_dims + [in_channel, out_channel]
        self.W_conv = weight_variable(weight_dims + [in_channel, out_channel],
                                      name="W" + name_suffix)
        self.b_conv = bias_variable([out_channel])
        self.h_conv = tf.nn.relu(conv3d(self.input, self.W_conv, strides=conv_strides) + self.b_conv)
        self.h_pool = max_pool3d_2x2(self.h_conv, ksize=pool_ksize, strides=pool_strides)
The fully connected layer is defined as:
class FCLayer(object):
    def __init__(self, layer_input, weight_dimensions, name_suffix):
        self.input = layer_input
        self.W_fc = weight_variable(shape=weight_dimensions, name="W" + name_suffix)
        self.b_fc = bias_variable(shape=[weight_dimensions[1]])
        self.activation = tf.matmul(self.input, self.W_fc) + self.b_fc
Then, my actual network is defined as:
class NNetwork(object):
    def __init__(self, color):
        self.x = tf.placeholder(tf.float32, shape=[None, 8, 8, 4])
        self.y = tf.placeholder(tf.float32, shape=[None, 1])
        self.x_image = tf.reshape(self.x, [-1, 4, 8, 8, 1])
        self.layer1 = ConvLayer(layer_input=self.x_image, in_channel=1, out_channel=16,
                                weight_dims=[4, 4, 4], conv_strides=[1, 4, 2, 2, 1],
                                pool_ksize=[1, 1, 2, 2, 1], pool_strides=[1, 1, 2, 2, 1],
                                name_suffix="conv1_" + color)
        self.layer2 = ConvLayer(layer_input=self.layer1.h_pool, in_channel=16, out_channel=32,
                                weight_dims=[1, 2, 2], conv_strides=[1, 1, 2, 2, 1],
                                pool_ksize=[1, 1, 1, 1, 1], pool_strides=[1, 1, 1, 1, 1],
                                name_suffix="conv2_" + color)
        self.layer2flattened = tf.reshape(self.layer2.h_pool, [-1, 128])
        self.layer3 = FCLayer(self.layer2flattened, [128, 256], "_fc1_" + color)
        self.layer4 = FCLayer(tf.nn.relu(self.layer3.activation), [256, 1], "_fc2_" + color)
        self.y_hat = self.layer4.activation
        self.loss = tf.reduce_mean(tf.square(self.y_hat - self.y))
        self.optimizer = tf.train.AdamOptimizer(1e-4).minimize(self.loss)
I have never seen an error like this while using TensorFlow and am lost on how to fix it. If I take out a convolutional layer and feed the first one directly into the fully connected layer, it works just fine, so I suspect the problem is entirely memory-related, but I would have expected an overflow error in that case.
If anyone wants to run the network, here is some sample code that should work:
import numpy as np
import tensorflow as tf

with tf.Session() as sess:
    n_network = NNetwork("purple")
    init_op = tf.initialize_all_variables()
    sess.run(init_op)
    test_input = np.zeros(shape=[1, 8, 8, 4])
    runNetwork = sess.run(n_network.y_hat, feed_dict={n_network.x: test_input})
Any help would be greatly appreciated!
Recommended answer
I tried running it again after updating everything and changing my interpreter in PyCharm, and it worked.
It turned out I had unknowingly been running an older version of TensorFlow, because my interpreter had been changed to the system default instead of my Anaconda environment.
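For anyone hitting something similar, a quick way to confirm which interpreter and which TensorFlow version a script actually uses is to print them from inside that script. The following is only a minimal sketch: the helper name describe_environment is made up for illustration, and it assumes Python 3.8 or later (for importlib.metadata) and that the package is installed under the distribution name "tensorflow", which may differ on some setups.

```python
import sys
from importlib import metadata


def describe_environment(package="tensorflow"):
    """Report the running interpreter and the version of `package` it can see.

    `describe_environment` is a hypothetical helper, not part of TensorFlow.
    """
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None  # package is not installed for this interpreter
    return {
        "executable": sys.executable,  # path of the Python binary in use
        "prefix": sys.prefix,          # root directory of the active environment
        "package": package,
        "version": version,
    }


if __name__ == "__main__":
    info = describe_environment()
    print("Interpreter:", info["executable"])
    print("Environment:", info["prefix"])
    print(info["package"], "version:", info["version"] or "not installed")
```

Running this from the same PyCharm run configuration as the network should show whether the interpreter path points at the Anaconda environment or at the system Python.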