Different behaviour between same implementations of TensorFlow and Keras

Question

I have TensorFlow 1.9 and Keras 2.0.8 on my machine. When training a neural network with some toy data, the resulting training curves are very different between TensorFlow and Keras, and I do not understand why.

For the Keras implementation, the network learns well and the loss continues to decrease, whereas for the TensorFlow implementation, the network does not learn anything and the loss does not decrease. I have tried to ensure that both implementations use the same hyperparameters. Why is the behaviour so different?

The network itself has two inputs: an image and a vector. Each is passed through its own layers before the two are concatenated.

Here are my implementations.

Tensorflow:

import tensorflow as tf

# Create the placeholders
input1 = tf.placeholder("float", [None, 64, 64, 3])
input2 = tf.placeholder("float", [None, 4])
label = tf.placeholder("float", [None, 4])

# Build the TensorFlow network
# Input 1
x1 = tf.layers.conv2d(inputs=input1, filters=30, kernel_size=[5, 5], strides=(2, 2), padding='valid', activation=tf.nn.relu)
x1 = tf.layers.conv2d(inputs=x1, filters=30, kernel_size=[5, 5], strides=(2, 2), padding='valid', activation=tf.nn.relu)
x1 = tf.layers.flatten(x1)
x1 = tf.layers.dense(inputs=x1, units=30)
# Input 2
x2 = tf.layers.dense(inputs=input2, units=30, activation=tf.nn.relu)
# Output
x3 = tf.concat(values=[x1, x2], axis=1)
x3 = tf.layers.dense(inputs=x3, units=30)
prediction = tf.layers.dense(inputs=x3, units=4)

# Define the optimisation
loss = tf.reduce_mean(tf.square(label - prediction))
train_op = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss)

# Train the model
sess = tf.Session()
sess.run(tf.global_variables_initializer())
training_feed = {input1: training_input1_data, input2: training_input2_data, label: training_label_data}
validation_feed = {input1: validation_input1_data, input2: validation_input2_data, label: validation_label_data}
for epoch_num in range(30):
    train_loss, _ = sess.run([loss, train_op], feed_dict=training_feed)
    val_loss = sess.run(loss, feed_dict=validation_feed)

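Keras:

import keras
from keras import optimizers
from keras.layers import Input, Conv2D, Flatten, Dense
from keras.models import Model

# Build the keras network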
# Input 1
input1 = Input(shape=(64, 64, 3), name='input1')
x1 = Conv2D(filters=30, kernel_size=5, strides=(2, 2), padding='valid', activation='relu')(input1)
x1 = Conv2D(filters=30, kernel_size=5, strides=(2, 2), padding='valid', activation='relu')(x1)
x1 = Flatten()(x1)
x1 = Dense(units=30, activation='relu')(x1)
# Input 2
input2 = Input(shape=(4,), name='input2')
x2 = Dense(units=30, activation='relu')(input2)
# Output
x3 = keras.layers.concatenate([x1, x2])
x3 = Dense(units=30, activation='relu')(x3)
prediction = Dense(units=4, activation='linear', name='output')(x3)

# Define the optimisation
model = Model(inputs=[input1, input2], outputs=[prediction])
adam = optimizers.Adam(lr=0.001)
model.compile(optimizer=adam, loss='mse')

# Train the model
training_inputs = {'input1': training_input1_data, 'input2': training_input2_data}
training_labels = {'output': training_label_data}
validation_inputs = {'input1': validation_images, 'input2': validation_state_diffs}
validation_labels = {'output': validation_label_data}
callback = PlotCallback()
model.fit(x=training_inputs, y=training_labels, validation_data=(validation_inputs, validation_labels), batch_size=len(training_label_data[0]), epochs=30)

And here are the training curves (two runs for each implementation).

Tensorflow: [figure: training curves, two runs]

Keras: [figure: training curves, two runs]

Recommended answer

After carefully examining your implementations, I observed that all the hyperparameters match except for the batch size. I don't agree with the answer from @Ultraviolet, because the default kernel_initializer of tf.layers.conv2d is also Xavier (see the TF implementation of conv2d).

The learning curves don't match for the following two reasons:

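  1. The parameters of the Keras implementation (version 2) receive more updates than those of the TF implementation (version 1). In version 1, you feed the whole dataset into the network at once in each epoch, which results in only 30 Adam updates. In contrast, version 2 uses batch_size=4 and therefore performs 30 * ceil(len(training_label_data)/batch_size) Adam updates.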

  2. The updates of version 2 are noisier than those of version 1, because the gradients are averaged over fewer samples.
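To make the difference in update counts concrete, here is a small sketch in plain Python. The dataset size of 1000 samples is an assumption chosen only for illustration (the question does not state it), and the helper names adam_updates and minibatch_slices are hypothetical, not part of TensorFlow or Keras. It shows that len(training_label_data[0]) is the length of one label row (4), not the number of samples, and how many optimiser steps each version would then take.

```python
import math


def adam_updates(num_samples, batch_size, epochs):
    """Number of optimiser steps when every epoch is split into mini-batches."""
    return epochs * math.ceil(num_samples / batch_size)


def minibatch_slices(num_samples, batch_size):
    """Yield slices that cover the dataset once, in order (hypothetical helper)."""
    for start in range(0, num_samples, batch_size):
        yield slice(start, min(start + batch_size, num_samples))


# Assumed stand-in for the question's labels: 1000 samples, 4 values each.
training_label_data = [[0.0] * 4 for _ in range(1000)]
num_samples = len(training_label_data)
epochs = 30

# What the Keras call in the question passes as batch_size:
# the length of the first label row, not the number of samples.
keras_batch_size = len(training_label_data[0])

# Version 1 (TF): the whole dataset is one batch per epoch.
tf_updates = adam_updates(num_samples, batch_size=num_samples, epochs=epochs)
# Version 2 (Keras): mini-batches of keras_batch_size samples.
keras_updates = adam_updates(num_samples, batch_size=keras_batch_size, epochs=epochs)

# Splitting each TF epoch the same way would give the same number of steps.
batches_per_epoch = sum(1 for _ in minibatch_slices(num_samples, keras_batch_size))

print(keras_batch_size, tf_updates, keras_updates, batches_per_epoch)
```

Under these assumptions, one way to bring the two versions closer would be to run train_op once per slice inside the TF epoch loop, feeding input1, input2 and label with the corresponding slice of each array, or alternatively to pass batch_size=len(training_label_data) to model.fit. Note that model.fit also shuffles the samples each epoch by default, which this sketch does not do.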
