Fine-tune with batch normalization in Keras

Question

I have successfully trained a model on 100000 samples, and it performs well on both the train set and the test set. Then I tried to fine-tune it on one particular sample (one of the 100000) using the trained weights as the initialization.

But the result is a little strange, and I believe it is caused by the batch normalization layer. Specifically, my code is as follows:

import tensorflow as tf

model = mymodel()
model.load_weights('./pre_trained.h5')  # start from the pre-trained weights
rate = model.evaluate(x, y)
print(rate)
checkpoint = tf.keras.callbacks.ModelCheckpoint('./trained.h5', monitor='loss',
        verbose=0, save_best_only=True, mode='min', save_weights_only=True)
model.fit(x, y, validation_data=(x, y), epochs=5, verbose=2, callbacks=[checkpoint])

model.load_weights('./trained.h5')
rate = model.evaluate(x, y)
print(rate)

mymodel is a self-defined function that generates my model, which consists of Dense and BatchNormalization layers. x, y are the input and label of one particular sample. I want to further optimize the loss on this sample. However, the result is strange:

 1/1 [==============================] - 0s 209ms/step
-6.087581634521484
Train on 1 samples, validate on 1 samples
Epoch 1/200
 - 1s - loss: -2.7749e-01 - val_loss: -6.0876e+00
Epoch 2/200
 - 0s - loss: -2.8791e-01 - val_loss: -6.0876e+00
Epoch 3/200
 - 0s - loss: -3.0012e-01 - val_loss: -6.0876e+00
Epoch 4/200
 - 0s - loss: -3.1325e-01 - val_loss: -6.0876e+00

As shown, model.evaluate works well at first: the loss (-6.087581634521484) is close to the performance of the loaded trained model. But the loss on the train set (which is actually the same as the validation set in model.fit()) is strange. The val_loss is normal, similar to the model.evaluate result on the first line. So I am puzzled why there is still a large difference between the train loss and the inference loss (the train loss is worse), since the train sample and the validation sample are the same one; I think the results should be the same, or at least very close. I suspect the problem is caused by the BN layer, due to its different behavior between training and inference. However, I have already set trainable = False on the BN layer after loading the pre-trained weights and before model.fit, but the problem is not solved:

out = tf.keras.layers.BatchNormalization(trainable=False)(out)

I still suspect the BN layer, and wonder whether setting trainable=False is enough to keep the parameters of BN unchanged.
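
One direct way to check this is to snapshot the BN weights (gamma, beta, moving mean, moving variance) before fitting and compare them afterwards. A minimal sketch, reusing model, x and y from above:

import numpy as np
import tensorflow as tf

def bn_weights(m):
    # Collect gamma, beta, moving_mean and moving_variance of every BN layer.
    return [w for layer in m.layers
            if isinstance(layer, tf.keras.layers.BatchNormalization)
            for w in layer.get_weights()]

before = bn_weights(model)   # get_weights() returns copies, so this is a safe snapshot
model.fit(x, y, epochs=5, verbose=0)
after = bn_weights(model)
print(all(np.allclose(b, a) for b, a in zip(before, after)))  # True if BN stayed fixed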

Can anyone give me some advice? Thanks a lot for your help in advance. Sorry for my English, but I tried my best to explain my problem.

Answer

I had a similar finding in PyTorch that I would like to share. First of all, what is your Keras version? Since 2.1.3, setting a BN layer's trainable=False makes BN behave exactly as in inference mode, meaning it does not normalize the input to 0 mean and 1 variance (as in training mode), but normalizes with the running mean and variance instead. If you set the learning phase to 1, then BN essentially becomes instance norm: it ignores the running mean and variance and just normalizes to 0 mean and 1 variance, which might be your desired behavior.
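
In code, that freezing would look roughly like the sketch below: set trainable=False on every BN layer after loading the weights, then recompile before fitting, since trainable flags only take effect at compile time. The optimizer and loss here are placeholders, not the ones from the question:

import tensorflow as tf

model = mymodel()                        # model-building function from the question
model.load_weights('./pre_trained.h5')

# With Keras >= 2.1.3, trainable=False makes a BN layer run fully in
# inference mode: no updates of the moving statistics, and normalization
# uses the stored moving mean/variance.
for layer in model.layers:
    if isinstance(layer, tf.keras.layers.BatchNormalization):
        layer.trainable = False

model.compile(optimizer='adam', loss='mse')  # placeholder optimizer/loss
model.fit(x, y, epochs=5, verbose=2)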

Reference link to the Keras release notes: https://github.com/keras-team/keras/releases/tag/2.1.3

API changes: trainable attribute in BatchNormalization now disables the updates of the batch statistics (i.e. if trainable == False the layer will now run 100% in inference mode).
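
To get the instance-norm-like behavior instead, the learning phase can be forced to 1 before the model is built. A rough sketch; tf.keras.backend.set_learning_phase is a real but since-deprecated API (TF 1.x and early TF 2.x):

import tensorflow as tf

# Force training-mode behavior globally: every BN layer now normalizes
# with the statistics of the current batch (instance-norm-like for a
# batch of one) and ignores the stored moving mean/variance.
tf.keras.backend.set_learning_phase(1)

model = mymodel()                        # model-building function from the question
model.load_weights('./pre_trained.h5')
print(model.evaluate(x, y))              # evaluate now also uses batch statistics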
