Strange behaviour of the loss function in keras model, with pretrained convolutional base


Problem description


I'm trying to create a model in Keras to make numerical predictions from pictures. My model has a densenet121 convolutional base, with a couple of additional layers on top. All layers except for the last two are set to layer.trainable = False. My loss is mean squared error, since it's a regression task. During training I get loss: ~3, while evaluation on the very same batch of data gives loss: ~30:

model.fit(x=dat[0],y=dat[1],batch_size=32)


Epoch 1/1 32/32 [==============================] - 0s 11ms/step - loss: 2.5571

model.evaluate(x=dat[0],y=dat[1])


32/32 [==============================] - 2s 59ms/step 29.276123046875


I feed exactly the same 32 pictures during training and evaluation. I also calculated the loss from the predicted values y_pred=model.predict(dat[0]) by constructing the mean squared error with numpy. The result was the same as what I got from evaluation (i.e. 29.276123...).
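For reference, a minimal sketch of that manual check (it assumes dat[0] holds the 32 images and dat[1] the numeric targets, as in the fit/evaluate calls above):

import numpy as np

# predictions from the frozen model on the same 32 images
y_pred = model.predict(dat[0])

# mean squared error computed by hand; it matches model.evaluate() (~29.28),
# not the loss reported by model.fit() (~2.56)
manual_mse = np.mean((dat[1] - y_pred.squeeze()) ** 2)
print(manual_mse)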


There was a suggestion that this behavior might be due to the BatchNormalization layers in the convolutional base (discussion on github). Of course, all BatchNormalization layers in my model have been set to layer.trainable=False as well. Has anybody encountered this problem and figured out a solution?
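For context, a minimal sketch of the kind of setup described above (the input size, pooling layer and optimizer are assumptions, not the exact code from the question):

from keras.applications.densenet import DenseNet121
from keras import layers, models

# pretrained convolutional base without the classification head
conv_base = DenseNet121(weights='imagenet', include_top=False,
                        input_shape=(224, 224, 3))

model = models.Sequential([
    conv_base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(1),               # single numeric output for regression
])

# freeze every layer of the base (including all BatchNormalization layers),
# leaving only the pooling and dense layers on top trainable
for layer in conv_base.layers:
    layer.trainable = False

model.compile(optimizer='adam', loss='mse')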

Answer


Looks like I found the solution. As I suspected, the problem is with the BatchNormalization layers. They do three things: 1) subtract the mean and normalize by the std, 2) collect statistics on the mean and std using a running average, 3) train two additional parameters (two per node). When one sets trainable to False, these two parameters freeze and the layer also stops collecting statistics on the mean and std. But it looks like the layer still performs normalization during training time using the training batch. Most likely it's a bug in Keras, or maybe they did it on purpose for some reason. As a result, the forward-propagation computations during training time differ from those at prediction time, even though the trainable attribute is set to False.
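A small sketch of how one could observe this mismatch directly, assuming the Keras 2.x backend API (the variable names here are illustrative, not from the original answer):

from keras import backend as K

# same graph, evaluated once with the training-phase flag set and once without
get_output = K.function([model.input, K.learning_phase()], [model.output])

out_train = get_output([dat[0], 1])[0]   # training mode: BatchNorm uses batch statistics
out_infer = get_output([dat[0], 0])[0]   # inference mode: BatchNorm uses moving averages

# with frozen BatchNorm layers these two outputs can differ substantially,
# which is why fit() reports ~3 while evaluate() reports ~30 on the same batch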


There are two possible solutions I can think of:

  1. Set all BatchNormalization layers to trainable. In this case these layers will collect statistics from your dataset instead of using the pretrained ones (which can be significantly different!), so all the BatchNorm layers get adjusted to your custom dataset during training (see the sketch after this list).
  2. Split the model into two parts, model=model_base+model_top. Then use model_base to extract features with model_base.predict(), feed these features into model_top, and train only model_top.
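A hedged sketch of the first option (the conv_base and model names are assumptions, as in the earlier sketch): unfreeze only the BatchNormalization layers inside the base so they re-estimate their mean/std statistics on the new dataset, then recompile before training.

from keras.layers import BatchNormalization

for layer in conv_base.layers:
    if isinstance(layer, BatchNormalization):
        layer.trainable = True     # statistics (and gamma/beta) now adapt to the new data

model.compile(optimizer='adam', loss='mse')   # recompile so the change takes effect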


I've just tried the first solution and it looks like it's working:

model.fit(x=dat[0],y=dat[1],batch_size=32)

Epoch 1/1
32/32 [==============================] - 1s 28ms/step - loss: 3.1053

model.evaluate(x=dat[0],y=dat[1])

32/32 [==============================] - 0s 10ms/step
2.487905502319336


This was after some training; one needs to wait until enough statistics on the mean and std have been collected.


I haven't tried the second solution yet, but I'm pretty sure it's going to work, since forward propagation during training and prediction will be the same. A sketch of this approach follows.
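A minimal sketch of that second option (names and the pooling/top architecture are assumptions): run the pretrained base once in inference mode to get fixed features, then train only a small top model on those features.

from keras.applications.densenet import DenseNet121
from keras import layers, models

model_base = DenseNet121(weights='imagenet', include_top=False,
                         input_shape=(224, 224, 3))

# predict() runs in inference mode, so BatchNorm uses its moving averages here
features = model_base.predict(dat[0])

model_top = models.Sequential([
    layers.GlobalAveragePooling2D(input_shape=features.shape[1:]),
    layers.Dense(1),
])
model_top.compile(optimizer='adam', loss='mse')

# only the small top model is trained; the base never sees training mode
model_top.fit(features, dat[1], batch_size=32)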


Update. I found a great blog post where this issue is discussed in full detail. Check it out here
