Why is it necessary to freeze all the inner state of a Batch Normalization layer when fine-tuning?


Problem description

The following content comes from the Keras tutorial:

This behavior has been introduced in TensorFlow 2.0, in order to enable layer.trainable = False to produce the most commonly expected behavior in the convnet fine-tuning use case.
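
For context, here is a minimal sketch of the convnet fine-tuning pattern this refers to (the ResNet50 backbone, input size, and binary-classification head are illustrative assumptions, not part of the question):

import tensorflow as tf

# Illustrative assumption: a pretrained ResNet50 backbone as the convnet to fine-tune.
base_model = tf.keras.applications.ResNet50(weights="imagenet", include_top=False)

# Setting trainable = False freezes the weights and, since TensorFlow 2.0, also makes
# the BatchNormalization layers run in inference mode, i.e. use their moving statistics
# rather than the statistics of the current batch.
base_model.trainable = False

inputs = tf.keras.Input(shape=(224, 224, 3))
# training=False keeps the frozen backbone (including BatchNormalization) in inference mode.
x = base_model(inputs, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss="binary_crossentropy")
# model.fit(new_dataset, epochs=5)  # only the new classification head is trained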

Why should we freeze the layer when fine-tuning a convolutional neural network? Is it because of some mechanism in TensorFlow/Keras, or because of the batch normalization algorithm itself? I ran an experiment myself and found that if trainable is not set to False, the model tends to catastrophically forget what it learned before and returns a very large loss during the first few epochs. What is the reason for that?

Recommended answer

During training, varying batch statistics act as a regularization mechanism that can improve the ability to generalize. This can help to minimize overfitting when training for a large number of iterations. Indeed, using a very large batch size can harm generalization, as there is less variation in the batch statistics, which decreases the regularization.
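
To make the "less variation in batch statistics" point concrete, here is a small illustrative sketch (the normal distribution and the batch sizes are arbitrary assumptions): per-batch means fluctuate much more for small batches than for large ones, and it is this fluctuation that acts like noise, i.e. regularization.

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=100_000)  # stand-in for one activation channel

for batch_size in (8, 256):
    n_batches = len(data) // batch_size
    batches = data[: n_batches * batch_size].reshape(n_batches, batch_size)
    batch_means = batches.mean(axis=1)
    # Smaller batches -> noisier per-batch statistics -> stronger regularizing effect.
    print(batch_size, batch_means.std())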

When fine-tuning on a new dataset, the batch statistics are likely to be very different if the fine-tuning examples have different characteristics from the examples in the original training dataset. Therefore, if batch normalization is not frozen, the network will learn new batch normalization parameters (gamma and beta in the batch normalization paper) that differ from what the other network parameters were optimised for during the original training. Relearning all the other network parameters is often undesirable during fine-tuning, either because of the required training time or because of the small size of the fine-tuning dataset. Freezing batch normalization avoids this issue.
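
For reference, a sketch of the batch normalization transform being discussed, in plain NumPy (the epsilon and momentum values are typical defaults, not taken from the question): in training mode the layer normalizes with the current batch's mean and variance and updates its moving averages; in inference (frozen) mode it reuses the moving averages accumulated on the original data, with gamma and beta as the learned scale and shift.

import numpy as np

def batch_norm(x, gamma, beta, moving_mean, moving_var,
               training, momentum=0.99, eps=1e-3):
    # x has shape (batch, features); statistics are computed per feature.
    if training:
        # Training mode: normalize with the statistics of the current batch.
        mean = x.mean(axis=0)
        var = x.var(axis=0)
        # Update the moving statistics that will be used at inference time.
        moving_mean = momentum * moving_mean + (1.0 - momentum) * mean
        moving_var = momentum * moving_var + (1.0 - momentum) * var
    else:
        # Inference / frozen mode: reuse the statistics learned on the original data.
        mean, var = moving_mean, moving_var
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta, moving_mean, moving_var

If the fine-tuning batches have very different means and variances, the training-mode branch immediately shifts the normalized activations away from what the downstream weights were trained to expect, which is consistent with the large loss in the first few epochs described in the question.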
