What's the difference between the attributes 'trainable' and 'training' in the BatchNormalization layer in Keras/TensorFlow?


Problem description


According to the official documentation from TensorFlow:

About setting layer.trainable = False on a BatchNormalization layer:
The meaning of setting layer.trainable = False is to freeze the layer, i.e. its internal state will not change during training: its trainable weights will not be updated during fit() or train_on_batch(), and its state updates will not be run.
Usually, this does not necessarily mean that the layer is run in inference mode (which is normally controlled by the training argument that can be passed when calling a layer). "Frozen state" and "inference mode" are two separate concepts.
However, in the case of the BatchNormalization layer, setting trainable = False on the layer means that the layer will be subsequently run in inference mode (meaning that it will use the moving mean and the moving variance to normalize the current batch, rather than using the mean and variance of the current batch).
This behavior has been introduced in TensorFlow 2.0, in order to enable layer.trainable = False to produce the most commonly expected behavior in the convnet fine-tuning use case.

I don't quite understand the terms 'frozen state' and 'inference mode' here. I tried fine-tuning with trainable set to False, and I found that the moving mean and moving variance are not being updated.
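This is easy to verify directly. Below is a minimal sketch (assuming TensorFlow 2.x; the random toy data is purely illustrative) that checks whether the BN layer's moving mean changes during fit() once trainable is set to False:

```python
import numpy as np
import tensorflow as tf

bn = tf.keras.layers.BatchNormalization()
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    bn,
    tf.keras.layers.Dense(1),
])

bn.trainable = False                          # freeze the BN layer
model.compile(optimizer="adam", loss="mse")   # compile after setting the flag

x = np.random.randn(64, 4).astype("float32")
y = np.random.randn(64, 1).astype("float32")

before = bn.moving_mean.numpy().copy()
model.fit(x, y, epochs=1, verbose=0)

# The moving statistics did not move: the layer ran in inference mode.
print(np.allclose(before, bn.moving_mean.numpy()))  # True
```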

So I have the following questions:

  1. What's the difference between the two attributes training and trainable?
  2. Are gamma and beta updated during training if trainable is set to False?
  3. Why is it necessary to set trainable to False when fine-tuning?

Solution

What's the difference between the two attributes training and trainable?

trainable: If True, the "trainable" weights of the layer will be updated during backpropagation; if False, they are frozen.

training: Some layers behave differently during training and inference (or testing). Examples include the Dropout and BatchNormalization layers. This call-time argument tells the layer which mode it should run in.
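A quick illustrative sketch of the difference (Dropout makes the training flag easy to see, since it is the identity at inference):

```python
import tensorflow as tf

dropout = tf.keras.layers.Dropout(0.5)
x = tf.ones((1, 6))

# `training` is a per-call argument that switches the layer's behavior.
print(dropout(x, training=True))   # roughly half the units zeroed, rest scaled by 2
print(dropout(x, training=False))  # identity: dropout is disabled at inference

# `trainable` is a per-layer attribute that controls weight updates.
dense = tf.keras.layers.Dense(3)
dense.build((None, 6))
dense.trainable = False
print(dense.trainable_weights)     # [] -> the optimizer will not update this layer
```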

Are gamma and beta updated during training if trainable is set to False?

Since gamma and beta are the "trainable" parameters of the BN layer, they will NOT be updated during training if trainable is set to False.
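A short sketch of what this looks like on the layer's weight lists (TF 2.x):

```python
import tensorflow as tf

bn = tf.keras.layers.BatchNormalization()
bn.build((None, 8))

print(len(bn.trainable_weights))      # 2 -> gamma and beta
print(len(bn.non_trainable_weights))  # 2 -> moving mean and moving variance

bn.trainable = False
print(len(bn.trainable_weights))      # 0 -> gamma and beta are frozen too
print(len(bn.non_trainable_weights))  # 4 -> all of the layer's state
```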

Why is it necessary to set trainable to False when fine-tuning?

When fine-tuning, we first add our own classification FC layer on top. It is randomly initialized, but our "pre-trained" model is already calibrated (somewhat) for the task.

As an analogy, think of it like this.

You have a number line from 0 to 10. On this number line, '0' represents a completely random model, while '10' represents a near-perfect one. Our pre-trained model is somewhere around 5, 6, or 7, i.e. most probably better than random. The FC layer we added on top starts at '0' because it is randomly initialized.

We set trainable = False for the pre-trained model so that the FC layer can quickly reach the level of the pre-trained model, i.e. with a higher learning rate. If we didn't set trainable = False for the pre-trained model and used a higher learning rate, it would wreak havoc on the already-calibrated weights.

So initially, we set a higher learning rate and trainable = False for the pre-trained model, and train only the FC layer. After that, we unfreeze the pre-trained model and continue training with a very low learning rate to serve our purpose.
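As a concrete sketch of that two-phase recipe (the choice of MobileNetV2, the 10-class head, and the `train_ds` dataset are illustrative placeholders, not part of the original answer):

```python
import tensorflow as tf

# Phase 1: freeze the pre-trained body and train only the new FC head.
base = tf.keras.applications.MobileNetV2(
    include_top=False, pooling="avg", input_shape=(224, 224, 3))
base.trainable = False

inputs = tf.keras.Input(shape=(224, 224, 3))
x = base(inputs, training=False)   # keep BN layers in inference mode
outputs = tf.keras.layers.Dense(10, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),  # higher LR for the head
              loss="sparse_categorical_crossentropy")
# model.fit(train_ds, epochs=5)

# Phase 2: unfreeze the body and fine-tune everything with a very low LR.
base.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="sparse_categorical_crossentropy")
# model.fit(train_ds, epochs=5)
```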

Feel free to ask for further clarification if required, and upvote if you find this helpful.
