How to set weights of the batch normalization layer?

Question

How do I set the weights of the batch normalization layer in Keras?

The documentation has me a little confused:

weights: Initialization weights. List of 2 Numpy arrays, with shapes: [(input_shape,), (input_shape,)] Note that the order of this list is [gamma, beta, mean, std]

Do we need all four [gamma, beta, mean, std]? Is there a way to set the weights using only [gamma, beta]?

Answer

Yes, you need all four values. Recall what batch normalization does: its goal is to normalize (i.e. mean = 0 and standard deviation = 1) the inputs coming into each layer. To do this, you need (mean, std). A normalized activation can then be viewed as the input to a sub-network that applies a linear transformation:

y = gamma*x_norm + beta

(gamma, beta) are important because they complement (mean, std): they let the network recover the original activations from the normalized ones. If you drop them, or change any one parameter without considering the others, you risk changing the semantic meaning of the activations. The restored activations are then processed by the next layer, and this process is repeated for all layers.
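To make this concrete, here is a minimal sketch of setting all four arrays on a Keras BatchNormalization layer. It assumes the modern tf.keras API, where the fourth array is the moving variance rather than the "std" quoted in the older docs above; check the expected order on your version with layer.get_weights().

import numpy as np
import tensorflow as tf

n_features = 4  # hypothetical feature size, just for illustration

bn = tf.keras.layers.BatchNormalization()
bn.build((None, n_features))  # creates the layer's four weight variables

gamma = np.ones(n_features, dtype=np.float32)   # scale
beta = np.zeros(n_features, dtype=np.float32)   # shift
mean = np.full(n_features, 0.5, dtype=np.float32)
variance = np.full(n_features, 2.0, dtype=np.float32)  # variance in tf.keras; "std" in the old docs

# All four arrays are required, in the order get_weights() reports them.
bn.set_weights([gamma, beta, mean, variance])

# To change only (gamma, beta), keep the layer's existing statistics
# and overwrite just the first two entries:
w = bn.get_weights()
bn.set_weights([gamma, beta, w[2], w[3]])

So you cannot pass only [gamma, beta] to set_weights, but you can preserve the stored statistics and replace just those two entries, as the last two lines show.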

Edit:

On the other hand, I think it would be worth first computing the mean and std over a large number of images and using those as your mean and std. Make sure the images you compute the statistics on come from the same distribution as your training data. I think this should work, since batch normalization usually has two modes for computing the mean: a running average maintained over batches, and a global mean (at least in Caffe, see here).
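A sketch of that idea under the same tf.keras assumptions, with a random array standing in for your real image sample:

import numpy as np
import tensorflow as tf

# Hypothetical stand-in for a large image sample drawn from the same
# distribution as the training data, shape (N, H, W, C).
images = np.random.rand(1000, 32, 32, 3).astype(np.float32)

# Per-channel statistics, matching BatchNormalization's default of
# normalizing over the last (channel) axis.
mean = images.mean(axis=(0, 1, 2))
variance = images.var(axis=(0, 1, 2))  # tf.keras stores variance, not std

bn = tf.keras.layers.BatchNormalization()
bn.build((None, 32, 32, 3))

# Keep the layer's current scale/shift and install the precomputed stats.
gamma, beta = bn.get_weights()[:2]
bn.set_weights([gamma, beta, mean, variance])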
