How does BatchNormalization in keras work?


Problem description

I want to know how BatchNormalization works in Keras, so I wrote the following code:

import numpy as np
import keras

X_input = keras.Input((2,))
X = keras.layers.BatchNormalization(axis=1)(X_input)
model1 = keras.Model(inputs=X_input, outputs=X)

The input is a batch of two-dimensional vectors, which is normalized along axis=1; then the output is printed:

a = np.arange(4).reshape((2, 2))
print('a=')
print(a)
print('output=')
print(model1.predict(a, batch_size=2))

The output is:

a=
array([[0, 1],
       [2, 3]])
output=
array([[ 0.        ,  0.99950039],
       [ 1.99900079,  2.9985013 ]], dtype=float32)

I cannot figure out these results. As far as I know, the mean of the batch should be ([0,1] + [2,3])/2 = [1,2], and the variance should be 1/2*(([0,1] - [1,2])^2 + ([2,3] - [1,2])^2) = [1,1]. Finally, normalizing with (x - mean)/sqrt(var), the results should be [-1,-1] and [1,1]. Where am I wrong?
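
For reference, the computation the question expects can be reproduced directly with numpy; this is a sketch of the asker's batch-statistics math, not what Keras actually does at predict time:

import numpy as np

a = np.arange(4).reshape((2, 2))
mean = a.mean(axis=0)              # per-feature mean over the batch: [1., 2.]
var = a.var(axis=0)                # per-feature variance over the batch: [1., 1.]
print((a - mean) / np.sqrt(var))   # [[-1., -1.], [ 1.,  1.]]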

Recommended answer

BatchNormalization subtracts the mean, divides by the square root of the variance, and then applies a factor gamma and an offset beta. If these parameters were actually the mean and variance of your batch, the result would be centered around zero with variance 1.
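
In formula form, the layer computes gamma * (x - mean) / sqrt(variance + epsilon) + beta. Here is a minimal numpy sketch of that inference-time computation; this illustrates the formula, it is not Keras's actual implementation:

import numpy as np

def batch_norm_inference(x, moving_mean, moving_variance, beta, gamma, epsilon=1e-3):
    # Normalize with the stored statistics, then scale and shift.
    return gamma * (x - moving_mean) / np.sqrt(moving_variance + epsilon) + beta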

But they are not. The Keras BatchNormalization layer stores these as weights, called moving_mean, moving_variance, beta and gamma; beta and gamma are trainable, while moving_mean and moving_variance are updated from the batch statistics during training. They are initialized as beta=0, gamma=1, moving_mean=0 and moving_variance=1. Since you never ran a training step, these are still at their initial values, so BatchNorm leaves your inputs almost unchanged.
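
You can inspect these weights and their initial values on the model built above; pairing layer.weights with get_weights() avoids assuming a particular ordering:

bn_layer = model1.layers[1]  # the BatchNormalization layer
for weight, value in zip(bn_layer.weights, bn_layer.get_weights()):
    print(weight.name, value)  # gamma=1, beta=0, moving_mean=0, moving_variance=1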

So why don't you get back exactly your input values? Because there is another parameter, epsilon (a small number), which is added to the variance. Since moving_variance is 1, every value is divided by sqrt(1 + epsilon) and therefore ends up slightly below its input value.
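
You can check this against the output above. Keras's default epsilon is 0.001, and with moving_mean=0, moving_variance=1, gamma=1 and beta=0, each input x is mapped to x / sqrt(1 + 0.001):

import numpy as np

print(1 / np.sqrt(1.001))  # ~0.99950037, matching 0.99950039 above
print(3 / np.sqrt(1.001))  # ~2.99850112, matching 2.9985013 above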

