keras BatchNormalization axis clarification

Question

The keras BatchNormalization layer uses axis=-1 as a default value and states that the feature axis is typically normalized. Why is this the case?

I suppose this is surprising because I'm more familiar with using something like StandardScaler, which would be equivalent to using axis=0. This would normalize the features individually.
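
For reference, a minimal sketch of that per-feature style of normalization (assuming scikit-learn and numpy are available; the variable names are just for illustration):

import numpy as np
from sklearn.preprocessing import StandardScaler

data = np.array([[1., 10., 100., 1000.],
                 [2., 20., 200., 2000.],
                 [3., 30., 300., 3000.]])

# StandardScaler standardizes each feature (column) independently ...
scaled = StandardScaler().fit_transform(data)

# ... which matches subtracting the per-feature mean (axis=0) and
# dividing by the per-feature standard deviation.
manual = (data - data.mean(axis=0)) / data.std(axis=0)

print(np.allclose(scaled, manual))  # True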

Is there a reason why samples are individually normalized by default (i.e. axis=-1) in keras as opposed to features?

A concrete example

It's common to transform data such that each feature has zero mean and unit variance. Let's just consider the "zero mean" part with this mock dataset, where each row is a sample:

>>> import numpy as np
>>> data = np.array([[   1,   10,  100, 1000],
                     [   2,   20,  200, 2000],
                     [   3,   30,  300, 3000]])

>>> data.mean(axis=0)
array([    2.,    20.,   200.,  2000.])

>>> data.mean(axis=1)
array([ 277.75,  555.5 ,  833.25])

Wouldn't it make more sense to subtract the axis=0 mean, as opposed to the axis=1 mean? Using axis=1, the units and scales can be completely different.
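
To make the difference concrete, here is what each subtraction does to the mock data (continuing the same interactive session):

>>> data - data.mean(axis=0)                    # per-feature centering
array([[   -1.,   -10.,  -100., -1000.],
       [    0.,     0.,     0.,     0.],
       [    1.,    10.,   100.,  1000.]])

>>> data - data.mean(axis=1, keepdims=True)     # per-sample centering
array([[ -276.75,  -267.75,  -177.75,   722.25],
       [ -553.5 ,  -535.5 ,  -355.5 ,  1444.5 ],
       [ -830.25,  -803.25,  -533.25,  2166.75]])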

The first equation of section 3 in this paper seems to imply that axis=0 should be used for calculating expectations and variances for each feature individually, assuming you have an (m, n) shaped dataset where m is the number of samples and n is the number of features.
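
For reference, the per-feature normalization that equation describes (paraphrasing from memory, so check the paper itself) has the form

x̂^(k) = (x^(k) − E[x^(k)]) / sqrt(Var[x^(k)])

where the expectation and variance for each feature dimension k are taken over the m samples, i.e. over axis=0 in the (m, n) layout above.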

Another example

I wanted to see the dimensions of the means and variances BatchNormalization was calculating on a toy dataset:

import pandas as pd
import numpy as np
from sklearn.datasets import load_iris

from keras.optimizers import Adam
from keras.models import Model
from keras.layers import BatchNormalization, Dense, Input


iris = load_iris()
X = iris.data
y = pd.get_dummies(iris.target).values

input_ = Input(shape=(4, ))
norm = BatchNormalization()(input_)
l1 = Dense(4, activation='relu')(norm)
output = Dense(3, activation='sigmoid')(l1)

model = Model(input_, output)
model.compile(Adam(0.01), 'categorical_crossentropy')
model.fit(X, y, epochs=100, batch_size=32)

bn = model.layers[1]
bn.moving_mean  # <tf.Variable 'batch_normalization_1/moving_mean:0' shape=(4,) dtype=float32_ref>

The input X has shape (150, 4), and the BatchNormalization layer calculated 4 means, which means it operated over axis=0.

If BatchNormalization has a default of axis=-1 then shouldn't there be 150 means?

Answer

The confusion is due to the meaning of axis in np.mean versus in BatchNormalization.

When we take the mean along an axis, we collapse that dimension and preserve all other dimensions. In your example data.mean(axis=0) collapses the 0-axis, which is the vertical dimension of data.

When we compute a BatchNormalization along an axis, we preserve that dimension of the array, and we normalize with respect to the mean and standard deviation over every other axis. So in your 2D example, BatchNormalization with axis=1 subtracts the mean computed over axis=0, just as you expect. This is why bn.moving_mean has shape (4,).
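
In other words (a minimal numpy sketch of the idea, not keras's actual implementation; bn_style_mean is a made-up helper name): the BatchNormalization axis names the axis to keep, and the statistics are reduced over all of the remaining axes.

import numpy as np

def bn_style_mean(x, axis=-1):
    # The BatchNormalization-style `axis` is the axis to preserve;
    # the mean is computed over every other axis.
    axis = axis % x.ndim
    reduce_axes = tuple(a for a in range(x.ndim) if a != axis)
    return x.mean(axis=reduce_axes)

X = np.random.rand(150, 4)                 # (samples, features), like the iris data
print(bn_style_mean(X, axis=-1).shape)     # (4,) -- one statistic per feature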
