keras BatchNormalization axis clarification
Question
The keras BatchNormalization layer uses axis=-1 as a default value and states that the feature axis is typically normalized. Why is this the case?
I suppose this is surprising because I'm more familiar with using something like StandardScaler, which would be equivalent to using axis=0. This would normalize the features individually.
Is there a reason why samples are individually normalized by default (i.e. axis=-1) in keras, as opposed to features?
Concrete example
It's common to transform data such that each feature has zero mean and unit variance. Let's just consider the "zero mean" part with this mock dataset, where each row is a sample:
>>> data = np.array([[ 1, 10, 100, 1000],
...                  [ 2, 20, 200, 2000],
...                  [ 3, 30, 300, 3000]])
>>> data.mean(axis=0)
array([ 2., 20., 200., 2000.])
>>> data.mean(axis=1)
array([ 277.75, 555.5 , 833.25])
Wouldn't it make more sense to subtract the axis=0 mean, as opposed to the axis=1 mean? Using axis=1, the units and scales can be completely different.
The first equation of section 3 in this paper seems to imply that axis=0 should be used for calculating expectations and variances for each feature individually, assuming you have an (m, n) shaped dataset where m is the number of samples and n is the number of features.
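That per-feature normalization can be written out directly. A minimal sketch for an (m, n) batch, with the epsilon value chosen here purely for illustration:

```python
import numpy as np

def batch_norm_2d(x, eps=1e-3):
    """Normalize each feature (column) of an (m, n) batch:
    subtract the per-feature mean, divide by the per-feature std."""
    mu = x.mean(axis=0)   # shape (n,): one mean per feature
    var = x.var(axis=0)   # shape (n,): one variance per feature
    return (x - mu) / np.sqrt(var + eps)

# Toy batch: 64 samples, 4 features, deliberately off-center and scaled.
x = np.random.default_rng(0).normal(5.0, 3.0, size=(64, 4))
x_hat = batch_norm_2d(x)
print(x_hat.mean(axis=0))  # close to 0 for every feature
print(x_hat.std(axis=0))   # close to 1 for every feature
```

This sketch omits the learned scale and shift (gamma, beta) that the full batch-norm layer applies after normalizing.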
Another example
I wanted to see the dimensions of the means and variances BatchNormalization was calculating on a toy dataset:
import pandas as pd
import numpy as np
from sklearn.datasets import load_iris
from keras.optimizers import Adam
from keras.models import Model
from keras.layers import BatchNormalization, Dense, Input
iris = load_iris()
X = iris.data
y = pd.get_dummies(iris.target).values
input_ = Input(shape=(4, ))
norm = BatchNormalization()(input_)
l1 = Dense(4, activation='relu')(norm)
output = Dense(3, activation='sigmoid')(l1)
model = Model(input_, output)
model.compile(Adam(0.01), 'categorical_crossentropy')
model.fit(X, y, epochs=100, batch_size=32)
bn = model.layers[1]
bn.moving_mean # <tf.Variable 'batch_normalization_1/moving_mean:0' shape=(4,) dtype=float32_ref>
The input X has shape (150, 4), and the BatchNormalization layer calculated 4 means, which means it operated over axis=0.
If BatchNormalization has a default of axis=-1, then shouldn't there be 150 means?
Answer
The confusion is due to the meaning of axis in np.mean versus in BatchNormalization.
When we take the mean along an axis, we collapse that dimension and preserve all other dimensions. In your example, data.mean(axis=0) collapses the 0-axis, which is the vertical dimension of data.
When we compute a BatchNormalization along an axis, we preserve the dimensions of the array, and we normalize with respect to the mean and standard deviation over every other axis. So in your 2D example, BatchNormalization with axis=-1 subtracts the mean over axis=0, just as you expect. This is why bn.moving_mean has shape (4,).