keras BatchNormalization axis clarification

Problem description

The keras BatchNormalization layer uses axis=-1 as a default value and states that the feature axis is typically normalized. Why is this the case?

I suppose this is surprising because I'm more familiar with using something like StandardScaler, which would be equivalent to using axis=0. This would normalize the features individually.

Is there a reason why samples are individually normalized by default (i.e. axis=-1) in keras as opposed to features?
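For example, this quick sketch is what I mean by standardizing per feature -- StandardScaler and a manual axis=0 standardization agree:

import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1., 10., 100., 1000.],
              [2., 20., 200., 2000.],
              [3., 30., 300., 3000.]])

# StandardScaler standardizes each column (feature) independently ...
scaled = StandardScaler().fit_transform(X)

# ... which matches subtracting the axis=0 mean and dividing by the axis=0
# standard deviation (population std, ddof=0, as np.std uses by default).
manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(scaled, manual))  # True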

Edit: example for concreteness

It's common to transform data such that each feature has zero mean and unit variance. Let's just consider the "zero mean" part with this mock dataset, where each row is a sample:

>>> data = np.array([[   1,   10,  100, 1000],
                     [   2,   20,  200, 2000],
                     [   3,   30,  300, 3000]])

>>> data.mean(axis=0)
array([    2.,    20.,   200.,  2000.])

>>> data.mean(axis=1)
array([ 277.75,  555.5 ,  833.25])

Wouldn't it make more sense to subtract the axis=0 mean, as opposed to the axis=1 mean? Using axis=1, the units and scales can be completely different.
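To make that concrete, centering with each mean does very different things (note that the axis=1 mean needs keepdims=True to broadcast across the rows):

import numpy as np

data = np.array([[1., 10., 100., 1000.],
                 [2., 20., 200., 2000.],
                 [3., 30., 300., 3000.]])

# Subtracting the axis=0 mean centers every feature (column) at zero,
# regardless of its units or scale.
per_feature = data - data.mean(axis=0)
print(per_feature.mean(axis=0))   # [0. 0. 0. 0.]

# Subtracting the axis=1 mean centers every sample (row) instead, mixing
# values that live on completely different scales.
per_sample = data - data.mean(axis=1, keepdims=True)
print(per_sample.mean(axis=1))    # [0. 0. 0.]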

Edit 2:

The first equation of section 3 in this paper seems to imply that axis=0 should be used for calculating expectations and variances for each feature individually, assuming you have an (m, n) shaped dataset where m is the number of samples and n is the number of features.
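As I read it, that equation normalizes each dimension k independently over the samples (my transcription below, modulo notation):

\hat{x}^{(k)} = \frac{x^{(k)} - \mathrm{E}[x^{(k)}]}{\sqrt{\mathrm{Var}[x^{(k)}]}}

with the expectation and variance taken over the m samples, separately for each of the n features.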

Edit 3: another example

I wanted to see the dimensions of the means and variances BatchNormalization was calculating on a toy dataset:

import pandas as pd
import numpy as np
from sklearn.datasets import load_iris

from keras.optimizers import Adam
from keras.models import Model
from keras.layers import BatchNormalization, Dense, Input


iris = load_iris()
X = iris.data
y = pd.get_dummies(iris.target).values

input_ = Input(shape=(4, ))
norm = BatchNormalization()(input_)
l1 = Dense(4, activation='relu')(norm)
output = Dense(3, activation='sigmoid')(l1)

model = Model(input_, output)
model.compile(Adam(0.01), 'categorical_crossentropy')
model.fit(X, y, epochs=100, batch_size=32)

bn = model.layers[1]
bn.moving_mean  # <tf.Variable 'batch_normalization_1/moving_mean:0' shape=(4,) dtype=float32_ref>

The input X has shape (150, 4), and the BatchNormalization layer calculated 4 means, which means it operated over axis=0.

If BatchNormalization has a default of axis=-1 then shouldn't there be 150 means?

Solution

The confusion is due to the meaning of axis in np.mean versus in BatchNormalization.

When we take the mean along an axis, we collapse that dimension and preserve all other dimensions. In your example data.mean(axis=0) collapses the 0-axis, which is the vertical dimension of data.
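A quick way to see this is to look at the shapes:

import numpy as np

data = np.array([[1., 10., 100., 1000.],
                 [2., 20., 200., 2000.],
                 [3., 30., 300., 3000.]])

print(data.shape)               # (3, 4)
print(data.mean(axis=0).shape)  # (4,)  -- the 0-axis is collapsed: one mean per column
print(data.mean(axis=1).shape)  # (3,)  -- the 1-axis is collapsed: one mean per row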

When we compute a BatchNormalization along an axis, we preserve the dimensions of the array, and we normalize with respect to the mean and standard deviation over every other axis. So in your 2D example BatchNormalization with axis=1 is subtracting the mean for axis=0, just as you expect. This is why bn.moving_mean has shape (4,).
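Put differently, the axis argument of BatchNormalization names the axis to keep, and the statistics are computed over all the remaining axes. A rough sketch of that rule in numpy:

import numpy as np

x = np.random.randn(32, 10, 10, 3)   # e.g. a batch of 32 feature maps of shape 10x10x3

# BatchNormalization(axis=-1) keeps the last (channel/feature) axis and
# reduces over everything else, so the statistics have shape (3,).
mean = x.mean(axis=(0, 1, 2))
var = x.var(axis=(0, 1, 2))
print(mean.shape, var.shape)         # (3,) (3,)

# Applied to the (150, 4) iris input above, the same rule reduces over axis=0
# and leaves one mean/variance per feature -- hence bn.moving_mean of shape (4,).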
