batch normalization, yes or no?

Problem Description

I use TensorFlow 1.14.0 and Keras 2.2.4. The following code implements a simple neural network:

import numpy as np
np.random.seed(1)
import random
random.seed(2)
import tensorflow as tf
tf.set_random_seed(3)

from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.layers import Input, Dense, Activation


x_train=np.random.normal(0,1,(100,12))

model = Sequential()
model.add(Dense(8, input_shape=(12,)))
# model.add(tf.keras.layers.BatchNormalization())
model.add(Activation('linear'))
model.add(Dense(12))
model.add(Activation('linear'))
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(x_train, x_train, epochs=20, validation_split=0.1, shuffle=False, verbose=2)

The final val_loss after 20 epochs is 0.7751. When I uncomment the single commented line to add the batch normalization layer, the val_loss changes to 1.1230.

My actual problem is far more complicated, but the same thing occurs there. Since my activation is linear, it does not matter whether I put the batch normalization before or after the activation.

Questions: Why doesn't batch normalization help here? Is there anything I can change so that batch normalization improves the result without changing the activation functions?

Update after the comments:

An NN with one hidden layer and linear activations is kind of like PCA. There are plenty of papers on this. For me, this setting gives the lowest MSE among all combinations of activation functions for the hidden layer and the output.

Some resources stating that linear activations amount to PCA:

https://arxiv.org/pdf/1702.07800.pdf

https://link.springer.com/article/10.1007/BF00275687

https://www.quora.com/How-can-I-make-a-neural-network-to-work-as-a-PCA
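
To make the PCA comparison concrete, here is a minimal sketch (assuming scikit-learn is available and x_train is defined as above) that computes the PCA reconstruction MSE for the same 12 -> 8 -> 12 setup; an optimally trained linear autoencoder can at best match this baseline:

# Sketch: PCA baseline for the 12 -> 8 -> 12 linear autoencoder
# (assumes scikit-learn is installed and x_train is defined as in the question).
from sklearn.decomposition import PCA

pca = PCA(n_components=8)                                  # same width as the hidden layer
x_rec = pca.inverse_transform(pca.fit_transform(x_train))  # project to 8 dims and reconstruct
print("PCA reconstruction MSE:", np.mean((x_train - x_rec) ** 2))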

Recommended Answer

Yes.

The behavior you're observing is a bug, and you don't need BN to see it; the plot on the left is for #V1, the one on the right for #V2:

#V1
model = Sequential()
model.add(Dense(8, input_shape=(12,)))
#model.add(Activation('linear')) <-- uncomment == #V2
model.add(Dense(12))
model.compile(optimizer='adam', loss='mean_squared_error')

Clearly nonsensical, as Activation('linear') after a layer with activation=None (== 'linear') is an identity: model.layers[1].output.name == 'activation/activation/Identity:0'. This can be confirmed further by fetching and plotting the intermediate layer outputs, which are identical for 'dense' and 'activation'; I will omit the plots here.
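
One way to run that check (a sketch, assuming the question's model with the Activation('linear') layer has been built and trained, so that model.layers[0] is the Dense layer and model.layers[1] the Activation):

probe = Model(inputs=model.input,
              outputs=[model.layers[0].output, model.layers[1].output])
dense_out, activation_out = probe.predict(x_train)
print(np.allclose(dense_out, activation_out))  # True: the Activation layer is an identity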

So the activation should do literally nothing, except it doesn't: somewhere along the commit chain between 1.14.0 and 2.0.0 this was fixed, though I don't know exactly where. Results with BN using TF 2.0.0 and Keras 2.3.1 below:

val_loss = 0.840 # without BN
val_loss = 0.819 # with BN

Solution: update to TensorFlow 2.0.0, Keras 2.3.1.

Tip: use Anaconda with a virtual environment. If you don't have any virtual envs yet, run:

conda create --name tf2_env --clone base
conda activate tf2_env
conda uninstall tensorflow-gpu
conda uninstall keras
conda install -c anaconda tensorflow-gpu==2.0.0
conda install -c conda-forge keras==2.3.1
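
A quick check (a sketch) after activating the environment, to confirm the expected versions are in place:

import tensorflow as tf
import keras
print(tf.__version__, keras.__version__)  # expect 2.0.0 and 2.3.1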

It may be a bit more involved than this, but that's the subject of another question.

UPDATE: importing from keras instead of tf.keras also solves the problem.
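
For example, the imports above would become (a sketch, assuming standalone Keras 2.2.4 or 2.3.1 is installed):

from keras.models import Model, Sequential
from keras.layers import Input, Dense, Activation, BatchNormalization  # instead of tf.keras.layers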

Disclaimer: BN remains a 'controversial' layer in Keras and has yet to be fully fixed - see the relevant Git issues; I plan on investigating it myself eventually, but for your purposes, this answer's fix should suffice.

I also recommend familiarizing yourself with BN's underlying theory, in particular its train vs. inference behavior; in a nutshell, batch sizes under 32 are a pretty bad idea, and the dataset should be sufficiently large to allow BN to accurately approximate the test-set gamma and beta.
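
A small illustration of the train vs. inference distinction (a sketch; assumes TF 2.x with eager execution): the same BN layer normalizes with the current batch statistics when training=True, but with its moving averages when training=False, so the outputs generally differ:

bn = tf.keras.layers.BatchNormalization()
x = tf.constant(np.random.normal(0, 1, (32, 12)), dtype=tf.float32)

y_train = bn(x, training=True)   # uses this batch's mean/variance
y_infer = bn(x, training=False)  # uses the (still near-initial) moving averages
print(np.allclose(y_train.numpy(), y_infer.numpy()))  # False in general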

Code used:

x_train=np.random.normal(0, 1, (100, 12))

model = Sequential()
model.add(Dense(8, input_shape=(12,)))
#model.add(Activation('linear'))
#model.add(tf.keras.layers.BatchNormalization())
model.add(Dense(12))
model.compile(optimizer='adam', loss='mean_squared_error')

W_sum_all = []  # fit rewritten as a manual loop to allow runtime weight collection
for _ in range(20):               # 20 epochs, as in the original fit
    for i in range(9):            # 9 mini-batches of 10 samples each
        x = x_train[i*10:(i+1)*10]
        model.train_on_batch(x, x)

        W_sum_all.append([])
        for layer in model.layers:
            if layer.trainable_weights != []:
                W_sum_all[-1] += [np.sum(layer.get_weights()[0])]  # sum of each layer's kernel
model.evaluate(x[-10:], x[-10:])

plt.plot(W_sum_all)
plt.title("Sum of weights (#V1)", weight='bold', fontsize=14)
plt.legend(labels=["dense", "dense_1"], fontsize=14)
plt.gcf().set_size_inches(7, 4)

Imports/pre-execution:

import numpy as np
np.random.seed(1)
import random
random.seed(2)
import tensorflow as tf
if tf.__version__[0] == '2':
    tf.random.set_seed(3)
else:
    tf.set_random_seed(3)

import matplotlib.pyplot as plt
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.layers import Input, Dense, Activation
