Batch normalization in neural network

Problem description

I'm still fairly new to ANNs and I was just reading the Batch Normalization paper (http://arxiv.org/pdf/1502.03167.pdf), but I'm not sure I'm getting what they are doing (and, more importantly, why it works).

So let's say I have two layers L1 and L2, where L1 produces outputs and sends them to the neurons in L2. Batch normalization just takes all the outputs from L1 (i.e. every single output from every single neuron, getting an overall vector of |L1| X |L2| numbers for a fully connected network), normalizes them to have a mean of 0 and SD of 1, and then feeds them to their respective neurons in L2 (plus applying the linear transformation of gamma and beta they were discussing in the paper)?
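For concreteness, here is a minimal NumPy sketch of the transform being described: per the paper, statistics are computed for each activation over the mini-batch, the activations are normalized to zero mean and unit SD, and the learned gamma and beta then scale and shift the result. The function and variable names here are illustrative, not taken from the paper.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch-normalize x of shape (batch_size, num_features):
    normalize each feature over the mini-batch, then scale and shift."""
    mu = x.mean(axis=0)                     # per-feature mean over the batch
    var = x.var(axis=0)                     # per-feature variance over the batch
    x_hat = (x - mu) / np.sqrt(var + eps)   # zero mean, unit SD for each feature
    return gamma * x_hat + beta             # learned linear transform (gamma, beta)
```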

If this is indeed the case, how is this helping the NN? What's so special about a constant distribution?

Answer

During standard SGD training of a network, the distribution of inputs to a hidden layer will change because the hidden layer before it is constantly changing as well. This is known as covariate shift and can be a problem; see, for instance, here.

It is known that neural networks converge faster if the training data is "whitened", that is, transformed in such a way that each component has a Gaussian distribution and is independent of the other components. See the papers (LeCun et al., 1998b) and (Wiesler & Ney, 2011) cited in the paper.
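As a rough illustration of what "whitening" means here, a small PCA-whitening sketch in NumPy (the helper name and epsilon are assumptions for this example): center each component, then decorrelate and rescale so every component has unit variance.

```python
import numpy as np

def whiten(X, eps=1e-5):
    """PCA-whiten X of shape (num_examples, num_features)."""
    Xc = X - X.mean(axis=0)                  # center each component
    cov = Xc.T @ Xc / Xc.shape[0]            # covariance of the centered data
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigendecomposition of the covariance
    W = eigvecs / np.sqrt(eigvals + eps)     # whitening transform
    return Xc @ W                            # uncorrelated components, unit variance
```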

The idea of the authors is now to apply this whitening not only to the input layer, but to the input of every intermediate layer as well. It would be too expensive to do this over the entire input dataset, so instead they do it batch-wise. They claim that this can vastly speed up the training process and also acts as a sort of regularization.
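A minimal sketch of how that batch-wise normalization might sit between two layers; the weight names, the gamma/beta parameters, and the ReLU nonlinearity are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def forward(x, W1, W2, gamma, beta, eps=1e-5):
    """Forward pass of a small fully connected net, with batch normalization
    applied to the hidden layer's pre-activations using only this mini-batch."""
    z1 = x @ W1                                  # pre-activations of the hidden layer
    mu, var = z1.mean(axis=0), z1.var(axis=0)    # statistics over the current mini-batch
    z1_hat = (z1 - mu) / np.sqrt(var + eps)      # normalize each hidden unit's input
    h1 = np.maximum(0.0, gamma * z1_hat + beta)  # scale, shift, then nonlinearity (ReLU)
    return h1 @ W2                               # output layer
```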
