Batch Normalization in Convolutional Neural Network


Question

I am a newbie in convolutional neural networks and just have an idea about feature maps and how convolution is done on images to extract features. I would be glad to know some details about applying batch normalisation in a CNN.

I read this paper https://arxiv.org/pdf/1502.03167v3.pdf and could understand the BN algorithm applied to data, but at the end they mention that a slight modification is required when it is applied to a CNN:

For convolutional layers, we additionally want the normalization to obey the convolutional property – so that different elements of the same feature map, at different locations, are normalized in the same way. To achieve this, we jointly normalize all the activations in a mini-batch, over all locations. In Alg. 1, we let B be the set of all values in a feature map across both the elements of a mini-batch and spatial locations – so for a mini-batch of size m and feature maps of size p × q, we use the effective mini-batch of size m′ = |B| = m · pq. We learn a pair of parameters γ(k) and β(k) per feature map, rather than per activation. Alg. 2 is modified similarly, so that during inference the BN transform applies the same linear transformation to each activation in a given feature map.

I am totally confused when they say "so that different elements of the same feature map, at different locations, are normalized in the same way".

I know what feature maps mean, and that the different elements are the weights in every feature map. But I could not understand what location or spatial location means.

I could not understand the following sentence at all: "In Alg. 1, we let B be the set of all values in a feature map across both the elements of a mini-batch and spatial locations".

I would be glad if someone could elaborate and explain this to me in much simpler terms.

Answer

Let's start with the terms. Remember that the output of a convolutional layer is a rank-4 tensor [B, H, W, C], where B is the batch size, (H, W) is the feature map size, and C is the number of channels. An index (x, y), where 0 <= x < H and 0 <= y < W, is a spatial location.
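
For concreteness, here is a small NumPy sketch of what picking one spatial location means; the toy sizes and the (x, y) index are made-up values, purely for illustration:

import numpy as np

# A toy conv-layer output: batch of 2 images, 4x4 feature maps, 3 channels
B, H, W, C = 2, 4, 4, 3
t = np.random.randn(B, H, W, C)

# (x, y) = (1, 2) is one spatial location; t[:, 1, 2, :] picks the values
# at that location in every example of the batch, for all C channels
values_at_location = t[:, 1, 2, :]   # shape (B, C) = (2, 3)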

Now, here's how batchnorm is applied in the usual way (in pseudo-code):

# t is the incoming tensor of shape [B, H, W, C]
# mean and stddev are computed along axis 0 and have shape [H, W, C]
mean = mean(t, axis=0)
stddev = stddev(t, axis=0)
for i in 0..B-1:
  out[i,:,:,:] = norm(t[i,:,:,:], mean, stddev)

Basically, it computes H*W*C means and H*W*C standard deviations across the B elements. You may notice that different elements at different spatial locations each have their own mean and variance, gathered from only B values.
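
Here is a minimal runnable NumPy sketch of that first variant; the toy sizes, the eps constant and the variable names are my own assumptions, not part of the original answer:

import numpy as np

B, H, W, C = 2, 4, 4, 3            # toy sizes, assumed for illustration
t = np.random.randn(B, H, W, C)    # stand-in for a conv-layer output
eps = 1e-5

# one mean/stddev per activation: statistics have shape [H, W, C]
mean = t.mean(axis=0)
stddev = t.std(axis=0)
out = (t - mean) / (stddev + eps)  # broadcasting normalizes every example

print(mean.shape)                  # (4, 4, 3): H*W*C means, each over B values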

This way is entirely possible. But a convolutional layer has a special property: filter weights are shared across the input image (you can read about this in detail in this post). That's why it's reasonable to normalize the output in the same way, so that each output value takes the mean and variance over B*H*W values, at different locations.

Here's how the code looks in this case (again in pseudo-code):

# t is still the incoming tensor of shape [B, H, W, C]
# but mean and stddev are computed along axes (0, 1, 2) and have shape [C]
mean = mean(t, axis=(0, 1, 2))
stddev = stddev(t, axis=(0, 1, 2))
for i in 0..B-1, x in 0..H-1, y in 0..W-1:
  out[i,x,y,:] = norm(t[i,x,y,:], mean, stddev)

In total, there are only C means and standard deviations, and each of them is computed over B*H*W values. That's what they mean by the "effective mini-batch": the difference between the two variants is only in the axis selection (or, equivalently, the "mini-batch selection").
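
And a matching NumPy sketch for the convolutional variant, again with assumed toy sizes rather than anything taken from the paper:

import numpy as np

B, H, W, C = 2, 4, 4, 3                # toy sizes, assumed for illustration
t = np.random.randn(B, H, W, C)
eps = 1e-5

# one mean/stddev per channel: statistics have shape [C],
# each computed over the B*H*W values of its feature map
mean = t.mean(axis=(0, 1, 2))
stddev = t.std(axis=(0, 1, 2))
out = (t - mean) / (stddev + eps)

# gamma and beta are also learned per feature map, one pair per channel
gamma = np.ones(C)
beta = np.zeros(C)
out = gamma * out + beta

print(mean.shape)                      # (3,): C means, each over B*H*W = 32 values

With these toy sizes, m = 2 and the feature maps are 4 × 4, so the paper's effective mini-batch is m′ = m · pq = 2 · 16 = 32, which is exactly the B*H*W count each channel statistic is computed over.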
