For what reason is 1x1 convolution used in deep neural networks?
Question
I'm looking at the InceptionV3 (GoogLeNet) architecture and cannot understand why we need conv1x1 layers. I know how convolution works, but I only see a benefit when the patch size is > 1.
Answer
You can think of a 1x1xD convolution as a dimensionality reduction technique when it's placed somewhere inside a network.
If you have an input volume of 100x100x512 and you convolve it with a set of D filters, each of size 1x1x512, you reduce the number of features from 512 to D. The output volume is therefore 100x100xD.
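The channel reduction above can be sketched in numpy: a 1x1 convolution is just a per-pixel matrix multiply across the channel axis. The concrete value of D below (64) is an illustrative assumption, not anything fixed by the architecture.

```python
import numpy as np

# 1x1 convolution over a 100x100x512 volume, expressed as a
# per-location matrix multiply across the channel dimension.
H, W, C_in, D = 100, 100, 512, 64      # D = 64 is an arbitrary choice

volume = np.random.randn(H, W, C_in)   # input volume
filters = np.random.randn(D, C_in)     # D filters, each of size 1x1x512

# For every spatial location (h, w): output[h, w] = filters @ volume[h, w]
output = np.einsum('hwc,dc->hwd', volume, filters)

print(output.shape)  # (100, 100, 64): features reduced from 512 to D
```

Note that the spatial extent is untouched; only the depth changes.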
As you can see, this (1x1x512)xD convolution is mathematically equivalent to a fully connected layer. The main difference is that while an FC layer requires the input to have a fixed size, the convolutional layer can accept as input any volume with spatial extent greater than or equal to 100x100.
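A minimal check of this equivalence: applying the 1x1 convolution at a single spatial location performs exactly the matrix multiply an FC layer would perform on the same 512-d vector (D = 10 is an arbitrary illustrative value).

```python
import numpy as np

C_in, D = 512, 10
weights = np.random.randn(D, C_in)   # the same weights, viewed both ways

vector = np.random.randn(C_in)       # FC view: a single 512-d input
fc_out = weights @ vector            # FC layer output, shape (D,)

# Conv view: the same vector treated as a 1x1x512 volume
volume = vector.reshape(1, 1, C_in)
conv_out = np.einsum('hwc,dc->hwd', volume, weights)

# Both views produce identical numbers
print(np.allclose(fc_out, conv_out.reshape(-1)))  # True
```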
Because of this equivalence, a 1x1xD convolution can substitute for any fully connected layer.
In addition, 1x1xD convolutions not only reduce the number of features fed to the next layer, but also introduce new parameters and a new non-linearity into the network, which helps increase model accuracy.
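A sketch of that extra non-linearity: reduce channels with learned 1x1 weights, then apply an activation. The shapes and the choice of ReLU here are illustrative assumptions, not something the answer prescribes.

```python
import numpy as np

volume = np.random.randn(8, 8, 32)     # small input volume (made-up sizes)
filters = np.random.randn(16, 32)      # sixteen 1x1x32 filters: new parameters

reduced = np.einsum('hwc,dc->hwd', volume, filters)  # channel reduction
activated = np.maximum(reduced, 0.0)                 # new non-linearity (ReLU)

print(activated.shape)          # (8, 8, 16)
print((activated >= 0).all())   # True: ReLU clamps negatives to zero
```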
When the 1x1xD convolution is placed at the end of a classification network, it acts exactly like an FC layer, but instead of thinking of it as a dimensionality reduction technique, it's more intuitive to think of it as a layer that outputs a tensor of shape WxHxnum_classes.
The spatial extent of the output tensor (given by W and H) is dynamic and is determined by the locations of the input image that the network analyzed.
If the network has been defined with an input of 200x200x3 and we give it an image of exactly that size, the output will be a map with W = H = 1 and depth = num_classes.
But if the input image has a spatial extent greater than 200x200, the convolutional network will analyze different locations of the input image (just like a standard convolution does) and produce a tensor with W > 1 and H > 1.
This is not possible with an FC layer, which constrains the network to accept a fixed-size input and produce a fixed-size output.
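The variable-output behaviour can be illustrated at toy scale, with the "network" reduced to a single valid convolution. The sizes below (a 4x4 reference input, a 6x6 larger input, 3 classes) are made up for the sketch; the point is only that a bigger input yields a bigger spatial output map.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

num_classes, k = 3, 4
kernel = np.random.randn(num_classes, k, k)   # one kxk filter per class

def classify(image):
    # Valid cross-correlation: slide the kxk filters over the image.
    windows = sliding_window_view(image, (k, k))        # (H-k+1, W-k+1, k, k)
    return np.einsum('hwij,cij->hwc', windows, kernel)  # (H', W', num_classes)

# Input at the "design" size: a single prediction, W = H = 1
print(classify(np.random.randn(4, 4)).shape)   # (1, 1, 3)

# Larger input: the same network scans more locations, W > 1 and H > 1
print(classify(np.random.randn(6, 6)).shape)   # (3, 3, 3)
```

An FC head would reject the 6x6 input outright; the convolutional head simply produces a denser map of class scores.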