For what reason is 1x1 convolution used in deep neural networks?


Question

I'm looking at the InceptionV3 (GoogLeNet) architecture and cannot understand why we need conv1x1 layers.

I know how convolution works, but I only see a benefit when the patch size is greater than 1.

Answer

You can think of a 1x1xD convolution as a dimensionality reduction technique when it is placed somewhere inside a network.

If you have an input volume of 100x100x512 and you convolve it with a set of D filters, each of size 1x1x512, you reduce the number of features from 512 to D. The output volume is therefore 100x100xD.
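A minimal sketch of this reduction, assuming PyTorch (the answer names no framework) and an arbitrary target depth D = 64:

```python
import torch
import torch.nn as nn

D = 64  # hypothetical target depth, chosen only for illustration
reduce = nn.Conv2d(in_channels=512, out_channels=D, kernel_size=1)

x = torch.randn(1, 512, 100, 100)  # the 100x100x512 input volume (NCHW layout)
y = reduce(x)
print(y.shape)  # torch.Size([1, 64, 100, 100]), i.e. 100x100xD
```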

As you can see, this (1x1x512)xD convolution is mathematically equivalent to a fully connected layer. The main difference is that while an FC layer requires the input to have a fixed size, the convolutional layer can accept as input any volume with spatial extent greater than or equal to 100x100.

Because of this equivalence, a 1x1xD convolution can substitute for any fully connected layer.
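You can check the equivalence numerically; here is a sketch, again assuming PyTorch and a hypothetical 512-to-10 layer, that copies the FC weights into a 1x1 conv and compares the outputs:

```python
import torch
import torch.nn as nn

fc = nn.Linear(512, 10)                   # hypothetical FC layer: 512 -> 10
conv = nn.Conv2d(512, 10, kernel_size=1)  # the matching (1x1x512)x10 convolution

# Reuse the FC weights in the conv: reshape (10, 512) -> (10, 512, 1, 1)
with torch.no_grad():
    conv.weight.copy_(fc.weight.view(10, 512, 1, 1))
    conv.bias.copy_(fc.bias)

x = torch.randn(1, 512)                           # one 512-feature vector
out_fc = fc(x)                                    # shape (1, 10)
out_conv = conv(x.view(1, 512, 1, 1)).flatten(1)  # shape (1, 10)
print(torch.allclose(out_fc, out_conv, atol=1e-6))  # True
```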

In addition, 1x1xD convolutions not only reduce the number of features fed to the next layer, but also introduce new parameters and a new non-linearity into the network, which helps to increase model accuracy.
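This is the bottleneck pattern used inside Inception modules: the 1x1 conv (with its own ReLU) shrinks the channel count before an expensive larger convolution. A sketch with invented channel counts:

```python
import torch.nn as nn

direct = nn.Conv2d(256, 256, kernel_size=3, padding=1)  # 590,080 parameters
bottleneck = nn.Sequential(
    nn.Conv2d(256, 64, kernel_size=1), nn.ReLU(),  # reduce 256 -> 64, extra non-linearity
    nn.Conv2d(64, 256, kernel_size=3, padding=1),  # the 3x3 conv now sees only 64 channels
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(direct), count(bottleneck))  # 590080 vs 164160: ~3.6x fewer parameters
```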

When the 1x1xD convolution is placed at the end of a classification network, it acts exactly like an FC layer; but instead of thinking of it as a dimensionality reduction technique, it is more intuitive to think of it as a layer that outputs a tensor of shape WxHxnum_classes.

The spatial extent of the output tensor (identified by W and H) is dynamic and is determined by the locations of the input image that the network analyzes.

If the network has been defined with an input of 200x200x3 and we feed it an image of that size, the output will be a map with W = H = 1 and depth = num_classes. But if the input image has a spatial extent greater than 200x200, the convolutional network will analyze different locations of the input image (just like a standard convolution does) and will produce a tensor with W > 1 and H > 1. This is not possible with an FC layer, which constrains the network to accept a fixed-size input and produce a fixed-size output.
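A sketch of that behavior, with a toy fully convolutional network whose layer and kernel sizes are invented so that a 200x200 input collapses to a 1x1 map before the final 1x1 conv "FC" head (PyTorch assumed):

```python
import torch
import torch.nn as nn

num_classes = 10  # hypothetical class count

net = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=20, stride=20), nn.ReLU(),   # 200x200 -> 10x10
    nn.Conv2d(16, 32, kernel_size=10, stride=10), nn.ReLU(),  # 10x10  -> 1x1
    nn.Conv2d(32, num_classes, kernel_size=1),                # 1x1 conv as the "FC" head
)

print(net(torch.randn(1, 3, 200, 200)).shape)  # (1, 10, 1, 1): W = H = 1
print(net(torch.randn(1, 3, 400, 400)).shape)  # (1, 10, 2, 2): W, H > 1
```

With the 400x400 input, the same weights are applied at several locations, so the network produces a 2x2 grid of class-score vectors instead of failing on a size mismatch, as an FC head would.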

