How did they calculate the output volume for this convnet example in Caffe?
Question
In this tutorial, the output volumes are stated in output [25], and the receptive fields are specified in output [26].
Okay, the input volume [3, 227, 227] gets convolved with a region of size [3, 11, 11].
Using this formula (W−F+2P)/S+1, where:
W = the input volume size
F = the receptive field size
P = padding
S = stride
...this results in (227 - 11)/4 + 1 = 55, i.e. [55*55*96]. So far so good :)
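The arithmetic above can be checked with a small helper. This is only a sketch of the formula (W−F+2P)/S+1 as quoted in the question; the function name conv_output_size is made up for illustration and is not part of Caffe, and integer (floor) division is assumed for cases that do not divide evenly.

```python
def conv_output_size(w, f, p=0, s=1):
    """Spatial output size for input size w, filter size f, padding p, stride s.

    Hypothetical helper implementing (W - F + 2P) / S + 1 with floor division.
    """
    return (w - f + 2 * p) // s + 1


# conv1 from the question: 227 input, 11x11 filter, no padding, stride 4
print(conv_output_size(227, 11, p=0, s=4))  # 55
```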
For 'pool1' they used F=3 and S=2, I think? The calculation checks out: (55-3)/2+1=27.
From this point I get a bit confused. The receptive field for the second convnet layer is [48, 5, 5], yet the output for 'conv2' is equal to [256, 27, 27]. What calculation happened here?
And then, the height and width of the output volumes of 'conv3' to 'conv4' are all the same, [13, 13]? What's going on?
Thanks!
Recommended answer
If you look closely at the parameters of the conv2 layer, you'll notice
pad: 2
That is, the input blob is padded by 2 extra pixels all around, so at stride 1 the formula becomes
27 + 2 + 2 - ( 5 - 1 ) = 27
With a kernel size of 5, padding the input by 2 pixels on each side yields the same output size as the input.
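As a quick check of that rule, here is a minimal sketch at stride 1. The helper names output_size and same_padding are hypothetical, not Caffe functions. The 13x13 case assumes a 3x3 kernel with pad 1 for the later conv layers, which the answer does not state, so treat it as an illustration rather than a confirmed reading of the network definition.

```python
def output_size(w, f, p, s=1):
    """Spatial output size: (W - F + 2P) / S + 1, using floor division."""
    return (w - f + 2 * p) // s + 1


def same_padding(f):
    """Padding that keeps the size unchanged at stride 1, for an odd kernel size f."""
    return (f - 1) // 2


# conv2 from the answer: 27 input, kernel 5, pad 2, stride 1
print(output_size(27, 5, p=2))  # 27

# Assumed parameters for the later layers: kernel 3, pad 1, stride 1
print(output_size(13, 3, p=same_padding(3)))  # 13
```

In general, an odd kernel size F with padding (F − 1)/2 at stride 1 leaves the height and width unchanged, which is the pattern the answer describes for conv2.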