How to implement a fixed-length spatial pyramid pooling layer?
Question
I would like to implement the spatial pyramid pooling (SPP) layer as introduced in this paper.
Following the paper's setup, the key point is to define a variable kernel size and stride for the max-pooling layer:
kernel_size = ceil(a/n)
stride_size = floor(a/n)
where a is the spatial size of the input tensor and n is the pyramid level, i.e. the number of spatial bins in the pooling output.
I tried to implement this layer with TensorFlow:
import numpy as np
import tensorflow as tf


def spp_layer(input_, name='SPP_layer'):
    """
    4 level SPP layer.

    spatial bins: [6_6, 3_3, 2_2, 1_1]

    Parameters
    ----------
    input_ : tensor
    name : str

    Returns
    -------
    tensor
    """
    shape = input_.get_shape().as_list()

    with tf.variable_scope(name):
        spp_6_6_pool = tf.nn.max_pool(input_,
                                      ksize=[1,
                                             np.ceil(shape[1]/6).astype(np.int32),
                                             np.ceil(shape[2]/6).astype(np.int32),
                                             1],
                                      strides=[1, shape[1]//6, shape[2]//6, 1],
                                      padding='SAME')
        print('SPP layer level 6:', spp_6_6_pool.get_shape().as_list())

        spp_3_3_pool = tf.nn.max_pool(input_,
                                      ksize=[1,
                                             np.ceil(shape[1]/3).astype(np.int32),
                                             np.ceil(shape[2]/3).astype(np.int32),
                                             1],
                                      strides=[1, shape[1]//3, shape[2]//3, 1],
                                      padding='SAME')
        print('SPP layer level 3:', spp_3_3_pool.get_shape().as_list())

        spp_2_2_pool = tf.nn.max_pool(input_,
                                      ksize=[1,
                                             np.ceil(shape[1]/2).astype(np.int32),
                                             np.ceil(shape[2]/2).astype(np.int32),
                                             1],
                                      strides=[1, shape[1]//2, shape[2]//2, 1],
                                      padding='SAME')
        print('SPP layer level 2:', spp_2_2_pool.get_shape().as_list())

        spp_1_1_pool = tf.nn.max_pool(input_,
                                      ksize=[1,
                                             np.ceil(shape[1]/1).astype(np.int32),
                                             np.ceil(shape[2]/1).astype(np.int32),
                                             1],
                                      strides=[1, shape[1]//1, shape[2]//1, 1],
                                      padding='SAME')
        print('SPP layer level 1:', spp_1_1_pool.get_shape().as_list())

        # Flatten each level to [batch, features]
        spp_6_6_pool_flat = tf.reshape(spp_6_6_pool, [shape[0], -1])
        spp_3_3_pool_flat = tf.reshape(spp_3_3_pool, [shape[0], -1])
        spp_2_2_pool_flat = tf.reshape(spp_2_2_pool, [shape[0], -1])
        spp_1_1_pool_flat = tf.reshape(spp_1_1_pool, [shape[0], -1])

        # TF >= 1.0 argument order is (values, axis); earlier releases used (axis, values)
        spp_pool = tf.concat([spp_6_6_pool_flat,
                              spp_3_3_pool_flat,
                              spp_2_2_pool_flat,
                              spp_1_1_pool_flat], axis=1)

    return spp_pool
But it cannot guarantee a pooling output of the same length when the input sizes differ.
How can I solve this problem?
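To make the length mismatch concrete, here is a small sketch in plain Python (no TensorFlow needed). It relies on the fact that with padding='SAME' the pooled size along one dimension is ceil(input / stride), regardless of the kernel size. The helper name same_padding_bins is made up for this illustration.

```python
def same_padding_bins(a, n):
    """Bins produced along one dimension by the attempt above.

    a: input spatial size, n: pyramid level (desired number of bins).
    Hypothetical helper, for illustration only.
    """
    stride = a // n                       # stride_size = floor(a/n)
    # With 'SAME' padding the output size is ceil(a / stride),
    # independent of the kernel size.
    return (a + stride - 1) // stride


for a in (12, 13, 17):
    print(a, [same_padding_bins(a, n) for n in (6, 3, 2, 1)])
# 12 [6, 3, 2, 1]
# 13 [7, 4, 3, 1]
# 17 [9, 4, 3, 1]
```

Only sizes that are exact multiples of every level (such as 12) give the intended 6, 3, 2, 1 bins; other sizes produce extra bins, so the flattened vector changes length.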
Recommended answer
I believe the authors of the paper are wrong; the formula should be:
stride_size = floor(a/n)
kernel_size = floor(a/n) + (a mod n)
Notice that both formulas give the same kernel size whenever a mod n is 0 or 1, which is always the case for n < 3. You can prove that the corrected formula yields exactly n bins by performing the Euclidean division of a by n.
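The Euclidean-division argument can be written out as follows. This is a sketch that assumes pooling without padding ('VALID') and a ≥ n, so that the stride is at least 1; the symbols q, r, s and k are introduced here only for the derivation.

```latex
\begin{aligned}
a &= qn + r, \qquad 0 \le r < n, \qquad q = \lfloor a/n \rfloor \ge 1 \\
s &= \lfloor a/n \rfloor = q, \qquad k = \lfloor a/n \rfloor + (a \bmod n) = q + r \\
\text{bins} &= \left\lfloor \frac{a - k}{s} \right\rfloor + 1
            = \left\lfloor \frac{qn + r - q - r}{q} \right\rfloor + 1
            = (n - 1) + 1 = n
\end{aligned}
```

So each pyramid level n contributes exactly n bins per spatial dimension, whatever the input size a.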
I modified the code I found at https://github.com/tensorflow/tensorflow/issues/6011, and here it is:
import tensorflow as tf


def spp_layer(input_, levels=(6, 3, 2, 1), name='SPP_layer'):
    """Multi-level SPP layer; each level n yields n x n spatial bins."""
    # Static shape [batch, height, width, channels]; all entries must be known
    shape = input_.get_shape().as_list()
    with tf.variable_scope(name):
        pyramid = []
        for n in levels:
            # stride = floor(a/n), kernel = floor(a/n) + (a mod n)
            stride_1 = shape[1] // n
            stride_2 = shape[2] // n
            ksize_1 = stride_1 + (shape[1] % n)
            ksize_2 = stride_2 + (shape[2] % n)
            pool = tf.nn.max_pool(input_,
                                  ksize=[1, ksize_1, ksize_2, 1],
                                  strides=[1, stride_1, stride_2, 1],
                                  padding='VALID')

            # print("Pool Level {}: shape {}".format(n, pool.get_shape().as_list()))
            pyramid.append(tf.reshape(pool, [shape[0], -1]))
        # TF >= 1.0 argument order is (values, axis); earlier releases used (axis, values)
        spp_pool = tf.concat(pyramid, axis=1)
    return spp_pool
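As a sanity check that does not need TensorFlow, the same kernel and stride rule can be tried on a single-channel feature map stored as a list of lists. This is only a reference sketch: max_pool_valid, spp_single_channel and make_map are hypothetical helpers written for this example, and both spatial dimensions are assumed to be at least as large as the biggest level (otherwise the stride would be 0).

```python
def max_pool_valid(fmap, k_h, k_w, s_h, s_w):
    """Max-pool a 2D list of numbers without padding ('VALID')."""
    h, w = len(fmap), len(fmap[0])
    out = []
    for top in range(0, h - k_h + 1, s_h):
        row = []
        for left in range(0, w - k_w + 1, s_w):
            window = [fmap[i][j]
                      for i in range(top, top + k_h)
                      for j in range(left, left + k_w)]
            row.append(max(window))
        out.append(row)
    return out


def spp_single_channel(fmap, levels=(6, 3, 2, 1)):
    """Concatenate the flattened pooled maps of every pyramid level."""
    h, w = len(fmap), len(fmap[0])
    features = []
    for n in levels:
        s_h, s_w = h // n, w // n            # stride = floor(a/n)
        k_h, k_w = s_h + h % n, s_w + w % n  # kernel = floor(a/n) + (a mod n)
        pooled = max_pool_valid(fmap, k_h, k_w, s_h, s_w)
        features.extend(v for row in pooled for v in row)
    return features


def make_map(h, w):
    """Deterministic dummy feature map with values in 0..10."""
    return [[(i * w + j) % 11 for j in range(w)] for i in range(h)]


small = spp_single_channel(make_map(10, 7))
large = spp_single_channel(make_map(13, 17))
print(len(small), len(large))  # 50 50
```

Both inputs give 36 + 9 + 4 + 1 = 50 features per channel, which is the fixed-length property the layer is meant to provide.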