col2im implementation in ConvNet
Question
I'm trying to implement a CNN using only numpy.
While doing the backpropagation, I found that I had to use col2im to reshape dx, so I checked the implementation at https://github.com/huyouare/CS231n/blob/master/assignment2/cs231n/im2col.py.
import numpy as np


def get_im2col_indices(x_shape, field_height, field_width, padding=1, stride=1):
    # First figure out what the size of the output should be
    N, C, H, W = x_shape
    assert (H + 2 * padding - field_height) % stride == 0
    assert (W + 2 * padding - field_width) % stride == 0
    # Integer division, so the sizes stay ints under Python 3
    out_height = (H + 2 * padding - field_height) // stride + 1
    out_width = (W + 2 * padding - field_width) // stride + 1

    # Row offsets inside one receptive field, repeated for every channel
    i0 = np.repeat(np.arange(field_height), field_width)
    i0 = np.tile(i0, C)
    # Row offset of each output position
    i1 = stride * np.repeat(np.arange(out_height), out_width)
    # Column offsets inside one receptive field, and of each output position
    j0 = np.tile(np.arange(field_width), field_height * C)
    j1 = stride * np.tile(np.arange(out_width), out_height)
    i = i0.reshape(-1, 1) + i1.reshape(1, -1)
    j = j0.reshape(-1, 1) + j1.reshape(1, -1)

    # Channel index for every row of the column matrix
    k = np.repeat(np.arange(C), field_height * field_width).reshape(-1, 1)

    return (k, i, j)


def im2col_indices(x, field_height, field_width, padding=1, stride=1):
    """ An implementation of im2col based on some fancy indexing """
    # Zero-pad the input
    p = padding
    x_padded = np.pad(x, ((0, 0), (0, 0), (p, p), (p, p)), mode='constant')

    k, i, j = get_im2col_indices(x.shape, field_height, field_width, padding,
                                 stride)

    cols = x_padded[:, k, i, j]
    C = x.shape[1]
    cols = cols.transpose(1, 2, 0).reshape(field_height * field_width * C, -1)
    return cols


def col2im_indices(cols, x_shape, field_height=3, field_width=3, padding=1,
                   stride=1):
    """ An implementation of col2im based on fancy indexing and np.add.at """
    N, C, H, W = x_shape
    H_padded, W_padded = H + 2 * padding, W + 2 * padding
    x_padded = np.zeros((N, C, H_padded, W_padded), dtype=cols.dtype)
    k, i, j = get_im2col_indices(x_shape, field_height, field_width, padding,
                                 stride)
    cols_reshaped = cols.reshape(C * field_height * field_width, -1, N)
    cols_reshaped = cols_reshaped.transpose(2, 0, 1)
    # Overlapping positions are accumulated (summed), not overwritten
    np.add.at(x_padded, (slice(None), k, i, j), cols_reshaped)
    if padding == 0:
        return x_padded
    return x_padded[:, :, padding:-padding, padding:-padding]
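As a side note on the np.add.at call in col2im_indices: a small sketch of why it matters when the same index appears more than once. The arrays here (idx, vals, a, b) are made up for illustration and are not part of the linked file.

```python
import numpy as np

# Index 0 appears twice, the way overlapping windows hit the same pixel.
idx = np.array([0, 0, 1])
vals = np.array([1, 1, 1])

# Fancy-indexed += is buffered: a repeated index is only updated once.
a = np.zeros(3, dtype=int)
a[idx] += vals
print(a)  # [1 1 0]

# np.add.at is unbuffered: every occurrence of the index is accumulated.
b = np.zeros(3, dtype=int)
np.add.at(b, idx, vals)
print(b)  # [2 1 0]
```

This accumulation is what makes overlapping windows add up in col2im_indices.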
I expected that passing X through im2col_indices and then feeding that output back into col2im_indices would return the same X, but it didn't.
I don't understand what col2im actually does.
Recommended answer
If I'm right, the output is not the same X because im2col_indices copies each cell of X into multiple columns, and those copies are summed when the columns are converted back, so each cell ends up multiplied by the number of columns it appears in.
Say you have a simple image X like this
1 2 3
4 5 6
7 8 9
and you convert it with kernel size 3, stride 1, and "same" padding (padding=1 here); the result would be
0 0 0 0 1 2 0 4 5
0 0 0 1 2 3 4 5 6
0 0 0 2 3 0 5 6 0
0 1 2 0 4 5 0 7 8
1 2 3 4 5 6 7 8 9
2 3 0 5 6 0 8 9 0
0 4 5 0 7 8 0 0 0
4 5 6 7 8 9 0 0 0
5 6 0 8 9 0 0 0 0
* *   * *
As you can see, the first cell, with value 1, shows up in four columns: 0, 1, 3, and 4 (marked with * above).
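The matrix above can be reproduced with a minimal sketch. It assumes a single-channel 2-D image and uses a naive loop; im2col_2d is a hypothetical helper written for illustration, not the fancy-indexing im2col_indices from the question.

```python
import numpy as np

def im2col_2d(x, k=3, pad=1, stride=1):
    """Naive im2col for one single-channel 2-D image (illustrative helper)."""
    xp = np.pad(x, pad, mode="constant")
    out_h = (x.shape[0] + 2 * pad - k) // stride + 1
    out_w = (x.shape[1] + 2 * pad - k) // stride + 1
    cols = np.zeros((k * k, out_h * out_w), dtype=x.dtype)
    for oi in range(out_h):
        for oj in range(out_w):
            # One k x k window of the padded image becomes one column.
            patch = xp[oi * stride:oi * stride + k, oj * stride:oj * stride + k]
            cols[:, oi * out_w + oj] = patch.ravel()
    return cols

X = np.arange(1, 10).reshape(3, 3)
cols = im2col_2d(X)
print(cols)

# Indices of the columns that contain the value 1
cols_with_one = np.where((cols == 1).any(axis=0))[0]
print(cols_with_one)  # [0 1 3 4]
```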
col2im_indices first zero-initializes an image of the padded size, and then adds each column to it. Focusing on the first cell, the process looks like this:
1. zero-initialized image
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
2. add col 0
0  0  0  0  0     0  0  0  -  -     0  0  0  0  0
0  0  0  0  0     0  1  2  -  -     0  1  2  0  0
0  0  0  0  0  +  0  4  5  -  -  =  0  4  5  0  0
0  0  0  0  0     -  -  -  -  -     0  0  0  0  0
0  0  0  0  0     -  -  -  -  -     0  0  0  0  0
3. add col 1
0  0  0  0  0     -  0  0  0  -     0  0  0  0  0
0  1  2  0  0     -  1  2  3  -     0  2  4  3  0
0  4  5  0  0  +  -  4  5  6  -  =  0  8  10 6  0
0  0  0  0  0     -  -  -  -  -     0  0  0  0  0
0  0  0  0  0     -  -  -  -  -     0  0  0  0  0
4. add col 3
0  0  0  0  0     -  -  -  -  -     0  0  0  0  0
0  2  4  3  0     0  1  2  -  -     0  3  6  3  0
0  8  10 6  0  +  0  4  5  -  -  =  0  12 15 6  0
0  0  0  0  0     0  7  8  -  -     0  7  8  0  0
0  0  0  0  0     -  -  -  -  -     0  0  0  0  0
5. add col 4
0  0  0  0  0     -  -  -  -  -     0  0  0  0  0
0  3  6  3  0     -  1  2  3  -     0  4  8  6  0
0  12 15 6  0  +  -  4  5  6  -  =  0  16 20 12 0
0  7  8  0  0     -  7  8  9  -     0  14 16 9  0
0  0  0  0  0     -  -  -  -  -     0  0  0  0  0
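The four additions above can be replayed in a few lines. This is only a sketch for the single-channel 3x3 example; the window list is written out by hand to match columns 0, 1, 3, and 4, and the names (xp, acc) are illustrative.

```python
import numpy as np

X = np.arange(1, 10).reshape(3, 3)
xp = np.pad(X, 1)          # 5x5 zero-padded image
acc = np.zeros_like(xp)    # step 1: zero-initialized padded image

# Top-left corners of the windows behind columns 0, 1, 3 and 4
for oi, oj in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    # Add the 3x3 window back at the position it was taken from.
    acc[oi:oi + 3, oj:oj + 3] += xp[oi:oi + 3, oj:oj + 3]

print(acc)
# [[ 0  0  0  0  0]
#  [ 0  4  8  6  0]
#  [ 0 16 20 12  0]
#  [ 0 14 16  9  0]
#  [ 0  0  0  0  0]]
```

After these four columns the cell that held 1 already holds 4; the remaining five columns do not touch it.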
The first cell is multiplied by 4 when converted back. For this simple image, col2im_indices(im2col_indices(X)) should give you
4 12 12
24 45 36
28 48 36
Compared to the original image, the four corner cells (1, 3, 7, 9) are multiplied by 4, the four edge cells (2, 4, 6, 8) are multiplied by 6, and the center cell (5) is multiplied by 9.
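These multipliers can be checked with a small sketch, again assuming a single-channel image, kernel size 3, stride 1, and padding 1. roundtrip_sum is a hypothetical helper that mimics the im2col followed by col2im round trip with plain loops. Running it on an all-ones image gives the per-cell multiplier, and dividing by that map recovers X, if undoing im2col is what you are after.

```python
import numpy as np

def roundtrip_sum(x, k=3, pad=1):
    """Sum every k x k window of the padded image back into place (stride 1)."""
    xp = np.pad(x, pad)
    acc = np.zeros_like(xp)
    for oi in range(xp.shape[0] - k + 1):
        for oj in range(xp.shape[1] - k + 1):
            acc[oi:oi + k, oj:oj + k] += xp[oi:oi + k, oj:oj + k]
    return acc[pad:-pad, pad:-pad]  # strip the padding again

X = np.arange(1, 10).reshape(3, 3)
summed = roundtrip_sum(X)
counts = roundtrip_sum(np.ones_like(X))  # how many windows cover each cell

print(summed)
# [[ 4 12 12]
#  [24 45 36]
#  [28 48 36]]
print(counts)
# [[4 6 4]
#  [6 9 6]
#  [4 6 4]]
print(summed // counts)  # recovers X
```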
For large images, most of the cells will be multiplied by 9 (the 3 × 3 kernel area), and I think it roughly means your learning rate is actually 9 times larger than you think.