Is it possible to extend "im2col" and "col2im" to N-D images?


Question

"im2col" has already been implemented efficiently for 2-D images in Python (see "Implement MATLAB's im2col 'sliding' in Python"). I was wondering whether it is possible to extend this to arbitrary N-D images. Many applications involve high-dimensional data (e.g. convolutions, filtering, max pooling).

Recommended answer

The purpose of this question was really just to post my solution to this problem publicly. I could not find such a solution on Google, so I decided to take a stab at it myself. It turns out that "Approach #2" from the post referenced in my question is quite simple to extend to N dimensions!

Efficient implementation of N-D "im2col"

import numpy as np

def im2col(im, win, strides = 1):
    # Dimensions
    ext_shp = tuple(np.subtract(im.shape, win) + 1)  # window positions per axis at stride 1
    shp = tuple(win) + ext_shp
    strd = im.strides*2
    win_len = np.prod(win)
    try:
        len(strides)
    except TypeError:
        # A scalar stride applies to every axis
        strides = [strides]*im.ndim
    strides = [min(i, s) for i, s in zip(im.shape, strides)]

    # Stack all possible patches as an N-D array using a strided view followed by reshaping
    col = np.lib.stride_tricks.as_strided(im, shape = shp, strides = strd).reshape(win_len, -1).reshape(-1, *ext_shp)

    # Extract patches with stride and reshape into columns (one patch per column)
    slcs = tuple([slice(None, None, None)] + [slice(None, None, s) for s in strides])
    col = col[slcs].reshape(win_len, -1)

    return col

Efficient implementation of N-D "col2im"

def col2im(col, im_shp, win, strides = 1):
    # Dimensions
    try:
        len(strides)
    except TypeError:
        # A scalar stride applies to every axis
        strides = [strides]*len(im_shp)
    strides = [min(i, s) for i, s in zip(im_shp, strides)]

    # Reshape columns into image
    if col.ndim > 1:
        # 2-D input: keep only the first row, i.e. the first element of every patch
        im = col.reshape((-1, ) + tuple(np.subtract(im_shp, win)//np.array(strides) + 1))[0]
    else:
        # 1-D input: one value per patch (e.g. after pooling over axis 0)
        im = col.reshape(tuple(np.subtract(im_shp, win)//np.array(strides) + 1))

    return im

Verification that it works

Let's define an arbitrary 3-D input:

x = np.arange(216).reshape(6, 6, 6)
print(x)

[[[  0   1   2   3   4   5]
  [  6   7   8   9  10  11]
  [ 12  13  14  15  16  17]
  [ 18  19  20  21  22  23]
  [ 24  25  26  27  28  29]
  [ 30  31  32  33  34  35]]

 [[ 36  37  38  39  40  41]
  [ 42  43  44  45  46  47]
  [ 48  49  50  51  52  53]
  [ 54  55  56  57  58  59]
  [ 60  61  62  63  64  65]
  [ 66  67  68  69  70  71]]

 [[ 72  73  74  75  76  77]
  [ 78  79  80  81  82  83]
  [ 84  85  86  87  88  89]
  [ 90  91  92  93  94  95]
  [ 96  97  98  99 100 101]
  [102 103 104 105 106 107]]

 [[108 109 110 111 112 113]
  [114 115 116 117 118 119]
  [120 121 122 123 124 125]
  [126 127 128 129 130 131]
  [132 133 134 135 136 137]
  [138 139 140 141 142 143]]

 [[144 145 146 147 148 149]
  [150 151 152 153 154 155]
  [156 157 158 159 160 161]
  [162 163 164 165 166 167]
  [168 169 170 171 172 173]
  [174 175 176 177 178 179]]

 [[180 181 182 183 184 185]
  [186 187 188 189 190 191]
  [192 193 194 195 196 197]
  [198 199 200 201 202 203]
  [204 205 206 207 208 209]
  [210 211 212 213 214 215]]]

Let's extract all the patches with a non-uniform window and a stride equal to the window size, so that the patches do not overlap:

y = im2col(x, [1, 3, 2], strides = [1, 3, 2])
print(y.T) # transposed for ease of visualization

[[  0   1   6   7  12  13]
 [  2   3   8   9  14  15]
 [  4   5  10  11  16  17]
 [ 18  19  24  25  30  31]
 [ 20  21  26  27  32  33]
 [ 22  23  28  29  34  35]
 [ 36  37  42  43  48  49]
 [ 38  39  44  45  50  51]
 [ 40  41  46  47  52  53]
 [ 54  55  60  61  66  67]
 [ 56  57  62  63  68  69]
 [ 58  59  64  65  70  71]
 [ 72  73  78  79  84  85]
 [ 74  75  80  81  86  87]
 [ 76  77  82  83  88  89]
 [ 90  91  96  97 102 103]
 [ 92  93  98  99 104 105]
 [ 94  95 100 101 106 107]
 [108 109 114 115 120 121]
 [110 111 116 117 122 123]
 [112 113 118 119 124 125]
 [126 127 132 133 138 139]
 [128 129 134 135 140 141]
 [130 131 136 137 142 143]
 [144 145 150 151 156 157]
 [146 147 152 153 158 159]
 [148 149 154 155 160 161]
 [162 163 168 169 174 175]
 [164 165 170 171 176 177]
 [166 167 172 173 178 179]
 [180 181 186 187 192 193]
 [182 183 188 189 194 195]
 [184 185 190 191 196 197]
 [198 199 204 205 210 211]
 [200 201 206 207 212 213]
 [202 203 208 209 214 215]]

Let's convert this back to a (downsampled) image:

z = col2im(y, x.shape, [1, 3, 2], strides = [1, 3, 2])
print(z)

[[[  0   2   4]
  [ 18  20  22]]

 [[ 36  38  40]
  [ 54  56  58]]

 [[ 72  74  76]
  [ 90  92  94]]

 [[108 110 112]
  [126 128 130]]

 [[144 146 148]
  [162 164 166]]

 [[180 182 184]
  [198 200 202]]]

As you can see, the final output is indeed the downsampled image that we expect (you can easily check this by going value by value). The dimensionality and strides I chose were purely illustrative. There's no reason why the window size has to match the stride, or why you can't go higher than 3 dimensions.
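To illustrate that last point, here is a minimal sketch that runs a 4-D input with overlapping windows (window size different from the stride) and compares the result against a plain loop. The `im2col` below is a condensed restatement of the function above, and `im2col_loops` is a hypothetical reference helper written only for this check; neither is meant as a definitive implementation.

```python
import itertools
import numpy as np

def im2col(im, win, strides=1):
    # Condensed restatement of the strided-view approach from the answer.
    ext_shp = tuple(np.subtract(im.shape, win) + 1)
    win_len = int(np.prod(win))
    strides = np.broadcast_to(strides, (im.ndim,))  # scalar or per-axis strides
    col = np.lib.stride_tricks.as_strided(
        im, shape=tuple(win) + ext_shp, strides=im.strides * 2
    ).reshape(win_len, *ext_shp)
    slcs = (slice(None),) + tuple(slice(None, None, int(s)) for s in strides)
    return col[slcs].reshape(win_len, -1)

def im2col_loops(im, win, strides):
    # Hypothetical brute-force reference: slice out every patch explicitly.
    out_shp = [(n - w) // s + 1 for n, w, s in zip(im.shape, win, strides)]
    cols = []
    for idx in itertools.product(*(range(n) for n in out_shp)):
        patch = im[tuple(slice(i * s, i * s + w)
                         for i, s, w in zip(idx, strides, win))]
        cols.append(patch.ravel())
    return np.stack(cols, axis=1)  # one patch per column

# 4-D input, window and stride chosen independently of each other
x4 = np.arange(4 * 5 * 6 * 7).reshape(4, 5, 6, 7)
win, strides = (2, 3, 2, 4), (1, 2, 3, 2)

cols = im2col(x4, win, strides)
ref = im2col_loops(x4, win, strides)
print(cols.shape)                 # (window length, number of patches)
print(np.array_equal(cols, ref))  # True if both methods agree
```

The number of patches along each axis is `(n - w) // s + 1`, which is the same expression `col2im` uses to recover the output shape.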

Applications

If you want to use this in practice, all you have to do is intercept the output of im2col before turning it back into an image. For example, to do pooling, take the mean or the maximum across the 0th axis. To do a convolution, multiply the columns by your flattened convolutional filter.
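As a rough sketch of both uses, the snippet below applies max pooling and a mean filter to the 6x6x6 example input. The `im2col` and `col2im` here are condensed restatements of the functions above, and the averaging kernel is an arbitrary choice for illustration. Note that the "convolution" does not flip the kernel, so strictly speaking it is a cross-correlation, as is common in deep learning libraries.

```python
import numpy as np

def im2col(im, win, strides=1):
    # Condensed restatement of the answer's im2col.
    ext_shp = tuple(np.subtract(im.shape, win) + 1)
    win_len = int(np.prod(win))
    strides = np.broadcast_to(strides, (im.ndim,))
    col = np.lib.stride_tricks.as_strided(
        im, shape=tuple(win) + ext_shp, strides=im.strides * 2
    ).reshape(win_len, *ext_shp)
    slcs = (slice(None),) + tuple(slice(None, None, int(s)) for s in strides)
    return col[slcs].reshape(win_len, -1)

def col2im(col, im_shp, win, strides=1):
    # Condensed restatement of the answer's col2im; accepts 1-D or 2-D input.
    strides = np.broadcast_to(strides, (len(im_shp),))
    out_shp = tuple(np.subtract(im_shp, win) // strides + 1)
    return col.reshape((-1,) + out_shp)[0]

x = np.arange(216).reshape(6, 6, 6)

# Max pooling: non-overlapping (1, 3, 2) windows, reduce over the window axis.
win = strides = (1, 3, 2)
cols = im2col(x, win, strides)                     # shape (6, 36)
pooled = cols.max(axis=0)                          # one value per patch
pooled_im = col2im(pooled, x.shape, win, strides)  # shape (6, 2, 3)

# "Convolution" with a 3x3x3 averaging kernel at stride 1.
kernel = np.full((3, 3, 3), 1.0 / 27.0)
cols = im2col(x, kernel.shape)                     # shape (27, 64)
conv = col2im(kernel.ravel() @ cols, x.shape, kernel.shape)  # shape (4, 4, 4)

print(pooled_im.shape, conv.shape)
```

Because `x` increases linearly along every axis, the mean of each 3x3x3 window equals its center value, so `conv` should match `x[1:-1, 1:-1, 1:-1]` up to floating-point error. Swapping `cols.max(axis=0)` for `cols.mean(axis=0)` gives average pooling.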

Frameworks such as TensorFlow may already implement more efficient alternatives under the hood that are faster than "im2col". This is not meant to be the most efficient implementation. You could probably optimize my code further by eliminating the intermediate reshaping step in "im2col", but how to do that wasn't immediately obvious to me, so I left it as is. If you have a better solution, let me know. Anyway, I hope this helps someone else looking for the same answer!
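One possible way to avoid the intermediate reshape, assuming NumPy 1.20 or newer is available, is `numpy.lib.stride_tricks.sliding_window_view`. The sketch below is not part of the original answer; the name `im2col_swv` is made up for illustration. It applies the stride to the view before anything is copied, so only the kept patches are materialized.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def im2col_swv(im, win, strides=1):
    # Hypothetical alternative to im2col built on sliding_window_view.
    strides = np.broadcast_to(strides, (im.ndim,))  # scalar or per-axis strides
    # Read-only view of shape (positions per axis..., window dims...)
    view = sliding_window_view(im, tuple(win))
    # Subsample the window positions; this is still a view, nothing is copied yet
    view = view[tuple(slice(None, None, int(s)) for s in strides)]
    # Flatten to (n_patches, win_len), then transpose so each patch is a column
    return view.reshape(-1, int(np.prod(win))).T

x = np.arange(216).reshape(6, 6, 6)
y = im2col_swv(x, (1, 3, 2), strides=(1, 3, 2))
print(y.shape)   # (6, 36)
print(y[:, 0])   # [ 0  1  6  7 12 13]
```

The first column matches the first row of the transposed output shown earlier. Whether this is actually faster than the `as_strided` version would need to be measured for your array sizes.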
