Sliding window - how to get window location on image?


Question


Referring to this great sliding window implementation in Python: https://github.com/keepitsimple/ocrtest/blob/master/sliding_window.py#blob_contributors_box, my question is: where in the code can I actually see the location of the current window on the image? Or how can I grab its location?

On line 72 and after line 85, I tried printing out shape and newstrides, but I'm clearly not getting anywhere. In the norm_shape function, I printed out tuple, but the output was only the window dimensions (if I understood that right).

I need not just the dimensions, such as width and height, but also where exactly in the image each window is extracted from, in terms of pixel coordinates, i.e. which rows/columns of the image it covers.

Solution

It might be easier for you to understand what's going on if you try using flatten=False to create a 'grid' of windows onto the image:

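import numpy as np
from scipy.misc import lena  # note: lena() only exists in old SciPy releases
from matplotlib import pyplot as plt
from sliding_window import sliding_window  # assumes the linked sliding_window.py is importable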

img = lena()
print(img.shape)
# (512, 512)

# make a 64x64 pixel sliding window on img. 
win = sliding_window(img, (64, 64), shiftSize=None, flatten=False)

print(win.shape)
# (8, 8, 64, 64)
# i.e. (img_height / win_height, img_width / win_width, win_height, win_width)

plt.imshow(win[4, 4, ...])
plt.draw()
# grid position [4, 4] contains Lena's eye and nose
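Since `lena()` and the linked `sliding_window()` may not be available in a current environment, here is a minimal stand-in sketch, assuming NumPy 1.20 or newer. It builds the same kind of (grid_rows, grid_cols, win_h, win_w) grid with NumPy's `sliding_window_view`, keeping only every step-th window position. `make_window_grid` is a hypothetical helper (not part of the linked module), and a synthetic 512x512 array replaces the Lena image; it assumes the linked code lays out windows the same way.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_window_grid(img, win_shape, shift_size=None):
    """Hypothetical stand-in for sliding_window(..., flatten=False).

    Returns a read-only view of shape (grid_rows, grid_cols, win_h, win_w).
    """
    if shift_size is None:
        shift_size = win_shape  # non-overlapping tiles
    sr, sc = shift_size
    # every possible window position: shape (H - wh + 1, W - ww + 1, wh, ww)
    all_windows = sliding_window_view(img, win_shape)
    # keep only windows whose top-left corner lies on the step grid
    return all_windows[::sr, ::sc]


img = np.arange(512 * 512).reshape(512, 512)  # synthetic stand-in image
win = make_window_grid(img, (64, 64))
print(win.shape)  # (8, 8, 64, 64)

# grid position [4, 4] starts at row 4*64 = 256, column 4*64 = 256
print(np.array_equal(win[4, 4], img[256:320, 256:320]))  # True
```

Because the result is a view, no pixel data is copied; the window at grid position [i, j] simply starts at (i * shift_rows, j * shift_cols).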

To get the corresponding pixel coordinates, you could do something like this:

def get_win_pixel_coords(grid_pos, win_shape, shift_size=None):
    if shift_size is None:
        shift_size = win_shape
    gr, gc = grid_pos
    sr, sc = shift_size
    wr, wc = win_shape
    top, bottom = gr * sr, (gr * sr) + wr
    left, right = gc * sc, (gc * sc) + wc

    return top, bottom, left, right

# check for grid position [3, 4]
t, b, l, r = get_win_pixel_coords((3, 4), (64, 64))

print(np.all(img[t:b, l:r] == win[3, 4, :, :]))
# True
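The grid shape printed above follows from simple arithmetic. Assuming windows start at the top-left corner, advance by the shift, and only windows that fit entirely inside the image are kept (which matches the (8, 8, 64, 64) shape, but is an assumption about the linked code), the number of windows along an axis of length n with window size w and shift s is (n - w) // s + 1. `grid_shape` is a hypothetical helper:

```python
def grid_shape(img_shape, win_shape, shift_size=None):
    """Number of windows along each axis (hypothetical helper).

    Assumes windows start at (0, 0), advance by shift_size, and only
    windows lying fully inside the image are counted.
    """
    if shift_size is None:
        shift_size = win_shape  # non-overlapping tiles
    return tuple((n - w) // s + 1
                 for n, w, s in zip(img_shape, win_shape, shift_size))


print(grid_shape((512, 512), (64, 64)))            # (8, 8)
print(grid_shape((240, 360), (64, 64)))            # (3, 5)
print(grid_shape((240, 360), (64, 64), (10, 10)))  # (18, 30)
```

For non-overlapping tiles this reduces to n // w, which is the img_height / win_height in the comment above.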

With flatten=True, the 8x8 grid of 64x64-pixel windows just gets flattened out into a 64-long vector of 64x64-pixel windows. In that case you could use something like np.unravel_index to convert from the 1D vector index into a tuple of grid indices, then use these to get the pixel coordinates as above:

win = sliding_window(img, (64, 64), flatten=True)

grid_pos = np.unravel_index(12, (8, 8))
t, b, l, r = get_win_pixel_coords(grid_pos, (64, 64))

print(np.all(img[t:b, l:r] == win[12]))
# True
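Going the other way, `np.ravel_multi_index` turns a grid position back into the flat index, so you can switch between the flattened and grid views freely. A small sketch for the 8x8 grid above (the `get_win_pixel_coords` logic is repeated so the block runs on its own):

```python
import numpy as np


def get_win_pixel_coords(grid_pos, win_shape, shift_size=None):
    # same logic as the helper above: top, bottom, left, right of a window
    if shift_size is None:
        shift_size = win_shape
    gr, gc = grid_pos
    sr, sc = shift_size
    wr, wc = win_shape
    return gr * sr, gr * sr + wr, gc * sc, gc * sc + wc


grid_pos = np.unravel_index(12, (8, 8))            # flat index -> (row, col)
flat_idx = np.ravel_multi_index(grid_pos, (8, 8))  # (row, col) -> flat index
print(tuple(int(i) for i in grid_pos), int(flat_idx))  # (1, 4) 12
print(tuple(int(v) for v in get_win_pixel_coords(grid_pos, (64, 64))))
# (64, 128, 256, 320)
```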


OK, I'll try to address some of the questions you raised in the comments.

> I want the pixel location of the window relative to the actual pixel dimensions of the original image.

Perhaps I was not clear enough: you can already do this using something like my get_win_pixel_coords() function, which gives you the top, bottom, left and right coordinates of the window relative to the image. For example:

win = sliding_window(img, (64, 64), shiftSize=None, flatten=False)

fig, (ax1, ax2) = plt.subplots(1, 2)
# (ax.hold() was removed in Matplotlib 3.0; overplotting is now the default)
ax1.imshow(win[4, 4])
ax1.plot(8, 9, 'oy')         # position of Lena's eye, relative to this window

t, b, l, r = get_win_pixel_coords((4, 4), (64, 64))

ax2.imshow(img)
ax2.plot(l + 8, t + 9, 'oy') # same point on the whole image; plot() takes (x=col, y=row)

plt.show()
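The point to watch in the plotting code is that `plot(x, y)` takes the column first and the row second, while the window offsets are (top, left) in row/column order. A minimal sketch of the conversion, using a hypothetical helper `window_to_image_coords` and assuming the windows sit on a regular grid with the given shift:

```python
def window_to_image_coords(local_rc, grid_pos, win_shape, shift_size=None):
    """Map a (row, col) inside a window to (row, col) in the full image.

    Hypothetical helper; the window's top-left corner is assumed to be
    (grid_row * shift_rows, grid_col * shift_cols).
    """
    if shift_size is None:
        shift_size = win_shape  # non-overlapping tiles
    (lr, lc), (gr, gc), (sr, sc) = local_rc, grid_pos, shift_size
    return gr * sr + lr, gc * sc + lc


# the yellow dot above: x=8 (column), y=9 (row) inside window [4, 4]
row, col = window_to_image_coords((9, 8), (4, 4), (64, 64))
print(row, col)  # 265 264 -> on the full image, draw it with plot(col, row)
```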

Also notice that I've updated get_win_pixel_coords() to deal with cases where shiftSize is not None (i.e. the windows don't tile the image exactly, with no overlap).
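To check the shift_size branch, the sketch below builds a 10-pixel-step grid over a 240x360 array (the sizes from the comments) and compares every window against the slice given by `get_win_pixel_coords`. It uses NumPy's `sliding_window_view` (NumPy 1.20+) as a stand-in for the linked `sliding_window()`, on the assumption that both place window [i, j] at (i * shift, j * shift); the random image and 64x64 window size are illustrative only.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def get_win_pixel_coords(grid_pos, win_shape, shift_size=None):
    # same logic as the helper above
    if shift_size is None:
        shift_size = win_shape
    gr, gc = grid_pos
    sr, sc = shift_size
    wr, wc = win_shape
    return gr * sr, gr * sr + wr, gc * sc, gc * sc + wc


img = np.random.default_rng(0).random((240, 360))  # stand-in image
win_shape, shift = (64, 64), (10, 10)
# all window positions, then keep every 10th one along each axis
win = sliding_window_view(img, win_shape)[::shift[0], ::shift[1]]
print(win.shape)  # (18, 30, 64, 64)

# every window in the grid must match the slice its coordinates describe
ok = True
for pos in np.ndindex(*win.shape[:2]):
    t, b, l, r = get_win_pixel_coords(pos, win_shape, shift)
    ok = ok and np.array_equal(img[t:b, l:r], win[pos])
print(ok)  # True
```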

> So I'm guessing that in that case, I should just make the grid equal to the original image's dimensions, is that right? (instead of using 8x8)

No, if the windows tile the image without overlap (i.e. shiftSize=None, which I've assumed so far), then if you made the grid dimensions equal to the pixel dimensions of the image, every window would just contain a single pixel!

> So in my case, for an image of width 360 and height 240, would that mean I use this line: grid_pos = np.unravel_index(12, (240, 360))? Also, what does 12 refer to in this line?

As I said, making the 'grid size' equal to the image dimensions would be pointless, since every window would contain only a single pixel (at least, assuming that the windows are non-overlapping). The 12 would refer to the index into the flattened grid of windows, e.g.:

x = np.arange(25).reshape(5, 5)    # 5x5 grid containing numbers from 0 ... 24
x_flat = x.ravel()                 # flatten it into a 25-long vector
print(x_flat[12])                  # the element at index 12 of the flattened vector
# 12
row, col = np.unravel_index(12, (5, 5))  # corresponding row/col index in x
print(x[row, col])
# 12
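Applied to the 240x360 image from the comment: with non-overlapping 64x64 windows (the window size is just an example) the grid is (240 // 64, 360 // 64) = (3, 5), not (240, 360), so that is the shape to pass to `np.unravel_index`. A short sketch:

```python
import numpy as np

img_h, img_w = 240, 360
win_h, win_w = 64, 64                     # example window size
grid = (img_h // win_h, img_w // win_w)   # (3, 5) non-overlapping windows

# flat window index 12 -> grid (row, col) -> pixel rectangle in the image
gr, gc = (int(i) for i in np.unravel_index(12, grid))
top, left = gr * win_h, gc * win_w
bottom, right = top + win_h, left + win_w
print(grid, (gr, gc), (top, bottom, left, right))
# (3, 5) (2, 2) (128, 192, 128, 192)
```

Under the assumption that only whole windows are kept, the last 240 - 3*64 = 48 rows and 360 - 5*64 = 40 columns are not covered by any window in this tiling.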

> I am shifting the window by 10 pixels each time: the first window starts at coordinates (0, 0) on the image, the second at (10, 10), and so on. I want the program to return not just the window contents but also the coordinates of each window, i.e. (0, 0), then (10, 10), etc.

As I said, you can already get the position of the window relative to the image using the top, bottom, left, right coordinates returned by get_win_pixel_coords(). You could wrap this up into a single function if you really wanted:

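def get_pixels_and_coords(win_grid, grid_pos, shift_size=None):
    pix = win_grid[grid_pos]
    # pass shift_size through so overlapping windows get the right coordinates
    tblr = get_win_pixel_coords(grid_pos, pix.shape, shift_size)
    return pix, tblr

# e.g.:
pix, tblr = get_pixels_and_coords(win, (3, 4))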
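If what you want is the whole sequence (window contents plus the coordinates of each window), a generator over the grid does it. Note that with a 10-pixel shift on both axes, the windows in row-major order start at (0, 0), (0, 10), (0, 20), ... rather than stepping diagonally. This is a sketch: `iter_windows_with_coords` is a hypothetical name, it accepts any 4-D (grid_rows, grid_cols, win_h, win_w) array such as the flatten=False output, and the usage below builds a stand-in grid with NumPy's `sliding_window_view` (NumPy 1.20+).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def iter_windows_with_coords(win_grid, shift_size=None):
    """Yield (pixels, (top, bottom, left, right)) for every window.

    win_grid: 4-D array of shape (grid_rows, grid_cols, win_h, win_w).
    shift_size defaults to the window shape (non-overlapping tiles).
    """
    wr, wc = win_grid.shape[2:]
    sr, sc = shift_size if shift_size is not None else (wr, wc)
    for gr, gc in np.ndindex(*win_grid.shape[:2]):  # row-major order
        top, left = gr * sr, gc * sc
        yield win_grid[gr, gc], (top, top + wr, left, left + wc)


# usage: stand-in 100x100 image, 64x64 windows, 10-pixel shift
img = np.arange(100 * 100).reshape(100, 100)
grid = sliding_window_view(img, (64, 64))[::10, ::10]  # shape (4, 4, 64, 64)
coords = [tblr for _, tblr in iter_windows_with_coords(grid, (10, 10))]
print(coords[:3])  # [(0, 64, 0, 64), (0, 64, 10, 74), (0, 64, 20, 84)]
```

A generator keeps memory flat even for large grids, since each window is a view and the coordinates are computed on the fly.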

If you want the coordinates of every pixel in the window, relative to the image, another trick you could use is to construct arrays containing the row and column indices of every pixel in the image, then apply your sliding window to these:

ridx, cidx = np.indices(img.shape)
r_win = sliding_window(ridx, (64, 64), shiftSize=None, flatten=False)
c_win = sliding_window(cidx, (64, 64), shiftSize=None, flatten=False)

pix = win[3, 4]    # pixel values
r = r_win[3, 4]    # row index of every pixel in the window
c = c_win[3, 4]    # column index of every pixel in the window
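A quick way to check the index-array trick: the first entries of `r` and `c` are the window's top-left corner (the same t and l that get_win_pixel_coords returns), and indexing the image with `r` and `c` reproduces the window's pixels. A sketch using NumPy's `sliding_window_view` (NumPy 1.20+) as a stand-in for the linked `sliding_window()`, on a synthetic 512x512 image; `tile_grid` is a hypothetical helper:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

img = np.arange(512 * 512).reshape(512, 512)  # stand-in image
ridx, cidx = np.indices(img.shape)            # row / column index of every pixel


def tile_grid(a):
    # hypothetical helper: non-overlapping 64x64 tiles, shape (8, 8, 64, 64)
    return sliding_window_view(a, (64, 64))[::64, ::64]


win, r_win, c_win = tile_grid(img), tile_grid(ridx), tile_grid(cidx)
r, c = r_win[3, 4], c_win[3, 4]
print(r[0, 0], c[0, 0])  # 192 256 -> window [3, 4] starts at row 192, col 256

# fancy-indexing the image with the index windows gives back the pixel window
print(np.array_equal(img[r, c], win[3, 4]))  # True
```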
