Python-向量化滑动窗口 [英] Python - vectorizing a sliding window

查看:157
本文介绍了Python-向量化滑动窗口的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试对滑动窗口操作进行矢量化处理.对于一维情况,一个有用的示例可以遵循以下原则:

I'm trying to vectorize a sliding window operation. For the 1-d case a helpful example could go along the lines of:

x= vstack((np.array([range(10)]),np.array([range(10)])))

x[1,:]=np.where((x[0,:]<5)&(x[0,:]>0),x[1,x[0,:]+1],x[1,:])

索引<5的每个当前值的n + 1值.但是我得到这个错误:

The n+1 value for each current value for indices <5. But I get this error:

x[1,:]=np.where((x[0,:]<2)&(x[0,:]>0),x[1,x[0,:]+1],x[1,:])
IndexError: index (10) out of range (0<=index<9) in dimension 1

奇怪的是,对于n-1值,我不会收到此错误,这意味着索引小于0.这似乎并不在意:

Curiously I wouldn't get this error for the n-1 value which would mean indices smaller than 0. It doesn't seem to mind:

x[1,:]=np.where((x[0,:]<5)&(x[0,:]>0),x[1,x[0,:]-1],x[1,:])

print(x)

[[0 1 2 3 4 5 6 7 8 9]
 [0 0 1 2 3 5 6 7 8 9]]

这附近还有吗?我的方法是完全错误的吗?任何评论将不胜感激.

Is there anyway around this? is my approach totally wrong? any comments would be appreciated.

这是我要实现的目标,我将矩阵展平为一个numpy数组,在该数组上我想计算每个单元格6x6邻域的平均值:

This is what I would like to achieve, I flatten a matrix to an numpy array on which I want to calculate the mean of the 6x6 neighborhood of each cell:

matriz = np.array([[1,2,3,4,5],
   [6,5,4,3,2],
   [1,1,2,2,3],
   [3,3,2,2,1],
   [3,2,1,3,2],
   [1,2,3,1,2]])

# matrix to vector
vector2 = ndarray.flatten(matriz)

ncols = int(shape(matriz)[1])
nrows = int(shape(matriz)[0])

vector = np.zeros(nrows*ncols,dtype='float64')


# Interior pixels
if ( (i % ncols) != 0 and (i+1) % ncols != 0 and i>ncols and i<ncols*(nrows-1)):

    vector[i] = np.mean(np.array([vector2[i-ncols-1],vector2[i-ncols],vector2[i-ncols+1],vector2[i-1],vector2[i+1],vector2[i+ncols-1],vector2[i+ncols],vector2[i+ncols+1]]))

推荐答案

如果我正确地理解了这个问题,那么您希望对所有数字取均值,在索引周围增加1步,而忽略索引.

If I understand the problem correctly you would like to take the mean of all numbers 1 step around the index, neglecting the index.

我已经修补了您的功能以使其正常工作,我相信您正在尝试以下操作:

I have patched your function to work, I believe you were going for something like this:

def original(matriz):

    vector2 = np.ndarray.flatten(matriz)

    nrows, ncols= matriz.shape
    vector = np.zeros(nrows*ncols,dtype='float64')

    # Interior pixels
    for i in range(vector.shape[0]):
        if ( (i % ncols) != 0 and (i+1) % ncols != 0 and i>ncols and i<ncols*(nrows-1)):

            vector[i] = np.mean(np.array([vector2[i-ncols-1],vector2[i-ncols],\
                        vector2[i-ncols+1],vector2[i-1],vector2[i+1],\
                        vector2[i+ncols-1],vector2[i+ncols],vector2[i+ncols+1]]))

我使用切片和视图将其重写:

I rewrote this using using slicing and views:

def mean_around(arr):
    arr=arr.astype(np.float64)

    out= np.copy(arr[:-2,:-2])  #Top left corner
    out+= arr[:-2,2:]           #Top right corner
    out+= arr[:-2,1:-1]         #Top center
    out+= arr[2:,:-2]           #etc
    out+= arr[2:,2:]
    out+= arr[2:,1:-1]
    out+= arr[1:-1,2:]
    out+= arr[1:-1,:-2]

    out/=8.0    #Divide by # of elements to obtain mean

    cout=np.empty_like(arr)  #Create output array
    cout[1:-1,1:-1]=out      #Fill with out values
    cout[0,:]=0;cout[-1,:]=0;cout[:,0]=0;cout[:,-1]=0 #Set edges equal to zero

    return  cout

使用np.empty_like,然后填充边缘似乎比np.zeros_like快一点.首先,请仔细检查它们是否使用您的matriz数组.

Using np.empty_like and then filling the edges seemed slightly faster then np.zeros_like. First lets double check they give the same thing using your matriz array.

print np.allclose(mean_around(matriz),original(matriz))
True

print mean_around(matriz)
[[ 0.     0.     0.     0.     0.   ]
 [ 0.     2.5    2.75   3.125  0.   ]
 [ 0.     3.25   2.75   2.375  0.   ]
 [ 0.     1.875  2.     2.     0.   ]
 [ 0.     2.25   2.25   1.75   0.   ]
 [ 0.     0.     0.     0.     0.   ]]

一些时间:

a=np.random.rand(500,500)

print np.allclose(original(a),mean_around(a))
True

%timeit mean_around(a)
100 loops, best of 3: 4.4 ms per loop

%timeit original(a)
1 loops, best of 3: 6.6 s per loop

大约是1500倍的加速速度.

Roughly ~1500x speedup.

看起来是使用numba的好地方:

Looks like a good place to use numba:

def mean_numba(arr):
    out=np.zeros_like(arr)
    col,rows=arr.shape

    for x in xrange(1,col-1):
        for y in xrange(1,rows-1):
            out[x,y]=(arr[x-1,y+1]+arr[x-1,y]+arr[x-1,y-1]+arr[x,y+1]+\
                      arr[x,y-1]+arr[x+1,y+1]+arr[x+1,y]+arr[x+1,y-1])/8.
    return out

nmean= autojit(mean_numba)

现在让我们与所有提出的方法进行比较.

Now lets compare against all presented methods.

a=np.random.rand(5000,5000)

%timeit mean_around(a)
1 loops, best of 3: 729 ms per loop

%timeit nmean(a)
10 loops, best of 3: 169 ms per loop

#CT Zhu's answer
%timeit it_mean(a)
1 loops, best of 3: 36.7 s per loop

#Ali_m's answer
%timeit fast_local_mean(a,(3,3))
1 loops, best of 3: 4.7 s per loop

#lmjohns3's answer
%timeit scipy_conv(a)
1 loops, best of 3: 3.72 s per loop

使用numba的速度提高了4倍,这是很正常的,这表明numpy代码与要获得的性能差不多.尽管确实必须更改@CTZhu的答案以包括不同的数组大小,但我还是按了拉出的其他代码.

A 4x speed with numba up is pretty nominal indicating that the numpy code is about as good as its going to get. I pulled the other codes as presented, although I did have to change @CTZhu's answer to include different array sizes.

这篇关于Python-向量化滑动窗口的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆