Padding numpy rolling window operations using strides


Question

I have a function f that I would like to compute efficiently over a sliding window.

def efficient_f(x):
    # do stuff
    wSize = 50
    return another_f(rolling_window_using_strides(x, wSize), -1)

I have seen on SO that it is particularly efficient to do this using strides:

import numpy as np
from numpy.lib.stride_tricks import as_strided

def rolling_window_using_strides(a, window):
    # Add a trailing axis of length `window` that reuses the last-axis stride,
    # so each row is a view onto `window` consecutive elements (no copy).
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    windows = np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
    print(windows.shape)
    return windows

Then I try to apply it to a DataFrame:

df = pd.DataFrame(data=np.random.rand(180000, 1), columns=['foo'])
df['bar'] = df[['foo']].apply(efficient_f, raw=True)
# Note the double brackets [[ ]]: with single brackets, pd.Series.apply
# (which accepts neither the raw nor the axis kwarg) would be called instead of pd.DataFrame.apply.

It works very nicely and did lead to significant performance gains. However, I still get the following error:

ValueError: Shape of passed values is (1, 179951), indices imply (1, 180000).

This is because I am using wSize=50, which yields

rolling_window_using_strides(df['foo'].values,50).shape
(1L, 179951L, 50L)

Is there a way, by zero/np.nan padding at the borders, to get

(1L, 180000, 50L)

and hence the same size as the original vector?

Recommended answer

Here's an approach using np.lib.stride_tricks.as_strided -

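import numpy as np

def strided_axis0(a, fillval, L):  # a is a 1D array
    # Left-pad with L-1 fill values so that every element gets a full window
    a_ext = np.concatenate((np.full(L - 1, fillval), a))
    n = a_ext.strides[0]
    strided = np.lib.stride_tricks.as_strided
    # Row i is a view onto a_ext[i:i+L]; no data is copied
    return strided(a_ext, shape=(a.shape[0], L), strides=(n, n))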

Sample run -

In [95]: np.random.seed(0)

In [96]: a = np.random.rand(8,1)

In [97]: a
Out[97]: 
array([[ 0.55],
       [ 0.72],
       [ 0.6 ],
       [ 0.54],
       [ 0.42],
       [ 0.65],
       [ 0.44],
       [ 0.89]])

In [98]: strided_axis0(a[:,0], fillval=np.nan, L=3)
Out[98]: 
array([[  nan,   nan,  0.55],
       [  nan,  0.55,  0.72],
       [ 0.55,  0.72,  0.6 ],
       [ 0.72,  0.6 ,  0.54],
       [ 0.6 ,  0.54,  0.42],
       [ 0.54,  0.42,  0.65],
       [ 0.42,  0.65,  0.44],
       [ 0.65,  0.44,  0.89]])
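
To connect this back to the question, here is a minimal sketch of how the padded windows could feed a reduction so that the result has the same length as the input. The names padded_windows and rolling_reduce are hypothetical, and np.mean is only a stand-in for the asker's another_f, which is assumed to reduce over the last axis. The writeable=False flag is an optional safety measure, not part of the answer above.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def padded_windows(a, fillval, L):
    """Same idea as strided_axis0: left-pad a 1D array, then view it as (len(a), L) windows."""
    a_ext = np.concatenate((np.full(L - 1, fillval), a))
    n = a_ext.strides[0]
    # writeable=False guards against accidental writes through the overlapping view
    return as_strided(a_ext, shape=(a.shape[0], L), strides=(n, n), writeable=False)


def rolling_reduce(x, w_size, reducer=np.mean):
    # `reducer` is a hypothetical stand-in for another_f; it must accept an axis argument
    return reducer(padded_windows(x, np.nan, w_size), axis=-1)


x = np.arange(10, dtype=float)
out = rolling_reduce(x, w_size=3)
print(out.shape)  # (10,), the same length as x
print(out[:4])    # the first two entries are nan because their windows include padding
```

Because the output length matches the input, the result can be assigned straight to a column, for example df['bar'] = rolling_reduce(df['foo'].values, 50), which should avoid the shape mismatch from the question. If the leading nan values are unwanted, a nan-aware reducer such as np.nanmean could be passed instead.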
