Dataframe representation of a rolling window

Question

I want a DataFrame representation of a rolling window. Instead of performing some operation on a rolling window, I want a DataFrame where the window is represented in another dimension. This could be a pd.Panel, an np.array, or a pd.DataFrame with a pd.MultiIndex.

import pandas as pd
import numpy as np

np.random.seed([3,1415])
df = pd.DataFrame(np.random.rand(10, 3).round(2), 
                  columns=['A', 'B', 'C'],
                  index=list('abcdefghij'))

print(df)

      A     B     C
a  0.44  0.41  0.46
b  0.47  0.46  0.02
c  0.85  0.82  0.78
d  0.76  0.93  0.83
e  0.88  0.93  0.72
f  0.12  0.15  0.20
g  0.44  0.10  0.28
h  0.61  0.09  0.84
i  0.74  0.87  0.69
j  0.38  0.23  0.44

Expected Output

For window = 2, I'd expect the result to be:

      0                 1            
      A     B     C     A     B     C
a  0.44  0.41  0.46  0.47  0.46  0.02
b  0.47  0.46  0.02  0.85  0.82  0.78
c  0.85  0.82  0.78  0.76  0.93  0.83
d  0.76  0.93  0.83  0.88  0.93  0.72
e  0.88  0.93  0.72  0.12  0.15  0.20
f  0.12  0.15  0.20  0.44  0.10  0.28
g  0.44  0.10  0.28  0.61  0.09  0.84
h  0.61  0.09  0.84  0.74  0.87  0.69
i  0.74  0.87  0.69  0.38  0.23  0.44

I'm not set on this particular layout, but this is the information I want. I'm looking for the most efficient way to get it.

I've experimented with shift in various ways, but it feels clunky. This is what I use to produce the output above (it only handles window = 2):

print(pd.concat([df, df.shift(-1)], axis=1, keys=[0, 1]).dropna())
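The shift/concat idea can be extended to any window size W with one shift per offset. A minimal sketch; the helper name shifted_windows is hypothetical. It trims the last W-1 rows by position rather than calling dropna(), so genuine NaNs in the data are not mistaken for shift padding. It still materializes every window, so its cost grows with W.

```python
import numpy as np
import pandas as pd

def shifted_windows(df, W):
    """Hypothetical helper: shift/concat rolling-window layout for any W.

    Column level 0 is the offset inside the window (0..W-1), level 1 is the
    original column name; each row is labelled by the first row of its window.
    """
    # One copy of the frame per offset, each shifted up by k rows.
    shifted = [df.shift(-k) for k in range(W)]
    out = pd.concat(shifted, axis=1, keys=list(range(W)))
    # The last W-1 rows hold NaN padding from shift(), so drop them by position.
    return out.iloc[: len(df) - W + 1]

np.random.seed([3, 1415])
df = pd.DataFrame(np.random.rand(10, 3).round(2),
                  columns=['A', 'B', 'C'],
                  index=list('abcdefghij'))
print(shifted_windows(df, 2))
```

Note that shift() turns integer columns into floats, because the padding it introduces is NaN.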

Recommended Answer

We can use NumPy's esoteric strided tricks to get views into those sliding windows, so no data is copied. If you use this new dimension for a reduction such as matrix multiplication, this is ideal. If for some reason you want a 2D output, a reshape is needed at the end, and that reshape may create a copy: it can only return a view when each window's rows sit back to back in memory, as in a C-ordered array.
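Why the strides (s0, s0, s1) passed to as_strided in get_sliding_window produce sliding windows: a worked byte-offset calculation, where s0 and s1 are the row and column strides of the original (m, n) array a, and out is the strided view of shape (m-W+1, W, n).

$$
\operatorname{offset}(\mathrm{out}[i,j,k]) = i\,s_0 + j\,s_0 + k\,s_1 = (i+j)\,s_0 + k\,s_1 = \operatorname{offset}(a[i+j,\,k])
$$

So out[i, j, k] = a[i + j, k] for 0 ≤ i ≤ m - W, 0 ≤ j < W, 0 ≤ k < n: window i is rows i through i + W - 1 of a, and only the shape and strides metadata is new.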

Thus, the implementation would look something like this -

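from numpy.lib.stride_tricks import as_strided as strided

def get_sliding_window(df, W, return2D=False):
    a = df.values                 # 2D array of shape (m, n)
    s0, s1 = a.strides            # byte steps between rows / columns
    m, n = a.shape
    # m-W+1 windows, each starting one row (s0 bytes) after the previous;
    # rows inside a window are s0 apart and columns s1 apart.
    # writeable=False guards against writing through overlapping windows.
    out = strided(a, shape=(m-W+1, W, n), strides=(s0, s0, s1), writeable=False)
    if return2D:
        # Flatten each window into one row (copies unless the layout allows a view)
        return out.reshape(m-W+1, -1)
    return out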
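On NumPy 1.20 or newer, numpy.lib.stride_tricks.sliding_window_view builds the same kind of view without hand-computed strides, and returns it read-only by default. A sketch assuming that NumPy version; the name get_sliding_window_swv is hypothetical. sliding_window_view appends the window axis last, giving shape (m-W+1, n, W), so a transpose restores the (m-W+1, W, n) layout that get_sliding_window returns.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

def get_sliding_window_swv(df, W, return2D=False):
    """Hypothetical variant of get_sliding_window using sliding_window_view."""
    a = df.to_numpy()
    # Windows along axis 0; the window axis is appended last: (m-W+1, n, W).
    win = sliding_window_view(a, W, axis=0)
    # Reorder to (m-W+1, W, n); transpose is still a view, no data copied.
    out = win.transpose(0, 2, 1)
    if return2D:
        # One flattened window per row; may copy depending on memory layout.
        return out.reshape(out.shape[0], -1)
    return out

df = pd.DataFrame({'A': [0.44, 0.46, 0.46, 0.85, 0.78],
                   'B': [0.41, 0.47, 0.02, 0.82, 0.76]})
print(get_sliding_window_swv(df, 3, return2D=True))
```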

Sample run for 2D/3D output -

In [68]: df
Out[68]: 
      A     B
0  0.44  0.41
1  0.46  0.47
2  0.46  0.02
3  0.85  0.82
4  0.78  0.76

In [70]: get_sliding_window(df, 3,return2D=1)
Out[70]: 
array([[ 0.44,  0.41,  0.46,  0.47,  0.46,  0.02],
       [ 0.46,  0.47,  0.46,  0.02,  0.85,  0.82],
       [ 0.46,  0.02,  0.85,  0.82,  0.78,  0.76]])
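Whether the 2D reshape copies depends on the memory layout of the array being windowed. The sketch below builds the same (s0, s0, s1) strided view over a C-ordered and an F-ordered version of one small array (windows_2d is a hypothetical helper) and uses np.shares_memory to check whether the 2D result still aliases the source.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def windows_2d(a, W):
    """Hypothetical helper: same stride trick as get_sliding_window, flattened."""
    s0, s1 = a.strides
    m, n = a.shape
    out = as_strided(a, shape=(m - W + 1, W, n), strides=(s0, s0, s1),
                     writeable=False)
    return out.reshape(m - W + 1, -1)

base = np.arange(10.0).reshape(5, 2)
c_order = np.ascontiguousarray(base)   # rows stored back to back
f_order = np.asfortranarray(base)      # columns stored back to back

# In C order each window is one contiguous run of W*n values, so NumPy can
# reshape without copying; in F order it has to copy.
print(np.shares_memory(windows_2d(c_order, 3), c_order))  # True
print(np.shares_memory(windows_2d(f_order, 3), f_order))  # False
```

The values are identical either way; only whether the 2D result is a view or a fresh copy changes.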

Here's what the 3D view output looks like -

In [69]: get_sliding_window(df, 3,return2D=0)
Out[69]: 
array([[[ 0.44,  0.41],
        [ 0.46,  0.47],
        [ 0.46,  0.02]],

       [[ 0.46,  0.47],
        [ 0.46,  0.02],
        [ 0.85,  0.82]],

       [[ 0.46,  0.02],
        [ 0.85,  0.82],
        [ 0.78,  0.76]]])
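To get back to the DataFrame-with-MultiIndex layout from the question, a (k, W, n) window array can be flattened and given MultiIndex columns. A sketch; windows_to_frame is a hypothetical helper, and the windows are built here with sliding_window_view (NumPy 1.20+) so the block runs on its own, though the 3D output of get_sliding_window has the same shape. Wrapping the windows in a DataFrame materializes them, so this step gives up the view.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

def windows_to_frame(df, win3d):
    """Hypothetical helper: wrap a (k, W, n) window array in a DataFrame.

    Columns form a MultiIndex (offset inside the window, original column);
    rows are labelled by the first row of each window.
    """
    k, W, n = win3d.shape
    cols = pd.MultiIndex.from_product([list(range(W)), df.columns])
    # Flattening offset-major matches the column order from from_product.
    return pd.DataFrame(win3d.reshape(k, W * n), index=df.index[:k], columns=cols)

np.random.seed([3, 1415])
df = pd.DataFrame(np.random.rand(10, 3).round(2),
                  columns=['A', 'B', 'C'],
                  index=list('abcdefghij'))
# (k, W, n) windows; get_sliding_window(df, 2) would also fit here.
win3d = sliding_window_view(df.to_numpy(), 2, axis=0).transpose(0, 2, 1)
print(windows_to_frame(df, win3d))
```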

Let's time the 3D view output for various window sizes (get_3d_shfted_array is @Yakym Pirozhenko's solution from another answer and is not reproduced here) -

In [331]: df = pd.DataFrame(np.random.rand(1000, 3).round(2))

In [332]: %timeit get_3d_shfted_array(df,2) # @Yakym Pirozhenko's soln
10000 loops, best of 3: 47.9 µs per loop

In [333]: %timeit get_sliding_window(df,2)
10000 loops, best of 3: 39.2 µs per loop

In [334]: %timeit get_3d_shfted_array(df,5) # @Yakym Pirozhenko's soln
10000 loops, best of 3: 89.9 µs per loop

In [335]: %timeit get_sliding_window(df,5)
10000 loops, best of 3: 39.4 µs per loop

In [336]: %timeit get_3d_shfted_array(df,15) # @Yakym Pirozhenko's soln
1000 loops, best of 3: 258 µs per loop

In [337]: %timeit get_sliding_window(df,15)
10000 loops, best of 3: 38.8 µs per loop

Let's verify that we are indeed getting views -

In [338]: np.may_share_memory(get_sliding_window(df,2), df.values)
Out[338]: True
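Being a view has a practical consequence: the windows track later changes to the source array, and one source row appears in several windows. A small hypothetical example with as_strided (made read-only so the overlapping windows cannot be written through):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(6.0).reshape(3, 2)
s0, s1 = a.strides
win = as_strided(a, shape=(2, 2, 2), strides=(s0, s0, s1), writeable=False)

a[1, 0] = 100.0                    # modify the source array, not the view
# Row 1 of `a` is the second row of window 0 and the first row of window 1.
print(win[0, 1, 0], win[1, 0, 0])  # 100.0 100.0
print(np.shares_memory(win, a))    # True
```

If the windows must stay independent of the source data, calling .copy() on the result detaches them, at a memory cost proportional to W.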

The timings for get_sliding_window stay almost constant (about 39 µs) across window sizes, while get_3d_shfted_array grows from 47.9 µs at W=2 to 258 µs at W=15. That is the benefit of returning a view instead of copying the data.
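%timeit only exists in IPython. A plain-Python sketch of the same comparison with the standard timeit module; it uses sliding_window_view (NumPy 1.20+) as a stand-in for get_sliding_window and times it once as a bare view and once forced into a copy. time_windows is a hypothetical helper, and the printed numbers will vary by machine.

```python
import timeit
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.random.rand(1000, 3)

def time_windows(W, number=1000):
    """Return seconds per call for (view only, view materialized as a copy)."""
    view_t = timeit.timeit(lambda: sliding_window_view(a, W, axis=0),
                           number=number)
    copy_t = timeit.timeit(lambda: sliding_window_view(a, W, axis=0).copy(),
                           number=number)
    return view_t / number, copy_t / number

for W in (2, 5, 15):
    v, c = time_windows(W)
    print(f"W={W:2d}: view {v * 1e6:.1f} us, copy {c * 1e6:.1f} us")
```

The view's cost does not depend on W, while the copy has to write (m-W+1)*W*n values.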
