Rolling window REVISITED - Adding window rolling quantity as a parameter - Walk Forward Analysis


Question

I have been searching the web for methods that create rolling windows, so that I can perform, in a generalized manner, the cross-validation technique for time series known as Walk Forward Analysis.

However, I have not come across any solution that is flexible in terms of both 1) the window size (almost all methods have this; for example, pandas rolling or, a bit differently, np.roll) and 2) the window rolling quantity, understood as how many indexes we want to roll the window by (I haven't found any method that incorporates this).

I have been trying to optimize the code and make it concise, with the help of @coldspeed in this answer (I'm unable to comment there because I don't have the required reputation; I hope to get there soon!), but I haven't been able to incorporate the window rolling quantity.

My thoughts:

  1. I have tried np.roll together with my example below, with no success.

  2. I have also tried to modify the code below by multiplying the ith value, but I couldn't fit it into the list comprehension, which I would like to keep.

  3. The example below works for any window size; however, it only rolls the window forward one step, and I would like it to generalize to any step size.

So, is there any way to have these two parameters available within the list comprehension approach? Or is there any other resource I did not find that makes this easier? All help is very much appreciated. My example code is the following:

In [1]: import numpy as np
In [2]: arr = np.random.random((10,3))

In [3]: arr

Out[3]: array([[0.38020065, 0.22656515, 0.25926935],
   [0.13446667, 0.04386083, 0.47210474],
   [0.4374763 , 0.20024762, 0.50494097],
   [0.49770835, 0.16381492, 0.6410294 ],
   [0.9711233 , 0.2004874 , 0.71186102],
   [0.61729025, 0.72601898, 0.18970222],
   [0.99308981, 0.80017134, 0.64955358],
   [0.46632326, 0.37341677, 0.49950571],
   [0.45753235, 0.55642914, 0.31972887],
   [0.4371343 , 0.08905587, 0.74511753]])

In [4]: inSamplePercentage = 0.4
In [5]: outSamplePercentage = 0.3 * inSamplePercentage

In [6]: windowSizeTrain = round(inSamplePercentage * arr.shape[0])
In [7]: windowSizeTest = round(outSamplePercentage * arr.shape[0])
In [8]: windowTrPlusTs = windowSizeTrain + windowSizeTest

In [9]: sliceListX = [arr[i: i + windowTrPlusTs] for i in range(len(arr) - (windowTrPlusTs-1))]

Given a window length of 5 and a window roll quantity of 2, I would expect something like this:

Out [15]: 

[array([[0.38020065, 0.22656515, 0.25926935],
    [0.13446667, 0.04386083, 0.47210474],
    [0.4374763 , 0.20024762, 0.50494097],
    [0.49770835, 0.16381492, 0.6410294 ],
    [0.9711233 , 0.2004874 , 0.71186102]]),
 array([[0.4374763 , 0.20024762, 0.50494097],
    [0.49770835, 0.16381492, 0.6410294 ],
    [0.9711233 , 0.2004874 , 0.71186102],
    [0.61729025, 0.72601898, 0.18970222],
    [0.99308981, 0.80017134, 0.64955358]]),
 array([[0.9711233 , 0.2004874 , 0.71186102],
    [0.61729025, 0.72601898, 0.18970222],
    [0.99308981, 0.80017134, 0.64955358],
    [0.46632326, 0.37341677, 0.49950571],
    [0.45753235, 0.55642914, 0.31972887]]),
 array([[0.99308981, 0.80017134, 0.64955358],
   [0.46632326, 0.37341677, 0.49950571],
   [0.45753235, 0.55642914, 0.31972887],
   [0.4371343 , 0.08905587, 0.74511753]])]

(This includes the last array, although its length is less than 5.)

OR:

Out [16]: 

[array([[0.38020065, 0.22656515, 0.25926935],
    [0.13446667, 0.04386083, 0.47210474],
    [0.4374763 , 0.20024762, 0.50494097],
    [0.49770835, 0.16381492, 0.6410294 ],
    [0.9711233 , 0.2004874 , 0.71186102]]),
 array([[0.4374763 , 0.20024762, 0.50494097],
    [0.49770835, 0.16381492, 0.6410294 ],
    [0.9711233 , 0.2004874 , 0.71186102],
    [0.61729025, 0.72601898, 0.18970222],
    [0.99308981, 0.80017134, 0.64955358]]),
 array([[0.9711233 , 0.2004874 , 0.71186102],
    [0.61729025, 0.72601898, 0.18970222],
    [0.99308981, 0.80017134, 0.64955358],
    [0.46632326, 0.37341677, 0.49950571],
    [0.45753235, 0.55642914, 0.31972887]])]

(Only the arrays with length == 5. However, this could be derived from the one above with a simple mask.)

I forgot to mention this as well: something could be done if pandas rolling objects supported the __iter__ method.

Recommended answer

IIUC what you want, you can use np.lib.stride_tricks.as_strided to create a view with the given window size and rolling quantity, such as:

import numpy as np
from numpy.lib.stride_tricks import as_strided

# redefine arr so the result is easier to follow than with random numbers
arr = np.arange(30).reshape((10,3))
# get arr properties
arr_0, arr_1 = arr.shape
arr_is = arr.itemsize # size in bytes of one element of arr
# window size and rolling quantity parameters
win_size = 5
roll_qty = 2
# use as_strided with the right shape and strides
# (these strides assume arr is C-contiguous):
print (as_strided( arr, 
                   shape=(int((arr_0 - win_size)/roll_qty+1), win_size,arr_1),
                   strides=(roll_qty*arr_1*arr_is, arr_1*arr_is, arr_is)))

array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8],
        [ 9, 10, 11],
        [12, 13, 14]],

       [[ 6,  7,  8],
        [ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17],
        [18, 19, 20]],

       [[12, 13, 14],
        [15, 16, 17],
        [18, 19, 20],
        [21, 22, 23],
        [24, 25, 26]]])
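The strides are written out by hand here, which only holds for a C-contiguous 2-D array. As a minimal sketch, the same idea can be wrapped in a function that reads the byte steps from arr.strides and returns a read-only view. The helper names rolling_windows_strided and rolling_windows_view are hypothetical and not part of NumPy. The second helper is an alternative that assumes NumPy 1.20 or later, where sliding_window_view is available: it builds every one-step window and then keeps every roll_qty-th one.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided, sliding_window_view


def rolling_windows_strided(arr, win_size, roll_qty):
    """Full-length windows over axis 0 of a 2-D array, as a read-only view.

    Hypothetical helper: assumes arr is 2-D and win_size <= len(arr).
    """
    n_rows, n_cols = arr.shape
    if win_size > n_rows:
        raise ValueError("win_size must not exceed the number of rows")
    n_windows = (n_rows - win_size) // roll_qty + 1
    # Bytes to move one row / one column, taken from the array itself
    row_stride, col_stride = arr.strides
    return as_strided(
        arr,
        shape=(n_windows, win_size, n_cols),
        strides=(roll_qty * row_stride, row_stride, col_stride),
        writeable=False,  # guard against writing through overlapping windows
    )


def rolling_windows_view(arr, win_size, roll_qty):
    """Same result via sliding_window_view (assumes NumPy >= 1.20)."""
    # Shape (n_rows - win_size + 1, n_cols, win_size): window axis comes last
    windows = sliding_window_view(arr, win_size, axis=0)
    # Keep every roll_qty-th window, then put the window axis in the middle
    return windows[::roll_qty].transpose(0, 2, 1)


arr = np.arange(30).reshape((10, 3))
a = rolling_windows_strided(arr, win_size=5, roll_qty=2)
b = rolling_windows_view(arr, win_size=5, roll_qty=2)
print(a.shape)                # (3, 5, 3)
print(np.array_equal(a, b))   # True
```

Both helpers return views rather than copies, so call .copy() on the result before modifying it.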

And for another window size and rolling quantity:

win_size = 4
roll_qty = 3
print( as_strided( arr, 
                   shape=(int((arr_0 - win_size)/roll_qty+1), win_size,arr_1),
                   strides=(roll_qty*arr_1*arr_is, arr_1*arr_is, arr_is)))

array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8],
        [ 9, 10, 11]],

       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17],
        [18, 19, 20]],

       [[18, 19, 20],
        [21, 22, 23],
        [24, 25, 26],
        [27, 28, 29]]])
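Since the question asks to keep the list comprehension, here is a minimal sketch of that route as well: passing the rolling quantity as the step of range is enough. The function name rolling_slices and the keep_partial flag are hypothetical, introduced only for illustration. With keep_partial=True the list also contains the shorter trailing window from the first desired output in the question; with the default it contains only full-length windows, as in the second one. The train/test split at the end assumes the sizes from the question (4 training rows and 1 test row per window).

```python
import numpy as np


def rolling_slices(arr, win_size, roll_qty, keep_partial=False):
    """List of windows over axis 0, moving roll_qty rows each time.

    Hypothetical helper: assumes win_size <= len(arr).
    """
    if win_size > len(arr):
        raise ValueError("win_size must not exceed the number of rows")
    # Full windows only: the last start is len(arr) - win_size.
    # With keep_partial, allow one more start if the previous window
    # did not reach the end of the array.
    stop = len(arr) - win_size + (roll_qty if keep_partial else 1)
    return [arr[i:i + win_size] for i in range(0, stop, roll_qty)]


arr = np.arange(30).reshape((10, 3))

partial = rolling_slices(arr, win_size=5, roll_qty=2, keep_partial=True)
print([len(w) for w in partial])   # [5, 5, 5, 4]

full = rolling_slices(arr, win_size=5, roll_qty=2)
print([len(w) for w in full])      # [5, 5, 5]

# The "simple mask" mentioned in the question gives the same full windows
masked = [w for w in partial if len(w) == 5]

# Walk-forward split: first rows of each window for training, the rest for testing
train_size = 4
splits = [(w[:train_size], w[train_size:]) for w in full]
```

Each element is a basic slice of arr, so no data is copied until a window is modified or explicitly copied.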
