Rolling regression on irregular time series

Question

I need to perform a rolling regression on an irregular time series (i.e. the sampling interval may not even be periodic, going from 0, 1, 2, 3... to ...7, 20, 24, 28...). The time variable is plain numeric and does not need to be a date/time, but the rolling window must be defined by time. So if I have a time series that is irregularly sampled for 600 seconds and the window is 30, each regression should cover 30 seconds, not 30 samples.

I've read examples, and while I could replicate rolling sums and medians by time, I can't figure out how to do the same for regression.

First of all, I have read some of the other questions about applying rolling functions to irregular time series data, such as "optimized rolling functions on irregular time series with time-based window" and "Rolling window over irregular time series".

The issue is that the examples provided so far cover simple functions like sum or median, but I have not yet figured out how to perform a simple rolling regression, i.e. using lm, under the same constraint that the window is defined by time on an irregular series. Also, my time series is much simpler: no date is necessary, it is simply time "elapsed".

Getting this right is important to me because irregular sampling (for example, a skip in the time interval) may lead to over- or underestimated coefficients in a sample-based rolling regression, as the sample window will then span additional time.

So I was wondering if anyone can help me create a function that does this in the simplest way? The dataset measures a variable over time, i.e. it has 2 variables: time and response. Time is recorded every x elapsed time units (seconds or minutes, so not date/time formatted), but once in a while the sampling becomes irregular.

For every row, the function should perform a linear regression over a window that is n time units wide. The width should never exceed n units, but may be floored (i.e. reduced) to accommodate irregular time sampling. For example, if the width is specified as 20 seconds but time is sampled every 6 seconds, then the window is rounded down to 18 seconds, not up to 24.

I have looked at the question "How to calculate the average slope within a moving window in R" and tested that code on an irregular time series, but it appears to assume a regular time series.

Sample data:

sample <- 
structure(list(x = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 
29, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 47, 48, 
49), y = c(50, 49, 48, 47, 46, 47, 46, 45, 44, 43, 44, 43, 42, 
41, 40, 41, 40, 39, 38, 37, 38, 37, 36, 35, 34, 35, 34, 33, 32, 
31, 30, 29, 28, 29, 28, 27, 26, 25, 26, 25, 24, 23, 22, 21, 20, 
19)), .Names = c("x", "y"), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -46L))

My current code (based on the previous question I referred to). I know it does not subset by time:

library(zoo)
# Fit y ~ x on one window and return the coefficients (intercept, slope)
clm <- function(z) coef(lm(y ~ x, as.data.frame(z)))
# Right-aligned window of 10 samples (not 10 time units)
rollme <- rollapplyr(zoo(sample), 10, clm, by.column = FALSE, fill = NA)

The expected output (slopes, manually calculated) is below. It differs from a regular sample-based rolling regression: the numbers change as soon as the time interval skips after 29 (secs):

    NA
    NA
    NA
    NA
    NA
    NA
    NA
    NA
    NA
    -0.696969697
    -0.6
    -0.551515152
    -0.551515152
    -0.6
    -0.696969697
    -0.6
    -0.551515152
    -0.551515152
    -0.6
    -0.696969697
    -0.6
    -0.551515152
    -0.551515152
    -0.6
    -0.696969697
    -0.6
    -0.551515152
    -0.551515152
    -0.6
    -0.696969697
    -0.605042017
    -0.638888889
    -0.716981132
    -0.597560976
    -0.528301887
    -0.5
    -0.521008403
    -0.642857143
    -0.566666667
    -0.551515152
    -0.551515152
    -0.6
    -0.696969697
    -0.605042017
    -0.638888889
    -0.716981132

I hope I'm providing enough information; if not, let me know, or point me to a good example that I can try.

Other things I have tried: I've tried converting the time to POSIXct format, but I don't know how to perform lm on that:

library(lubridate)  # loaded in the original attempt; the calls below are base R
x <- as.POSIXct(strptime(sample$x, format = "%S"))
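
A note on this attempt: lm() does not need a date/time class, since numeric elapsed time is already a valid predictor. If timestamps are all that is available, one option is to convert them back to elapsed seconds first. The following is a minimal sketch under stated assumptions: the vectors `elapsed` and `resp` are made-up illustration data, and the epoch origin is arbitrary.

```r
# Hypothetical illustration data (not the question's data set)
elapsed <- c(0, 1, 2, 3, 7, 20)       # irregular elapsed time in seconds
resp    <- c(50, 49, 48, 47, 45, 38)  # response

# Timestamps built from an arbitrary origin (assumption)
t_posix <- as.POSIXct(elapsed, origin = "1970-01-01", tz = "UTC")

# Convert timestamps back to elapsed seconds since the first observation
secs <- as.numeric(difftime(t_posix, t_posix[1], units = "secs"))

fit_num  <- lm(resp ~ elapsed)  # regression on the plain numeric time
fit_secs <- lm(resp ~ secs)     # regression on time recovered from POSIXct
```

Both fits should give the same intercept and slope, which suggests the POSIXct detour adds nothing when time is already numeric.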

Update: Added a tl;dr section.

Recommended answer

For the sake of completeness, here is an answer which uses data.table to aggregate in a non-equi join.

Although there are many similar questions, e.g. "r calculating rolling average with window based on value (not number of rows or date/time variable)", this question deserves an answer of its own as the OP is looking for the coefficients of a rolling regression.

library(data.table)
ws <- 10   # size of sliding window in time units
setDT(sample)[.(start = x - ws, end = x), on = .(x > start, x <= end),
              as.list(coef(lm(y ~ x.x))), by = .EACHI]

      x  x (Intercept)        x.x
 1: -10  0    50.00000         NA
 2:  -9  1    50.00000 -1.0000000
 3:  -8  2    50.00000 -1.0000000
 4:  -7  3    50.00000 -1.0000000
 5:  -6  4    50.00000 -1.0000000
 6:  -5  5    49.61905 -0.7142857
 7:  -4  6    49.50000 -0.6428571
 8:  -3  7    49.50000 -0.6428571
 9:  -2  8    49.55556 -0.6666667
10:  -1  9    49.63636 -0.6969697
11:   0 10    49.20000 -0.6000000
12:   1 11    48.88485 -0.5515152
13:   2 12    48.83636 -0.5515152
14:   3 13    49.20000 -0.6000000
15:   4 14    50.12121 -0.6969697
16:   5 15    49.20000 -0.6000000
17:   6 16    48.64242 -0.5515152
18:   7 17    48.59394 -0.5515152
19:   8 18    49.20000 -0.6000000
20:   9 19    50.60606 -0.6969697
21:  10 20    49.20000 -0.6000000
22:  11 21    48.40000 -0.5515152
23:  12 22    48.35152 -0.5515152
24:  13 23    49.20000 -0.6000000
25:  14 24    51.09091 -0.6969697
26:  15 25    49.20000 -0.6000000
27:  16 26    48.15758 -0.5515152
28:  17 27    48.10909 -0.5515152
29:  18 28    49.20000 -0.6000000
30:  19 29    51.57576 -0.6969697
31:  22 32    49.18487 -0.6050420
32:  23 33    50.13889 -0.6388889
33:  24 34    52.47170 -0.7169811
34:  25 35    48.97561 -0.5975610
35:  26 36    46.77358 -0.5283019
36:  27 37    45.75000 -0.5000000
37:  28 38    46.34454 -0.5210084
38:  29 39    50.57143 -0.6428571
39:  30 40    47.95556 -0.5666667
40:  31 41    47.43030 -0.5515152
41:  32 42    47.38182 -0.5515152
42:  33 43    49.20000 -0.6000000
43:  34 44    53.03030 -0.6969697
44:  37 47    49.26050 -0.6050420
45:  38 48    50.72222 -0.6388889
46:  39 49    54.22642 -0.7169811
      x  x (Intercept)        x.x
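
To see which observations the non-equi join assigns to each window, the regression can be swapped for a plain row count. This is a sketch rather than part of the original answer; the name `dat` is an assumption, chosen here to avoid masking base R's `sample()`, and only the time column of the question's data is rebuilt.

```r
library(data.table)

# Time column of the question's data (gaps after 29 and after 44)
dat <- data.table(x = as.numeric(c(0:29, 32:44, 47:49)))
ws <- 10  # window width in time units

# Same join conditions as above, but counting rows per window instead of fitting lm
counts <- dat[.(start = x - ws, end = x), on = .(x > start, x <= end),
              .N, by = .EACHI]

counts$N[c(1, 10, 31)]
```

For rows 1, 10 and 31 the windows hold 1, 10 and 8 observations respectively: the first window has a single point (hence the NA slope), and the windows that straddle the gap after 29 contain fewer samples because they are bounded by time, not by row count.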

Please note that rows 10 to 30, where the time series is regularly spaced, are identical to OP's rollme.
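
As a cross-check that does not rely on data.table, the same right-aligned, time-based window can be sketched in base R. The function name `roll_slope_by_time` and the data frame name `dat` are assumptions made for this illustration; windows with fewer than two observations return NA because a slope cannot be estimated from a single point.

```r
# The question's sample data, rebuilt as a plain data frame
dat <- data.frame(
  x = as.numeric(c(0:29, 32:44, 47:49)),
  y = c(50, 49, 48, 47, 46, 47, 46, 45, 44, 43, 44, 43, 42, 41, 40, 41,
        40, 39, 38, 37, 38, 37, 36, 35, 34, 35, 34, 33, 32, 31, 30, 29,
        28, 29, 28, 27, 26, 25, 26, 25, 24, 23, 22, 21, 20, 19)
)

# Slope of y ~ x over the window (x[i] - ws, x[i]] for every row i
roll_slope_by_time <- function(x, y, ws) {
  vapply(seq_along(x), function(i) {
    in_win <- x > x[i] - ws & x <= x[i]      # select by time, not by row count
    if (sum(in_win) < 2L) return(NA_real_)   # slope is undefined for one point
    unname(coef(lm(y[in_win] ~ x[in_win]))[2L])
  }, numeric(1))
}

slopes <- roll_slope_by_time(dat$x, dat$y, ws = 10)
```

The values in `slopes` should agree with the `x.x` column of the table above. This loop refits lm() once per row, so it is meant for checking results on small data rather than for speed.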

The call to as.list() forces the result of coef(lm(...)) to appear in separate columns.
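
The effect of as.list() can be seen on a small grouped example. The table `toy` and its columns are made up for this sketch; the point is only the shape of the result.

```r
library(data.table)

# Hypothetical toy data: two groups, each lying on an exact straight line
toy <- data.table(g = rep(c("a", "b"), each = 3),
                  x = rep(c(0, 1, 2), times = 2),
                  y = c(50, 49, 48, 10, 12, 14))

# Without as.list(): the two coefficients are stacked in one column (V1),
# giving two rows per group
stacked <- toy[, coef(lm(y ~ x)), by = g]

# With as.list(): one row per group, one column per coefficient
wide <- toy[, as.list(coef(lm(y ~ x))), by = g]

names(wide)
```

Here `stacked` has four rows with a single value column, while `wide` has two rows with the columns `(Intercept)` and `x`, because a named list in `j` is treated as a set of columns.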

The code above uses a right-aligned rolling window. However, it can easily be adapted to support a left-aligned window as well:

# left aligned window
setDT(sample)[.(start = x, end = x + ws), on = .(x >= start, x < end),
              as.list(coef(lm(y ~ x.x))), by = .EACHI]
