Rolling regression with expanding window in R

Question

I would like to do a rolling linear regression, with an expanding window, between two variables in a data frame, grouped by a third categorical column.

For example, in the toy data frame below, I would like to extract the coefficient of lm(y~x), grouped by z, using all rows up to the row of interest. Thus for row 2 the regression data set will be rows 1:2, for row 3 it will be rows 1:3, and for row 4 it will be just row 4, since that is the first row with categorical variable z = b.

dframe<-data.frame(x=c(1:10),y=c(8:17), z=c("a","a","a","b","b","b","b","b","b","b"))

Using the rollify function, I am able to get what I want except for the expanding window. Below I have used a window size of 2.

library(dplyr)
library(tibbletime)  # provides rollify

rol <- rollify(~coef(lm(.x~0+.y)),2) 
output<-dframe %>%  group_by(z) %>% mutate(tt=rol(x,y))

Specifically, I do not know how to supply a variable window size to the rollify function. Is it possible?

More broadly, what is an efficient way to do this operation? I need to do this on several tens of thousands of rows.

Recommended answer

1) rollapplyr First split dframe and then run rollapplyr over each component of the split. Note that rollapplyr can take a vector of widths as its second argument. The fit used here is y ~ x + 0, a regression through the origin, matching the no-intercept formula in the question's rollify code.

library(zoo)

roll <- function(data, n = nrow(data)) {
  rollapplyr(1:n, 1:n, function(ix) coef(lm(y ~ x+0, data, subset = ix))[[1]])
}

L <- split(dframe[-3], dframe[[3]])
transform(dframe, roll = unlist(lapply(L, roll)))

giving:

    x  y z     roll
a1  1  8 a 8.000000
a2  2  9 a 5.200000
a3  3 10 a 4.000000
b1  4 11 b 2.750000
b2  5 12 b 2.536585
b3  6 13 b 2.363636
b4  7 14 b 2.222222
b5  8 15 b 2.105263
b6  9 16 b 2.007380
b7 10 17 b 1.924528
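
The vector-of-widths behaviour that (1) relies on can be seen in isolation with a plain sum. This is only a minimal sketch: the vector x and the widths w_reset are made up for illustration and are not part of the answer.

library(zoo)

# Illustrative input (not from the question)
x <- c(2, 4, 6, 8)

# Width i at position i with right alignment: the window grows by one
# element at each step, i.e. an expanding window.
expanding <- rollapplyr(x, seq_along(x), sum)

# The expanding sum should agree with cumsum()
all.equal(as.numeric(expanding), cumsum(x))

# Hypothetical widths that restart at position 3, mimicking the start of a new group
w_reset <- c(1, 2, 1, 2)
rollapplyr(x, w_reset, sum)

Restarting the widths at the first row of each group is the idea that lets a single rollapplyr call cover all groups without splitting the data.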

1a) A variation would be to use ave instead of split.

n <- nrow(dframe)
transform(dframe, roll = ave(1:n, z, FUN = function(ix) roll(dframe[ix, ])))

1b) This alternative was added some time after the question was originally answered. It computes a per-row window width w that restarts at 1 within each group of z, so a single rollapplyr call handles all groups.

reg <- function(x) coef(lm(x[, 2] ~ x[, 1] + 0))
n <- nrow(dframe)
w <- ave(1:n, dframe$z, FUN = seq_along)
transform(dframe, 
  roll = rollapplyr(zoo(cbind(x, y)), w, reg, by.column = FALSE, coredata = FALSE))
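
To make the width vector in (1b) concrete, here is a small base R sketch of what ave(..., FUN = seq_along) produces for the toy grouping column. The standalone vector z below is just a copy of dframe$z, created for illustration.

# Copy of the grouping column from the toy data frame
z <- c("a", "a", "a", "b", "b", "b", "b", "b", "b", "b")

# Within-group running row counter: restarts at 1 whenever a new group starts
w <- ave(seq_along(z), z, FUN = seq_along)
print(w)
# [1] 1 2 3 1 2 3 4 5 6 7

Each value of w is the number of rows of the current group seen so far, which is exactly the width an expanding window needs at that row.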

2) dplyr/rollapplyr This is the same except that we use dplyr to do the grouping. roll is from (1).

library(dplyr)
library(zoo)

dframe %>%
  group_by(z) %>%
  mutate(roll = roll(data.frame(x, y))) %>%
  ungroup

giving:

# A tibble: 10 x 4
       x     y z      roll
   <int> <int> <fct> <dbl>
 1     1     8 a      8   
 2     2     9 a      5.20
 3     3    10 a      4.00
 4     4    11 b      2.75
 5     5    12 b      2.54
 6     6    13 b      2.36
 7     7    14 b      2.22
 8     8    15 b      2.11
 9     9    16 b      2.01
10    10    17 b      1.92

3) Base R This can also be done without any packages, as shown here, where L is from (1). The result is the same as in (1).

transform(dframe, roll = unlist(lapply(L, function(data, n = nrow(data)) {
  sapply(1:n, function(i) coef(lm(y ~ x + 0, data, subset = 1:i))[[1]])
})))

3a) roll in (1) can be replaced with roll2 below, which uses no packages and does not even use lm, giving another base R solution. It works because the slope of a regression through the origin is sum(x * y) / sum(x * x), so cumulative sums yield the slope for every expanding window at once. Again, L is from (1).

roll2 <- function(data) with(data, cumsum(x * y) / cumsum(x * x))
transform(dframe, roll = unlist(lapply(L, roll2)))
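
As a sanity check on the closed form in (3a), the sketch below compares roll2 against refitting lm on rows 1..i of each group, using the toy data from the question. The helper name roll_lm is introduced here for illustration only.

# Toy data from the question
dframe <- data.frame(x = 1:10, y = 8:17,
                     z = c(rep("a", 3), rep("b", 7)))

# Closed-form expanding-window slope for a no-intercept fit (from 3a)
roll2 <- function(data) with(data, cumsum(x * y) / cumsum(x * x))

# Reference implementation (hypothetical helper): refit lm() on rows 1..i
roll_lm <- function(data) {
  sapply(seq_len(nrow(data)), function(i) {
    coef(lm(y ~ x + 0, data = data[seq_len(i), ]))[[1]]
  })
}

L <- split(dframe[c("x", "y")], dframe$z)
fast <- unlist(lapply(L, roll2), use.names = FALSE)
slow <- unlist(lapply(L, roll_lm), use.names = FALSE)

# TRUE when both approaches agree up to numerical tolerance
isTRUE(all.equal(fast, slow))

Since roll2 needs only two cumulative sums per group, it avoids one model fit per row, which should matter for tens of thousands of rows. Note that unlist over the split returns values in group order, so attaching them with transform assumes dframe is already sorted by z, as the toy data is.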
