Rolling regression by group in the tidyverse?

Views: 101 | Published: 2020/6/7 18:41:57 | Tags: r dplyr purrr broom rolling-computation


Problem description

There are many questions about rolling regression in R, but here I am specifically looking for something that uses dplyr, broom and (if needed) purrr.

This is what makes this question different. I want to be tidyverse-consistent. Is it possible to do a proper rolling regression with tidy tools such as purrr::map and dplyr?

Consider the following simple example:
library(dplyr)
library(purrr)
library(broom)
library(zoo)
library(lubridate)

# tibble() replaces the deprecated data_frame()
mydata <- tibble('group' = c('a','a', 'a','a','b', 'b', 'b', 'b'),
                 'y' = c(1,2,3,4,2,3,4,5),
                 'x' = c(2,4,6,8,6,9,12,15),
                 'date' = c(ymd('2016-06-01', '2016-06-02', '2016-06-03', '2016-06-04',
                                '2016-06-03', '2016-06-04', '2016-06-05','2016-06-06')))

  group     y     x date      
  <chr> <dbl> <dbl> <date>    
1 a      1.00  2.00 2016-06-01
2 a      2.00  4.00 2016-06-02
3 a      3.00  6.00 2016-06-03
4 a      4.00  8.00 2016-06-04
5 b      2.00  6.00 2016-06-03
6 b      3.00  9.00 2016-06-04
7 b      4.00 12.0  2016-06-05
8 b      5.00 15.0  2016-06-06

For each group (in this example, a or b):

1. compute the rolling regression of y on x over the last 2 observations;
2. store the coefficient of that rolling regression in a column of the data frame.

Of course, as you can see, the rolling regression can only be computed for the last 2 rows in each group.

I have tried the following, without success:
data %>% group_by(group) %>% 
  mutate(rolling_coef = do(tidy(rollapply(. ,
                    width=2, 
                    FUN = function(df) {t = lm(formula=y ~ x, 
                                              data = as.data.frame(df), 
                                              na.rm=TRUE); 
                    return(t$coef) },
                    by.column=FALSE, align="right"))))
Error in mutate_impl(.data, dots) : 
  Evaluation error: subscript out of bounds.
In addition: There were 21 warnings (use warnings() to see them)

The expected output for the last two rows of the first group (a) is 0.5 and 0.5 (there is indeed a perfect linear correlation between y and x in this example).

More specifically:
mydata_1 <- mydata %>% filter(group == 'a',
                  row_number() %in% c(1,2))
# A tibble: 2 x 3
  group     y     x
  <chr> <dbl> <dbl>
1 a      1.00  2.00
2 a      2.00  4.00
> tidy(lm(y ~ x, mydata_1))['estimate'][2,]
[1] 0.5

and
mydata_2 <- mydata %>% filter(group == 'a',
                              row_number() %in% c(2,3)) 
# A tibble: 2 x 3
  group     y     x
  <chr> <dbl> <dbl>
1 a      2.00  4.00
2 a      3.00  6.00
> tidy(lm(y ~ x, mydata_2))['estimate'][2,]
[1] 0.5

Edit: see also: rolling regression with confidence interval (tidyverse)

Recommended answer
Define a function Coef whose argument is the matrix formed by cbind(y, x) and which regresses y on x with an intercept, returning the coefficients. Then apply rollapplyr over each group, using the current and prior rows. If by "last" you meant the 2 rows before the current row, i.e. excluding the current row, then replace 2 with list(-seq(2)) as the width argument to rollapplyr.
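# Coef: turn the window into a data frame, fit lm (the first column is the response), return the coefficients
Coef <- . %>% as.data.frame %>% lm %>% coef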

mydata %>% 
  group_by(group) %>% 
  do(cbind(reg_col = select(., y, x) %>% rollapplyr(2, Coef, by.column = FALSE, fill = NA),
           date_col = select(., date))) %>%
  ungroup

giving:
# A tibble: 8 x 4
  group `reg_col.(Intercept)` reg_col.x date      
  <chr>                 <dbl>     <dbl> <date>    
1 a      NA                      NA     2016-06-01
2 a       0                       0.500 2016-06-02
3 a       0                       0.500 2016-06-03
4 a       0                       0.500 2016-06-04
5 b      NA                      NA     2016-06-03
6 b       0.00000000000000126     0.333 2016-06-04
7 b     - 0.00000000000000251     0.333 2016-06-05
8 b       0                       0.333 2016-06-06
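
The list(-seq(2)) variant mentioned above can be sketched on a single group. This is only an illustration, not the answerer's code: it uses the y and x values of group a as plain vectors, and the helper name coef_window is made up for this sketch.

```r
library(zoo)

# y and x values of group a from the example data
y <- c(1, 2, 3, 4)
x <- c(2, 4, 6, 8)

# Hypothetical helper: regress column 1 (y) on column 2 (x) of the window
# and return intercept and slope.
coef_window <- function(m) coef(lm(m[, 1] ~ m[, 2]))

# width = list(-seq(2)) means offsets -1 and -2, i.e. the two rows
# before the current row; the current row itself is excluded.
prior <- rollapplyr(cbind(y, x), list(-seq(2)), coef_window,
                    by.column = FALSE, fill = NA)
prior
```

With this width the first two rows have no complete window and are filled with NA, so the first estimate appears one row later than with width 2.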

 
 
 
Variation

A variation of the above would be:
mydata %>% 
       group_by(group) %>% 
       do(select(., date, y, x) %>% 
          read.zoo %>% 
          rollapplyr(2, Coef, by.column = FALSE, fill = NA) %>%
          fortify.zoo(names = "date")
       ) %>% 
       ungroup

 
 
 
Slope only

If only the slope is needed, this can be simplified further. We use the fact that the slope equals cov(x, y) / var(x).
slope <- . %>% { cov(.[, 2], .[, 1]) / var(.[, 2])}
mydata %>%
       group_by(group) %>%
       mutate(slope = rollapplyr(cbind(y, x), 2, slope, by.column = FALSE, fill = NA)) %>%
       ungroup
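
As a quick sanity check of that identity, the sketch below compares the lm slope with cov(x, y) / var(x) on one two-row window taken from group b of the example data. It uses base R only; the variable names slope_lm and slope_cov are chosen for this illustration.

```r
# One two-row window from group b of the example data
y <- c(2, 3)
x <- c(6, 9)

# Slope from the full regression with an intercept
slope_lm <- unname(coef(lm(y ~ x))["x"])

# Closed-form slope used by the simplified version
slope_cov <- cov(x, y) / var(x)

# Compare up to numerical tolerance
all.equal(slope_lm, slope_cov)
```

The identity holds for simple linear regression with an intercept, which is the model the Coef function fits.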

