Linear regression with only previous values in moving window

Problem description

I have a huge dataset and would like to run a rolling linear regression over a window of 60. However, only the 60 previous values should enter each regression; the current value must be excluded.

My data frame DF consists of the following columns:

Date          Company   Y     X1   X2
01.01.2015    Mill     0.13   -1    -3
01.02.2015    Mill     0.16   1    5 
01.03.2015    Mill     0.83   3    4
01.04.2015    Mill     -0.83  23   4
01.01.1988    Hall    0.23    1    3
01.02.1988    Hall    0.24    23   2
01.03.1988    Hall    0.78    19   -9
01.04.1988    Hall    0.73    4    12
01.05.1988    Hall    0.72    5    12
01.11.2008    Jopo    0.12    0.9  32
01.12.2008    Jopo    0.13    10   32
01.01.2009    Jopo    0.32    0.2  10
01.02.2009    Jopo    0.32    2    -1

I have several thousand companies, each with several months of data. The regression has to be run for every month of a company, using a rolling window of the 60 previous months of that specific company.

In the given example, assuming a rolling window of only 3, I want a regression for company Mill on 01.04.2015 using the data from 01.01.2015 to 01.03.2015. For company Hall I want regressions on 01.04.1988 and 01.05.1988, and for Jopo a regression on 01.02.2009.
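To make the expected result concrete, the rows that have a full window of prior observations can be marked with base R. This is only a sketch: the column names `row_in_company` and `has_full_window` are made up for illustration, and the data frame just reproduces the company layout of the example above.

```r
# Company layout of the example: 4 rows for Mill, 5 for Hall, 4 for Jopo.
DF <- data.frame(Company = rep(c("Mill", "Hall", "Jopo"), times = c(4, 5, 4)))
w <- 3  # window size used in the example (60 in the real data)

# Running row number within each company (hypothetical helper column).
DF$row_in_company <- ave(seq_len(nrow(DF)), DF$Company, FUN = seq_along)

# A row can get a regression only if at least w rows precede it.
DF$has_full_window <- DF$row_in_company > w

print(DF)
```

Only the fourth Mill row, the fourth and fifth Hall rows, and the fourth Jopo row are flagged, which matches the dates listed in the question.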

Ideally, the results are combined with Company and Date in a new data frame, because I need to keep working with this data and analyse it further.

The following code handles the rolling regression, but it does not use the 60 previous dates: it uses 59 previous dates plus the current one:

library(zoo)

rolled <- function(df) {                                    
    rollapply(df, width = 60,
        FUN = function(z) coef(lm(Y ~ X1+X2, data = as.data.frame(z))),
        by.column = FALSE, align = "right"
)
}    

The following code runs the regression by company name, because I want a separate regression for each company, independent of the other companies.

Test <- do.call("rbind", by(DF[c("Y", "X1", "X2")], DF[c("Company")], rolled))

How can I make sure that only the 60 previous values are used for the regression? And does anyone know how to also show "Company" and "Date" in the results? Thanks for your help!

Recommended answer

Assume DF is as given reproducibly in the Note at the end. Use by to split DF into one block of rows per company, and apply the anonymous function to each block using rollapplyr. Note that the width argument of rollapplyr can be a list containing the offsets of the positions to use. For example, list(-seq(3)) means use the 3 prior rows (as suggested in the question) but not the current row (which would have offset 0).
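The effect of the offset list is easiest to see on a plain vector. The following is a minimal sketch with made-up data (`x`, `with_current` and `previous_only` are illustrative names), comparing an ordinary right-aligned window of width 3 with the offsets -1, -2, -3:

```r
library(zoo)

x <- 1:6

# Ordinary right-aligned window of width 3: the current element is included.
with_current <- rollapplyr(x, 3, sum, fill = NA)

# Offsets -1, -2, -3: only the three elements before the current one.
previous_only <- rollapplyr(x, list(-seq(3)), sum, fill = NA)

print(with_current)   # NA NA  6  9 12 15
print(previous_only)  # NA NA NA  6  9 12
```

The offset version is the plain version shifted down by one position, so the value reported at each position is computed without that position's own data.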

library(zoo)

# w <- 60    
w <- 3
Coef <- function(x) coef(lm(as.data.frame(x)))
do.call("rbind", by(DF, DF$Company, function(x) 
    cbind(x, rollapplyr(x[3:5], list(-seq(w)), Coef, fill = NA, by.column = FALSE))))

giving:

              Date Company     Y   X1 X2 (Intercept)         X1         X2
Hall.5  01.01.1988    Hall  0.23  1.0  3          NA         NA         NA
Hall.6  01.02.1988    Hall  0.24 23.0  2          NA         NA         NA
Hall.7  01.03.1988    Hall  0.78 19.0 -9          NA         NA         NA
Hall.8  01.04.1988    Hall  0.73  4.0 12     0.37711 -0.0017480 -0.0484553
Hall.9  01.05.1988    Hall  0.72  5.0 12     1.30333 -0.0433333 -0.0333333
Jopo.10 01.11.2008    Jopo  0.12  0.9 32          NA         NA         NA
Jopo.11 01.12.2008    Jopo  0.13 10.0 32          NA         NA         NA
Jopo.12 01.01.2009    Jopo  0.32  0.2 10          NA         NA         NA
Jopo.13 01.02.2009    Jopo  0.32  2.0 -1     0.41104  0.0010989 -0.0091259
Mill.1  01.01.2015    Mill  0.13 -1.0 -3          NA         NA         NA
Mill.2  01.02.2015    Mill  0.16  1.0  5          NA         NA         NA
Mill.3  01.03.2015    Mill  0.83  3.0  4          NA         NA         NA
Mill.4  01.04.2015    Mill -0.83 23.0  4     0.21611  0.2994444 -0.0711111
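As a sanity check, the coefficients reported on the Mill row for 01.04.2015 can be reproduced by fitting lm directly to the three preceding Mill rows. This sketch copies those rows from the input data; the names `mill` and `fit` are illustrative only.

```r
# The three Mill rows that precede 01.04.2015, copied from the input data.
mill <- data.frame(
  Y  = c(0.13, 0.16, 0.83),
  X1 = c(-1, 1, 3),
  X2 = c(-3, 5, 4)
)

# Same model as in the rolling version: Y regressed on X1 and X2.
fit <- lm(Y ~ X1 + X2, data = mill)

# Should agree with the Mill.4 row above: 0.2161111, 0.2994444, -0.0711111
print(round(coef(fit), 7))
```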

You could also try this, which returns standard errors, test statistics and p-values in addition to the estimates:

library(broom)
fun <- function(x) unlist(tidy(lm(as.data.frame(x)))[, -1]) 
do.call("rbind", by(DF, DF$Company, function(x) 
 cbind(x, rollapplyr(x[3:5], list(-(seq(w))), fun, fill = NA, by.column = FALSE))))

giving:

              Date Company     Y   X1 X2 estimate1    estimate2    estimate3
Hall.5  01.01.1988    Hall  0.23  1.0  3        NA           NA           NA
Hall.6  01.02.1988    Hall  0.24 23.0  2        NA           NA           NA
Hall.7  01.03.1988    Hall  0.78 19.0 -9        NA           NA           NA
Hall.8  01.04.1988    Hall  0.73  4.0 12 0.3771138 -0.001747967 -0.048455285
Hall.9  01.05.1988    Hall  0.72  5.0 12 1.3033333 -0.043333333 -0.033333333
Jopo.10 01.11.2008    Jopo  0.12  0.9 32        NA           NA           NA
Jopo.11 01.12.2008    Jopo  0.13 10.0 32        NA           NA           NA
Jopo.12 01.01.2009    Jopo  0.32  0.2 10        NA           NA           NA
Jopo.13 01.02.2009    Jopo  0.32  2.0 -1 0.4110390  0.001098901 -0.009125874
Mill.1  01.01.2015    Mill  0.13 -1.0 -3        NA           NA           NA
Mill.2  01.02.2015    Mill  0.16  1.0  5        NA           NA           NA
Mill.3  01.03.2015    Mill  0.83  3.0  4        NA           NA           NA
Mill.4  01.04.2015    Mill -0.83 23.0  4 0.2161111  0.299444444 -0.071111111
        std.error1 std.error2 std.error3 statistic1 statistic2 statistic3
Hall.5          NA         NA         NA         NA         NA         NA
Hall.6          NA         NA         NA         NA         NA         NA
Hall.7          NA         NA         NA         NA         NA         NA
Hall.8         NaN        NaN        NaN        NaN        NaN        NaN
Hall.9         NaN        NaN        NaN        NaN        NaN        NaN
Jopo.10         NA         NA         NA         NA         NA         NA
Jopo.11         NA         NA         NA         NA         NA         NA
Jopo.12         NA         NA         NA         NA         NA         NA
Jopo.13        NaN        NaN        NaN        NaN        NaN        NaN
Mill.1          NA         NA         NA         NA         NA         NA
Mill.2          NA         NA         NA         NA         NA         NA
Mill.3          NA         NA         NA         NA         NA         NA
Mill.4         NaN        NaN        NaN        NaN        NaN        NaN
        p.value1 p.value2 p.value3
Hall.5        NA       NA       NA
Hall.6        NA       NA       NA
Hall.7        NA       NA       NA
Hall.8       NaN      NaN      NaN
Hall.9       NaN      NaN      NaN
Jopo.10       NA       NA       NA
Jopo.11       NA       NA       NA
Jopo.12       NA       NA       NA
Jopo.13      NaN      NaN      NaN
Mill.1        NA       NA       NA
Mill.2        NA       NA       NA
Mill.3        NA       NA       NA
Mill.4       NaN      NaN      NaN
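The NaN entries for the standard errors, statistics and p-values are consistent with the small demonstration window: with w = 3, each regression fits three coefficients to three observations, so no residual degrees of freedom remain. A minimal sketch using the first three Mill rows (the names `mill` and `fit` are illustrative):

```r
# Three observations, copied from the first three Mill rows of the input.
mill <- data.frame(
  Y  = c(0.13, 0.16, 0.83),
  X1 = c(-1, 1, 3),
  X2 = c(-3, 5, 4)
)

fit <- lm(Y ~ X1 + X2, data = mill)

# Three rows minus three estimated coefficients leaves nothing
# from which to estimate the residual variance.
print(nrow(mill) - length(coef(fit)))
print(df.residual(fit))
```

With the real window of w = 60 and the same three coefficients, 57 residual degrees of freedom would remain, so this limitation is specific to the w = 3 example.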

Alternative

Another possibility is to use a width of w+1 and then drop the last row of each window (the current row) before fitting.

# w <- 60    
w <- 3 
Coef1 <- function(x) coef(lm(as.data.frame(head(x, -1))))
do.call("rbind", by(DF, DF$Company, function(x) 
    cbind(x, rollapplyr(x[3:5], w+1, Coef1, fill = NA, by.column = FALSE))))
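On a plain vector, the two formulations can be compared directly. This is a small sketch with made-up data (`x`, `by_offsets` and `by_wider_window` are illustrative names), using sum in place of the regression:

```r
library(zoo)

x <- 1:6
w <- 3

# Approach 1: offsets -1..-w select only the w previous elements.
by_offsets <- rollapplyr(x, list(-seq(w)), sum, fill = NA)

# Approach 2: a window of width w + 1 whose last (current) element
# is dropped with head(v, -1) before the function is applied.
by_wider_window <- rollapplyr(x, w + 1, function(v) sum(head(v, -1)), fill = NA)

print(by_offsets)
print(by_wider_window)
```

Both calls return NA for the first w positions and the sum of the w preceding elements afterwards.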

Companies with fewer than w+1 rows

If there are companies with fewer than w+1 rows, try this. It uses the partial = TRUE argument of rollapplyr to compute lm on windows with fewer rows, and modifies Coef accordingly so that it continues to work:

# w <- 60    
w <- 3
Coef <- function(x) coef(lm(as.data.frame(matrix(x, c(nrow(x), 1)))))
do.call("rbind", by(DF, DF$Company, function(x) cbind(x, 
  rollapplyr(x[3:5], list(-seq(w)), Coef, partial = TRUE, by.column = FALSE))))

Note: the input DF is:

Lines <- "Date          Company   Y     X1   X2
01.01.2015    Mill     0.13   -1    -3
01.02.2015    Mill     0.16   1    5 
01.03.2015    Mill     0.83   3    4
01.04.2015    Mill     -0.83  23   4
01.01.1988    Hall    0.23    1    3
01.02.1988    Hall    0.24    23   2
01.03.1988    Hall    0.78    19   -9
01.04.1988    Hall    0.73    4    12
01.05.1988    Hall    0.72    5    12
01.11.2008    Jopo    0.12    0.9  32
01.12.2008    Jopo    0.13    10   32
01.01.2009    Jopo    0.32    0.2  10
01.02.2009    Jopo    0.32    2    -1"
DF <- read.table(text = Lines, header = TRUE, as.is = TRUE)
