Linear regression with only previous values in moving window
Problem description
I have a huge dataset and would like to run a rolling linear regression over a window of 60. However, I want each regression to use only the 60 previous values.
My data frame DF consists of the following columns:
Date Company Y X1 X2
01.01.2015 Mill 0.13 -1 -3
01.02.2015 Mill 0.16 1 5
01.03.2015 Mill 0.83 3 4
01.04.2015 Mill -0.83 23 4
01.01.1988 Hall 0.23 1 3
01.02.1988 Hall 0.24 23 2
01.03.1988 Hall 0.78 19 -9
01.04.1988 Hall 0.73 4 12
01.05.1988 Hall 0.72 5 12
01.11.2008 Jopo 0.12 0.9 32
01.12.2008 Jopo 0.13 10 32
01.01.2009 Jopo 0.32 0.2 10
01.02.2009 Jopo 0.32 2 -1
I have several thousand companies and data for several months for each company. The regression has to be run for every month of a company, using a rolling window of that company's 60 previous months.
In the given example, assuming a rolling window of only 3, I want for company Mill a regression on 01.04.2015 using the data from 01.01.2015 to 01.03.2015. For company Hall I want regressions on 01.04.1988 and 01.05.1988, and for Jopo a regression on 01.02.2009.
Ideally, the results would be placed together with Company and Date in a new data frame, as I have to keep working with this data and analyse it further.
The following code should do the trick for the rolling regression. However, it does not use the previous 60 dates; it uses the previous 59 plus the current date:
library(zoo)
rolled <- function(df) {
  rollapply(df, width = 60,
    FUN = function(z) coef(lm(Y ~ X1 + X2, data = as.data.frame(z))),
    by.column = FALSE, align = "right"
  )
}
The following code runs the regression by company name, as I want a separate regression for each company, independent of the other companies.
Test <- do.call("rbind", by(DF[c("Y", "X1", "X2")], DF["Company"], rolled))
How do I make the regression use only the 60 previous values? And does anyone know how to also show Company and Date in the results? Thanks for your help!
Recommended answer
Assume DF is as given reproducibly in the Note at the end. Use by to split DF into per-company rows and apply the anonymous function using rollapplyr. Note that rollapplyr can take, for the width argument, a list containing the offsets of the positions to use. For example, list(-seq(3)) means use the 3 prior rows (as suggested in the question) but not the current row (which would have offset 0).
library(zoo)
# w <- 60
w <- 3
Coef <- function(x) coef(lm(as.data.frame(x)))
do.call("rbind", by(DF, DF$Company, function(x)
cbind(x, rollapplyr(x[3:5], list(-seq(w)), Coef, fill = NA, by.column = FALSE))))
giving:
Date Company Y X1 X2 (Intercept) X1 X2
Hall.5 01.01.1988 Hall 0.23 1.0 3 NA NA NA
Hall.6 01.02.1988 Hall 0.24 23.0 2 NA NA NA
Hall.7 01.03.1988 Hall 0.78 19.0 -9 NA NA NA
Hall.8 01.04.1988 Hall 0.73 4.0 12 0.37711 -0.0017480 -0.0484553
Hall.9 01.05.1988 Hall 0.72 5.0 12 1.30333 -0.0433333 -0.0333333
Jopo.10 01.11.2008 Jopo 0.12 0.9 32 NA NA NA
Jopo.11 01.12.2008 Jopo 0.13 10.0 32 NA NA NA
Jopo.12 01.01.2009 Jopo 0.32 0.2 10 NA NA NA
Jopo.13 01.02.2009 Jopo 0.32 2.0 -1 0.41104 0.0010989 -0.0091259
Mill.1 01.01.2015 Mill 0.13 -1.0 -3 NA NA NA
Mill.2 01.02.2015 Mill 0.16 1.0 5 NA NA NA
Mill.3 01.03.2015 Mill 0.83 3.0 4 NA NA NA
Mill.4 01.04.2015 Mill -0.83 23.0 4 0.21611 0.2994444 -0.0711111
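To make the effect of the offset list easier to see, here is a minimal sketch on a plain vector. It assumes the zoo package is installed; the names v, with_current and previous_only are made up for this illustration and are not part of the answer above.

```r
library(zoo)

v <- 1:6

# Ordinary right-aligned window of width 3: the current element is included.
with_current <- rollapplyr(v, 3, sum, fill = NA)

# Offsets -1, -2, -3: only the three elements before the current one.
previous_only <- rollapplyr(v, list(-seq(3)), sum, fill = NA)

print(rbind(v, with_current, previous_only))
```

With the offset list, the first value appears one position later than with the plain width, because the current element no longer counts toward the window.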
You could also try this:
library(broom)
fun <- function(x) unlist(tidy(lm(as.data.frame(x)))[, -1])
do.call("rbind", by(DF, DF$Company, function(x)
cbind(x, rollapplyr(x[3:5], list(-(seq(w))), fun, fill = NA, by.column = FALSE))))
giving:
Date Company Y X1 X2 estimate1 estimate2 estimate3
Hall.5 01.01.1988 Hall 0.23 1.0 3 NA NA NA
Hall.6 01.02.1988 Hall 0.24 23.0 2 NA NA NA
Hall.7 01.03.1988 Hall 0.78 19.0 -9 NA NA NA
Hall.8 01.04.1988 Hall 0.73 4.0 12 0.3771138 -0.001747967 -0.048455285
Hall.9 01.05.1988 Hall 0.72 5.0 12 1.3033333 -0.043333333 -0.033333333
Jopo.10 01.11.2008 Jopo 0.12 0.9 32 NA NA NA
Jopo.11 01.12.2008 Jopo 0.13 10.0 32 NA NA NA
Jopo.12 01.01.2009 Jopo 0.32 0.2 10 NA NA NA
Jopo.13 01.02.2009 Jopo 0.32 2.0 -1 0.4110390 0.001098901 -0.009125874
Mill.1 01.01.2015 Mill 0.13 -1.0 -3 NA NA NA
Mill.2 01.02.2015 Mill 0.16 1.0 5 NA NA NA
Mill.3 01.03.2015 Mill 0.83 3.0 4 NA NA NA
Mill.4 01.04.2015 Mill -0.83 23.0 4 0.2161111 0.299444444 -0.071111111
std.error1 std.error2 std.error3 statistic1 statistic2 statistic3
Hall.5 NA NA NA NA NA NA
Hall.6 NA NA NA NA NA NA
Hall.7 NA NA NA NA NA NA
Hall.8 NaN NaN NaN NaN NaN NaN
Hall.9 NaN NaN NaN NaN NaN NaN
Jopo.10 NA NA NA NA NA NA
Jopo.11 NA NA NA NA NA NA
Jopo.12 NA NA NA NA NA NA
Jopo.13 NaN NaN NaN NaN NaN NaN
Mill.1 NA NA NA NA NA NA
Mill.2 NA NA NA NA NA NA
Mill.3 NA NA NA NA NA NA
Mill.4 NaN NaN NaN NaN NaN NaN
p.value1 p.value2 p.value3
Hall.5 NA NA NA
Hall.6 NA NA NA
Hall.7 NA NA NA
Hall.8 NaN NaN NaN
Hall.9 NaN NaN NaN
Jopo.10 NA NA NA
Jopo.11 NA NA NA
Jopo.12 NA NA NA
Jopo.13 NaN NaN NaN
Mill.1 NA NA NA
Mill.2 NA NA NA
Mill.3 NA NA NA
Mill.4 NaN NaN NaN
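The NaN standard errors, statistics and p-values appear to come from the demo window size rather than from the code: with w = 3 and three coefficients, each fit has no residual degrees of freedom. A minimal base-R sketch of that check, refitting the window for Mill on 01.04.2015 by hand (the data frame name mill is only for illustration):

```r
# The three Mill rows preceding 01.04.2015, copied from the question's data.
mill <- data.frame(Y  = c(0.13, 0.16, 0.83),
                   X1 = c(-1, 1, 3),
                   X2 = c(-3, 5, 4))

fit <- lm(Y ~ X1 + X2, data = mill)

print(coef(fit))         # compare with the Mill.4 row in the output
print(df.residual(fit))  # residual degrees of freedom for this window
```

With the intended w = 60 there would be 57 residual degrees of freedom per window, so the standard errors should then be defined, assuming the regressors are not collinear within a window.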
Alternative
Another possibility is to use a width of w+1 and then remove the last component.
# w <- 60
w <- 3
Coef1 <- function(x) coef(lm(as.data.frame(head(x, -1))))
do.call("rbind", by(DF, DF$Company, function(x)
cbind(x, rollapplyr(x[3:5], w+1, Coef1, fill = NA, by.column = FALSE))))
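As a quick check that the two approaches agree, the following sketch runs both on Hall's rows only. It assumes the zoo package is installed; hall, by_offsets and by_width are illustrative names, and the data are copied from the question.

```r
library(zoo)

# Hall's five rows from the question's data.
hall <- data.frame(Y  = c(0.23, 0.24, 0.78, 0.73, 0.72),
                   X1 = c(1, 23, 19, 4, 5),
                   X2 = c(3, 2, -9, 12, 12))
w <- 3

# Coefficients from all rows passed in (offset-list approach).
Coef  <- function(x) coef(lm(as.data.frame(x)))
# Coefficients after dropping the last (current) row (width w + 1 approach).
Coef1 <- function(x) coef(lm(as.data.frame(head(x, -1))))

by_offsets <- rollapplyr(hall, list(-seq(w)), Coef,  fill = NA, by.column = FALSE)
by_width   <- rollapplyr(hall, w + 1,         Coef1, fill = NA, by.column = FALSE)

# TRUE if both approaches give the same coefficients (up to rounding).
print(all.equal(by_offsets, by_width, check.attributes = FALSE))
```

The offset list hands the rows to lm in reverse order, which does not change a least-squares fit, so any differences should be at the level of floating-point rounding.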
Companies with fewer than w+1 rows
If there are companies with fewer than w+1 rows, try this. It uses the partial = TRUE argument of rollapplyr to compute lm with fewer rows and modifies Coef accordingly so that it continues to work:
# w <- 60
w <- 3
Coef <- function(x) coef(lm(as.data.frame(matrix(x, c(nrow(x), 1)))))
do.call("rbind", by(DF, DF$Company, function(x) cbind(x,
rollapplyr(x[3:5], list(-seq(w)), Coef, partial = TRUE, by.column = FALSE))))
Note: The input DF is:
Lines <- "Date Company Y X1 X2
01.01.2015 Mill 0.13 -1 -3
01.02.2015 Mill 0.16 1 5
01.03.2015 Mill 0.83 3 4
01.04.2015 Mill -0.83 23 4
01.01.1988 Hall 0.23 1 3
01.02.1988 Hall 0.24 23 2
01.03.1988 Hall 0.78 19 -9
01.04.1988 Hall 0.73 4 12
01.05.1988 Hall 0.72 5 12
01.11.2008 Jopo 0.12 0.9 32
01.12.2008 Jopo 0.13 10 32
01.01.2009 Jopo 0.32 0.2 10
01.02.2009 Jopo 0.32 2 -1"
DF <- read.table(text = Lines, header = TRUE, as.is = TRUE)