Rolling Regression with Data.table - Update?


Question

I am attempting to run a rolling regression within a data.table. There are a number of questions that get at what I am trying to do, but they are generally 3+ years old and offer inelegant answers. (see: here, for example)

I am wondering if there has been any update to the data.table package that makes this more intuitive or faster?

Here is what I am trying to do. My code looks like this:

library(data.table)

DT <- data.table(
  Date = seq(as.Date("2000/1/1"), by = "day", length.out = 1000),
  x1 = rnorm(1000),
  x2 = rnorm(1000),
  x3 = rnorm(1000),
  y = rnorm(1000),
  country = rep(c("a", "b", "c", "d"), each = 25))

I would like to regress y on x1, x2 and x3 over a rolling 180-day window, by country, and store the coefficients by date.

Ideally the syntax would look something like this:

DT[,.(coef.x1 := coef(y~x1+x2+x3)[2] , 
coef.x2 := coef(y~x1+x2+x3)[3], 
coef(y~x1+x2+x3)[4],
by=c("country",ROLLING WINDOW)]

... but even more elegant, avoiding the repetition if possible! :)

For some reason, I have yet to get the rollapply syntax to work well for me.

Thanks!

Edit:

Thank you @michaelchirico.

Your suggestion comes close to what I'm aiming for, and maybe it's possible to modify the code to get there, but again, I am stuck.

Here is a more careful articulation of what I need. Some code:

DT <- data.table(
  Date = rep(seq(as.Date("2000/1/1"), by = "day", length.out = 10), times = 3), # same dates per country
  x1 = rep(rnorm(10), times = 3), # x1's repeat - same per country
  x2 = rep(rnorm(10), times = 3), # x2's repeat - same per country
  x3 = rep(rnorm(10), times = 3), # x3's repeat - same per country
  y = rnorm(30), # y's do not repeat and are unique per country per day
  country = rep(c("a", "b", "c"), each = 10))

#to calculate the coefficients by individual  country: 
a<-subset(DT,country=="a")
b<-subset(DT,country=="b")

window<-5 #declare window
coefs.a<-coef(lm(y~x1+x2+x3, data=a[1:window]))#initialize my coef variable
coefs.b<-coef(lm(y~x1+x2+x3, data=b[1:window]))#initialize my coef variable

##calculate coefficients per window

for (i in 1:(length(a$Date) - window)) {
  # rows (i + 1) to (i + window): a full window of `window` rows
  coefs.a <- rbind(coefs.a, coef(lm(y ~ x1 + x2 + x3, data = a[(i + 1):(i + window)])))
  coefs.b <- rbind(coefs.b, coef(lm(y ~ x1 + x2 + x3, data = b[(i + 1):(i + window)])))
}

The difference between this dataset and the prior one is that the dates and x1, x2, x3 all repeat across countries. My y's are unique for each country.

In my actual data set I have 120 countries. I can calculate this for each country, but it is awfully slow and then I have to rejoin all of the coefficients into a single dataset for analysis of the results.

Is there a way, similar to what you proposed, to end up with a single data.table containing all observations?

Thanks again!

Recommended answer

frollapply only accepts numeric vector input and output, so we have to write our own rolling logic with sapply() along the row indices.

window <- 180
DT[, 
   {
     # one regression per complete window of `window` consecutive rows
     data.table(t(sapply(seq_len(.N - window + 1),
                         function(k) lm(y ~ x1 + x2 + x3, 
                                        data = .SD[k:(k + window - 1)])$coefficients)))
   }, 
   by = country] 
##      country (Intercept)         x1          x2          x3
##   1:       a  0.10163170 0.09561343 -0.11123725 -0.06489867
##   2:       a  0.11029460 0.08927926 -0.10657563 -0.06035072
##   3:       a  0.11328084 0.08856627 -0.10521865 -0.06278259
##   4:       a  0.12348242 0.07503412 -0.10483616 -0.06638923
##   5:       a  0.13285512 0.09268086 -0.11239769 -0.04068656
##  ---                                                       
## 280:       d  0.08249204 0.06252626 -0.06965884 -0.09680134
## 281:       d  0.07864977 0.05395658 -0.06137728 -0.10774067
## 282:       d  0.07937867 0.06996970 -0.07991358 -0.11377039
## 283:       d  0.07654691 0.06546692 -0.06824516 -0.10902969
## 284:       d  0.06123857 0.08590249 -0.05117317 -0.11728684
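
The question also asks to store the coefficients by date, in a single data.table. Here is a minimal sketch of one way the approach above might be extended, using the small 10-day, 3-country example from the question's edit. roll_coefs is a hypothetical helper name, the seed is arbitrary, and labelling each window with its last Date is an assumption about what "by date" should mean. Note that the window counts rows, not calendar days, so this only corresponds to a day-based window when each country has one row per day.

```r
library(data.table)

set.seed(1)  # arbitrary seed so the sketch is reproducible (not in the original post)
DT <- data.table(
  Date    = rep(seq(as.Date("2000-01-01"), by = "day", length.out = 10), times = 3),
  x1      = rep(rnorm(10), times = 3),
  x2      = rep(rnorm(10), times = 3),
  x3      = rep(rnorm(10), times = 3),
  y       = rnorm(30),
  country = rep(c("a", "b", "c"), each = 10)
)

window <- 5  # number of rows per window

# Hypothetical helper: fits one regression per complete window of `window`
# rows and returns one row per window, labelled with the window's last Date.
roll_coefs <- function(sd, window) {
  n_windows <- nrow(sd) - window + 1
  if (n_windows < 1) return(NULL)  # group too short for even one window
  rbindlist(lapply(seq_len(n_windows), function(k) {
    rows <- k:(k + window - 1)
    fit  <- lm(y ~ x1 + x2 + x3, data = sd[rows])
    c(list(Date = sd$Date[k + window - 1]), as.list(coef(fit)))
  }))
}

setkey(DT, country, Date)  # sort by date within country before rolling
coefs <- DT[, roll_coefs(.SD, window), by = country]
```

The result has one row per country and window, with columns country, Date, (Intercept), x1, x2 and x3, so something like coefs[country == "a"] pulls out a single country's coefficient history without any rejoining step.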