Vectorization of "cumulative" regression


Question

I have data

dat <- data.frame(t = 1:100, y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))

where t gives points in time. I would like to regress y on x1 and x2 at each point in time based on the preceding points in time.

I can create a loop

reg <- matrix(rep(NA, 3 * nrow(dat)), ncol = 3)
for(i in 11:nrow(dat)){
   reg[i,] <- coefficients(lm(y ~ x1 + x2, data=dat[1:i,]))
}

but I wonder whether anyone knows a way to vectorize this, perhaps using data.table.

Recommended answer

We can use a non-equi self-join to build a table in which each time point t is paired with every row up to and including t:

library(data.table)
setDT(dat)
# not clear if you wanted points _strictly_ before present;
#   if so, use t < t and add nomatch = 0L to skip the first row
dat[dat, on = .(t <= t), allow.cartesian = TRUE]
        t           y         x1          x2
   1:   1 -0.51729096  0.1765509  1.06562278
   2:   2 -0.51729096  0.1765509  1.06562278
   3:   2  0.85173679 -0.7801053  0.05249113
   4:   3 -0.51729096  0.1765509  1.06562278
   5:   3  0.85173679 -0.7801053  0.05249113
  ---                                       
5046: 100  1.03802913 -2.7042756  2.05639758
5047: 100 -1.29122593  0.9013410  0.77088748
5048: 100  0.08262791  0.4135725  0.92694074
5049: 100 -0.93397320  0.2719790 -0.26097185
5050: 100 -1.23897617  0.9008160  0.61121185
             i.y       i.x1        i.x2
   1: -0.5172910  0.1765509  1.06562278
   2:  0.8517368 -0.7801053  0.05249113
   3:  0.8517368 -0.7801053  0.05249113
   4: -0.5080630 -2.0701757 -1.01573263
   5: -0.5080630 -2.0701757 -1.01573263
  ---                                  
5046: -1.2389762  0.9008160  0.61121185
5047: -1.2389762  0.9008160  0.61121185
5048: -1.2389762  0.9008160  0.61121185
5049: -1.2389762  0.9008160  0.61121185
5050: -1.2389762  0.9008160  0.61121185
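The comment in the code above mentions the "strictly before" variant. As a minimal sketch of what that could look like, here it is on a small made-up table (`toy` and its values are hypothetical, for illustration only). It assumes the inequality is changed to `t < t` and that `nomatch = 0L` is used to drop the first time point, which has no earlier rows to match:

```r
library(data.table)

# Hypothetical toy data, not from the original question
toy <- data.table(t = 1:3, y = c(10, 20, 30))

# Keep only pairs where the matched row comes strictly before the current one.
# nomatch = 0L drops t = 1, which has nothing before it.
strict <- toy[toy, on = .(t < t), nomatch = 0L, allow.cartesian = TRUE]
print(strict)
```

With this variant each time point t is paired with t - 1 earlier rows, so the regression at t no longer sees the observation at t itself.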

(a bit confusing, but in t <= t, the left-hand t refers to the outer dat, i.e. the table being subset, and the right-hand t refers to the dat passed inside the brackets)
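To make the two roles of t easier to see, here is a small sketch on a three-row table (`toy` and its values are made up for illustration). Columns taken from the outer table keep their names, while columns from the table inside the brackets get the `i.` prefix, as in the output above:

```r
library(data.table)

# Hypothetical toy data, not from the original question
toy <- data.table(t = 1:3, y = c(10, 20, 30))

# Outer `toy` supplies the left-hand t, the inner `toy` the right-hand t.
# Every pair with (outer t) <= (inner t) is kept.
pairs <- toy[toy, on = .(t <= t), allow.cartesian = TRUE]

# `t` shows the inner (current) time point, `y` the outer (earlier or equal)
# rows, and `i.y` the value at the current time point.
print(pairs)

# Time point k appears k times: 1 + 2 + 3 = 6 rows in total.
counts <- pairs[, .N, by = t]
print(counts)
```

The same counting explains the 5050 rows in the full example: 1 + 2 + ... + 100 = 5050.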

From here we need only group by t and run the regression:

dat[dat, on = .(t <= t), allow.cartesian = TRUE
    ][ , as.list(coef(lm(y ~ x1 + x2))), keyby = t
       # (only adding head here to limit output)
       ][ , head(.SD)]
#    t (Intercept)          x1          x2
# 1: 1  -0.5172910          NA          NA
# 2: 2  -0.2646369 -1.43105510          NA
# 3: 3   9.1879448  9.96212179 -10.7580819
# 4: 4  -0.3504059 -0.36654096   0.4523271
# 5: 5  -0.1681879 -0.06670494   0.3553107
# 6: 6   1.2108223  1.04082291  -0.6947567
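As a sanity check, the grouped result can be compared with the loop from the question. The sketch below assumes a fixed seed (arbitrary, only for reproducibility) and compares coefficients from t = 11 onward, since the loop leaves the first ten rows as NA; the names `joined`, `reg`, and `same` are just for this illustration:

```r
library(data.table)

set.seed(1)  # arbitrary seed, only so both versions see the same data
dat <- data.table(t = 1:100, y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))

# Join-based version: one regression per time point on rows 1..t
joined <- dat[dat, on = .(t <= t), allow.cartesian = TRUE
              ][ , as.list(coef(lm(y ~ x1 + x2))), keyby = t]

# Loop-based version from the question (rows 1:10 stay NA)
reg <- matrix(NA_real_, nrow = nrow(dat), ncol = 3)
for (k in 11:nrow(dat)) {
  reg[k, ] <- coef(lm(y ~ x1 + x2, data = dat[seq_len(k)]))
}

# Drop the t column, strip dimnames, and compare from t = 11 onward
same <- all.equal(unname(as.matrix(joined[t >= 11])[, -1]), reg[11:100, ])
print(same)
```

Note that the join materializes n(n + 1)/2 rows before grouping, so this trades memory for brevity; it still fits one lm per time point, just as the loop does.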
