Vectorization of "cumulative" regression
Problem description
I have data
dat <- data.frame(t = 1:100, y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))
where t gives points in time. I would like to regress y on x1 and x2 at each point in time, based on the preceding points in time.
I can create a loop:
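# One row of coefficients (Intercept, x1, x2) per time point; rows 1..10 stay NA
reg <- matrix(rep(NA, 3 * nrow(dat)), ncol = 3)
for (i in 11:nrow(dat)) {
  # Refit on all rows up to and including time point i
  reg[i, ] <- coefficients(lm(y ~ x1 + x2, data = dat[1:i, ]))
}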
This works, but I wonder whether anyone knows a way to vectorize it, perhaps using data.table.
Recommended answer
We can use a non-equi self-join to get the table you want:
library(data.table)
setDT(dat)
# Not clear if you wanted points _strictly_ before the present; if so, use
# on = .(t < t) and add nomatch = 0L to drop the first time point (it has no earlier rows)
dat[dat, on = .(t <= t), allow.cartesian = TRUE]
        t           y         x1          x2        i.y       i.x1        i.x2
   1:   1 -0.51729096  0.1765509  1.06562278 -0.5172910  0.1765509  1.06562278
   2:   2 -0.51729096  0.1765509  1.06562278  0.8517368 -0.7801053  0.05249113
   3:   2  0.85173679 -0.7801053  0.05249113  0.8517368 -0.7801053  0.05249113
   4:   3 -0.51729096  0.1765509  1.06562278 -0.5080630 -2.0701757 -1.01573263
   5:   3  0.85173679 -0.7801053  0.05249113 -0.5080630 -2.0701757 -1.01573263
  ---
5046: 100  1.03802913 -2.7042756  2.05639758 -1.2389762  0.9008160  0.61121185
5047: 100 -1.29122593  0.9013410  0.77088748 -1.2389762  0.9008160  0.61121185
5048: 100  0.08262791  0.4135725  0.92694074 -1.2389762  0.9008160  0.61121185
5049: 100 -0.93397320  0.2719790 -0.26097185 -1.2389762  0.9008160  0.61121185
5050: 100 -1.23897617  0.9008160  0.61121185 -1.2389762  0.9008160  0.61121185
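(A bit confusing, but in t <= t, the LHS t refers to the LHS dat (the table being joined into) and the RHS t refers to the RHS dat (the one inside the brackets). In the result, the t column carries the RHS values, so each block of rows is labeled with its time point, while y, x1 and x2 come from the LHS rows with earlier or equal t.)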
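A three-row toy table makes the LHS/RHS roles visible. In this sketch, t_obs is a hypothetical helper column, added only for illustration, that copies each row's own time point so it survives the join next to the RHS t.

```r
library(data.table)

small <- data.table(t = 1:3, y = c(10, 20, 30))
small[, t_obs := t]  # hypothetical helper: each LHS row's own time point

joined <- small[small, on = .(t <= t), allow.cartesian = TRUE]

# t is the RHS time point (one block per T); t_obs and y come from LHS rows with t_obs <= T
print(joined[, .(t, t_obs, y)])
```

The result has one row for T = 1, two for T = 2 and three for T = 3, which is exactly the grouping the regression step relies on.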
From here we need only group by t and run the regression:
dat[dat, on = .(t <= t), allow.cartesian = TRUE
][ , as.list(coef(lm(y ~ x1 + x2))), keyby = t
# (only adding head here to limit output)
][ , head(.SD)]
# t (Intercept) x1 x2
# 1: 1 -0.5172910 NA NA
# 2: 2 -0.2646369 -1.43105510 NA
# 3: 3 9.1879448 9.96212179 -10.7580819
# 4: 4 -0.3504059 -0.36654096 0.4523271
# 5: 5 -0.1681879 -0.06670494 0.3553107
# 6: 6 1.2108223 1.04082291 -0.6947567
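The join approach still calls lm() once per time point and materializes n(n + 1)/2 rows, so memory grows quadratically. A different technique, not the answer's method, avoids that: keep running (cumulative) sums of the normal-equation terms X'X and X'y, then solve a small 3 x 3 system per time point. The sketch below assumes the question's three-coefficient model, starts at 11 like the question's loop (the first fits are degenerate: at t = 1 and 2 there are fewer rows than coefficients, and at t = 3 the fit is exact, hence the large coefficients above), and checks itself against lm(); the seed and variable names are illustrative.

```r
# Alternative technique: cumulative sums of normal-equation terms
set.seed(7)  # arbitrary seed, for reproducibility only
n <- 100
dat <- data.frame(t = 1:n, y = rnorm(n), x1 = rnorm(n), x2 = rnorm(n))

X <- cbind(1, dat$x1, dat$x2)  # n x 3 design matrix (intercept, x1, x2)
p <- ncol(X)

# Row i holds the p*p entries of the outer product x_i x_i', laid out
# column-major so matrix(..., p, p) restores the p x p shape
XtX_rows <- X[, rep(seq_len(p), each = p)] * X[, rep(seq_len(p), times = p)]
Xty_rows <- X * dat$y  # row i is x_i * y_i

# Column-wise cumulative sums: row i now holds X'X and X'y over rows 1..i
cXtX <- apply(XtX_rows, 2, cumsum)
cXty <- apply(Xty_rows, 2, cumsum)

coefs <- matrix(NA_real_, n, p,
                dimnames = list(NULL, c("(Intercept)", "x1", "x2")))
for (i in 11:n) {  # only a cheap 3 x 3 solve per time point
  coefs[i, ] <- solve(matrix(cXtX[i, ], p, p), cXty[i, ])
}

# Reference: one lm() per time point, as in the question's loop
reg <- t(sapply(11:n, function(i) coef(lm(y ~ x1 + x2, data = dat[1:i, ]))))
all.equal(unname(coefs[11:n, ]), unname(reg))
```

Solving the normal equations is less numerically stable than the QR decomposition lm() uses, so with strongly collinear predictors the join-plus-lm() route is the safer choice.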