Is there a _fast_ way to run a rolling regression inside data.table?


Question

I am running rolling regressions in R, with the data stored in a data.table.

I have a working version, but it feels like a hack: I am really just using what I know from the zoo package, and none of the magic in data.table. As a result, it feels slower than it ought to be.

Following Joshua's suggestion (below), using lm.fit rather than lm gives a speedup of about 12x.

(Revised) example code:

require(zoo)
require(data.table)
require(rbenchmark)
set.seed(1)

tt <- seq(as.Date("2011-01-01"), as.Date("2012-01-01"), by="day")
px <- rnorm(366, 95, 1)

DT <- data.table(period=tt, pvec=px)

dtt <- DT[,tnum:=as.numeric(period)][, list(pvec, tnum)]
dtx <- as.matrix(DT[,tnum:=as.numeric(period)][, tnum2:= tnum^2][, int:=1][, list(pvec, int, tnum, tnum2)])

rollreg <- function(dd) coef(lm(pvec ~ tnum + I(tnum^2), data=as.data.frame(dd)))
rollreg.fit <- function(dd) coef(lm.fit(y=dd[,1], x=dd[,-1]))

rr <- function(dd) rollapplyr(dd, width=20, FUN = rollreg, by.column=FALSE)
rr.fit <- function(dd) rollapplyr(dd, width=20, FUN = rollreg.fit, by.column=FALSE)

bmk <- benchmark(rr(dtt), rr.fit(dtx), 
         columns = c('test', 'elapsed', 'relative'),
         replications = 10,
         order = 'elapsed'
       )

         test elapsed relative
2 rr.fit(dtx)    0.48   1.0000
1     rr(dtt)    5.85  12.1875
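
As a sanity check on the lm versus lm.fit swap, here is a minimal sketch on a single 20-observation window showing that both calls should return the same coefficients. The centred time index and the variable names `b_lm`, `b_fit`, and `X` are assumptions made for this illustration and are not part of the benchmark above.

```r
set.seed(1)
n <- 20

# Assumption: a centred time index, which keeps the quadratic design well conditioned
tnum <- seq_len(n) - mean(seq_len(n))
pvec <- rnorm(n, 95, 1)

# Formula interface: builds a model frame and a design matrix on every call
b_lm <- coef(lm(pvec ~ tnum + I(tnum^2)))

# lm.fit: takes a ready-made design matrix, skipping the formula machinery
X <- cbind(int = 1, tnum = tnum, tnum2 = tnum^2)
b_fit <- coef(lm.fit(x = X, y = pvec))

# Compare the two coefficient vectors, ignoring their names
all.equal(unname(b_lm), unname(b_fit))
```

The saving comes from building the design matrix once, outside the rolling loop, instead of re-parsing the formula for every window.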

Applying ideas from two related answers, I cooked up the following simple rolling regression function, which I think uses some of the speed of data.table operations.

Note that the problem is a little different (and more realistic): take a vector, add lags, and regress it on its own lags. This class of AR-type problems is pretty broad.

I am sharing it here as it may be of use, and I'm sure it can be improved (I'll update it as I improve it):

require(data.table)
set.seed(1)
x  <- rnorm(1000)
DT <- data.table(x)
DTin <- data.table(x)

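lagDT <- function(DTin, varname, l=5)
{
    # Add columns <varname>_L1 ... <varname>_Ll (lags 1..l of varname) by reference
    i <- 0
    while (i < l) {
        # Build e.g. x_L1 := c(rep(NA, 1), x[-(length(x):length(x))]) as text, then evaluate it in j
        expr <- parse(text =
                  paste0(varname, '_L', (i+1),
                     ':= c(rep(NA, (1+i)),', varname, '[-((length(', varname, ') - i):length(', varname, '))])'
                  )
                )
        DTin[, eval(expr)]
        i <- i + 1
    }
    return(DTin)
}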

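rollRegDT <- function(DTin, varname, k=20, l=5)
{
    adj <- k + l - 1                    # each window spans k + l rows
    .x <- 1:(nrow(DTin)-adj)            # window start positions
    DTin[, int:=1]                      # intercept column, added by reference
    # Drop the first l rows (their lags are NA), then regress column 1 on the rest
    dtReg <- function(dd) coef(lm.fit(y=dd[-c(1:l),1], x=dd[-c(1:l),-1]))
    eleNum <- nrow(DTin)*(l+1)
    outMatx <- matrix(rep(NA, eleNum), ncol = (l+1))
    # Name the columns from l rather than hard-coding five lags
    colnames(outMatx) <- c('intercept', paste0('L', 1:l))
    for (i in .x){
        dt_m <- as.matrix(lagDT(DTin[i:(i+adj), ], varname, l))
        outMatx[(i+(adj)),] <- dtReg(dt_m)   # store at the window's last row
    }
    return(outMatx)
}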

rollCoef <- rollRegDT(DT, varname='x')
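
One possible variation, sketched here under stated assumptions rather than offered as a tested improvement: build the lag matrix once with base R's `embed()` instead of re-creating the lag columns inside every window. The names `E`, `X`, `n_win`, and `coefs` are made up for this illustration, and a shorter series is used to keep it quick. Row `i` of `coefs` corresponds to row `i + k + l - 1` of the matrix returned by `rollRegDT`.

```r
set.seed(1)
x <- rnorm(200)
k <- 20   # observations per regression
l <- 5    # number of lags

# embed() returns rows (x[t], x[t-1], ..., x[t-l]) for t = l+1, ..., length(x)
E <- embed(x, l + 1)
y <- E[, 1]
X <- cbind(intercept = 1, E[, -1])
colnames(X)[-1] <- paste0("L", 1:l)

# One regression per window of k consecutive rows of the lag matrix
n_win <- nrow(E) - k + 1
coefs <- matrix(NA_real_, nrow = n_win, ncol = l + 1,
                dimnames = list(NULL, colnames(X)))
for (i in seq_len(n_win)) {
  rows <- i:(i + k - 1)
  coefs[i, ] <- coef(lm.fit(x = X[rows, , drop = FALSE], y = y[rows]))
}

head(coefs, 3)
```

Whether this is actually faster than the version above would need to be benchmarked; the design choice is only to move the lag construction out of the loop.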

Recommended answer

Not as far as I know; data.table doesn't have any special features for rolling windows. Other packages already implement rolling functionality on vectors, so they can be used in the j of data.table. If they are not efficient enough, and no package has faster versions (?), then it's a case of writing faster versions yourself and (of course) contributing them, either to an existing package or by creating your own.
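
To illustrate what "used in the j of data.table" can look like, here is a minimal sketch that calls zoo's `rollmeanr` on a column inside `j`, grouped with `by`. The table, the column names `id`, `x`, and `roll_mean`, and the window width of 5 are assumptions made for this example only.

```r
library(data.table)
library(zoo)

set.seed(1)
DT <- data.table(id = rep(c("a", "b"), each = 50), x = rnorm(100))

# Right-aligned rolling mean over 5 observations, computed inside j
# separately for each id; the first 4 rows of each group are NA
DT[, roll_mean := rollmeanr(x, k = 5, fill = NA), by = id]

head(DT, 6)
```

The rolling function itself still comes from zoo; data.table contributes the grouping and the assignment by reference.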

Related questions (follow links in links):

Using data.table to speed up rollapply
R data.table sliding window
Rolling regression over multiple columns in R
