Parallelize a rolling window regression in R

Question

I'm running a rolling regression very similar to the following code:

library(PerformanceAnalytics)
library(quantmod)
data(managers)

FL <- as.formula(Next(HAM1)~HAM1+HAM2+HAM3+HAM4)
MyRegression <- function(df,FL) {
  df <- as.data.frame(df)
  model <- lm(FL,data=df[1:30,])
  predict(model,newdata=df[31,])
}

system.time(Result <- rollapply(managers, 31, FUN="MyRegression",FL,
    by.column = FALSE, align = "right", na.pad = TRUE))

I've got some extra processors, so I'm trying to find a way to parallelize the rolling window. If this were a non-rolling regression, I could easily parallelize it using the apply family of functions...

Recommended answer

The obvious improvement is to use lm.fit() instead of lm(), so you don't incur the overhead of processing the formula and building the model frame on every window.
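
To make the difference between the two interfaces concrete, here is a minimal sketch on simulated data (the variables x1, x2 and y are made up for illustration and are not part of the managers data set). It only checks that lm.fit(), given a design matrix with an explicit intercept column, returns the same coefficients as lm():

```r
## Simulated data (hypothetical; not the managers data set)
set.seed(1)
n  <- 30
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n, sd = 0.1)

## Formula interface: parses the formula and builds the model frame on every call
fit_lm <- lm(y ~ x1 + x2)

## lm.fit(): takes a ready-made design matrix, including the intercept column
X <- cbind(Intercept = 1, x1 = x1, x2 = x2)
fit_raw <- lm.fit(X, y)

## Compare the coefficients, ignoring the differing names
coef_equal <- isTRUE(all.equal(unname(coef(fit_lm)),
                               unname(fit_raw$coefficients)))
print(coef_equal)
```

Note that lm.fit() does not add an intercept for you, which is why the column of ones has to be supplied by hand.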

Update: So when I said obvious, what I meant to say was blindingly obvious but deceptively difficult to implement!

After a bit of fiddling around, I came up with this:

library(PerformanceAnalytics)
library(quantmod)
data(managers)

The first stage is to realise that the model matrix can be prebuilt once for the whole series, so we do that and convert it back to a zoo object for use with rollapply():

mmat2 <- model.frame(Next(HAM1) ~ HAM1 + HAM2 + HAM3 + HAM4, data = managers, 
                     na.action = na.pass)
mmat2 <- cbind.data.frame(mmat2[,1], Intercept = 1, mmat2[,-1])
mmatZ <- as.zoo(mmat2)
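
The reason prebuilding works is that slicing rows out of one big matrix gives the same numbers as rebuilding the design matrix for each window. A small sketch of that idea, using a made-up data frame (df, with columns y, a and b) rather than the managers data, and ignoring the one-step lead that Next() applies to the response:

```r
## Hypothetical data, for illustration only
set.seed(2)
df <- data.frame(y = rnorm(40), a = rnorm(40), b = rnorm(40))

## Build response plus design matrix once for the whole series
full <- cbind(y = df$y, Intercept = 1, as.matrix(df[, c("a", "b")]))

## Per-window construction, roughly what the formula interface repeats each time
rows <- 5:35
per_window <- cbind(y = df$y[rows],
                    model.matrix(y ~ a + b, data = df[rows, ]))

## The row slice of the prebuilt matrix holds the same values
same_numbers <- isTRUE(all.equal(unname(full[rows, ]), unname(per_window)))
print(same_numbers)
```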

Now we need a function that will employ lm.fit() to do the heavy lifting without having to create design matrices at each iteration:

MyRegression2 <- function(Z) {
    ## store value we want to predict for
    pred <- Z[31, -1, drop = FALSE]
    ## get rid of any rows with NA in training data
    Z <- Z[1:30, ][!rowSums(is.na(Z[1:30,])) > 0, ]
    ## Next() would lag and leave NA in row 30 for response
    ## but we precomputed model matrix, so drop last row still in Z
    Z <- Z[-nrow(Z),]
    ## fit the model
    fit <- lm.fit(Z[, -1, drop = FALSE], Z[,1])
    ## get things we need to predict, in case pivoting turned on in lm.fit
    p <- fit$rank
    p1 <- seq_len(p)
    piv <- fit$qr$pivot[p1]
    ## model coefficients
    beta <- fit$coefficients
    ## this gives the predicted value for row 31 of data passed in
    drop(pred[, piv, drop = FALSE] %*% beta[piv])
}
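
The pivot handling at the end of the function matters when the design matrix is rank-deficient: lm.fit() then returns NA for the aliased coefficients, and only the pivoted columns should enter the prediction. The following sketch shows this on simulated data (x1, x2, x3 and x_new are hypothetical names chosen for the example) and compares the hand-computed prediction with predict() on an equivalent lm() fit:

```r
set.seed(3)
n  <- 30
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + x2            # exact linear combination, so X is rank-deficient
y  <- 1 + x1 - x2 + rnorm(n, sd = 0.1)
X  <- cbind(Intercept = 1, x1 = x1, x2 = x2, x3 = x3)

fit <- lm.fit(X, y)
piv <- fit$qr$pivot[seq_len(fit$rank)]   # columns that were actually estimated

## New observation, laid out like one row of X
x_new <- matrix(c(1, 0.5, -0.2, 0.3), nrow = 1)
pred_manual <- drop(x_new[, piv, drop = FALSE] %*% fit$coefficients[piv])

## Reference value from the formula interface (it warns about rank deficiency)
ref <- lm(y ~ x1 + x2 + x3)
pred_lm <- suppressWarnings(
  predict(ref, newdata = data.frame(x1 = 0.5, x2 = -0.2, x3 = 0.3)))

print(all.equal(pred_manual, unname(pred_lm)))
```

Without the subsetting by piv, the NA coefficient for x3 would propagate and the prediction would be NA.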

Timing comparison:

> system.time(Result <- rollapply(managers, 31, FUN="MyRegression",FL,
+                                 by.column = FALSE, align = "right", 
+                                 na.pad = TRUE))
   user  system elapsed 
  0.925   0.002   1.020 
> 
> system.time(Result2 <- rollapply(mmatZ, 31, FUN = MyRegression2,
+                                  by.column = FALSE,  align = "right",
+                                  na.pad = TRUE))
   user  system elapsed 
  0.048   0.000   0.05

This is a pretty reasonable improvement over the original. Now check that the resulting objects are the same:

> all.equal(Result, Result2)
[1] TRUE
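
The speed-up above comes from avoiding repeated formula processing, not from using more cores. If you still want to use the extra processors, the windows are independent of each other, so one possible approach is to loop over window start positions with mclapply() from the parallel package that ships with R. This is only a sketch under assumptions: it uses simulated data, a hypothetical helper one_window(), and a simplified setup in which the response is not lagged. For windows this small, the cost of forking may well outweigh any gain.

```r
library(parallel)   # ships with base R

set.seed(4)
n <- 120
X <- cbind(Intercept = 1, a = rnorm(n), b = rnorm(n))
y <- drop(X %*% c(0.5, 1, -1)) + rnorm(n, sd = 0.1)

width <- 31   # 30 training rows plus 1 row to predict, as in the answer

## Hypothetical helper: fit on the first 30 rows of a window, predict the 31st
one_window <- function(start) {
  train <- start:(start + width - 2)
  new   <- start + width - 1
  fit   <- lm.fit(X[train, , drop = FALSE], y[train])
  drop(X[new, , drop = FALSE] %*% fit$coefficients)
}

starts <- seq_len(n - width + 1)

## Forking is not available on Windows, so fall back to one core there
cores <- if (.Platform$OS.type == "windows") 1L else 2L

serial_pred   <- vapply(starts, one_window, numeric(1))
parallel_pred <- unlist(mclapply(starts, one_window, mc.cores = cores))

print(all.equal(serial_pred, parallel_pred))
```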

Enjoy!
