Parallel computation of multiple imputation by using mice R package

Views: 91 Published: 2020/5/24 20:54:25 Tags: r parallel-processing multiple-mice r-mice


Problem description

I want to run 150 multiple imputations by using mice in R. However, in order to save some computing time, I would like to subdivide the process into parallel streams (as suggested by Stef van Buuren in "Flexible Imputation of Missing Data").

My question is: how to do that?

I can imagine 2 options:

opt.1:

imp1<-mice(data, m=1, pred=quicktry, maxit=15, seed=1)
imp2<-mice(data, m=1, pred=quicktry, maxit=15, seed=1)
imp...<-mice(data, m=1, pred=quicktry, maxit=15, seed=1)
imp150<-mice(data, m=1, pred=quicktry, maxit=15, seed=1)

and then combine the imputations together by using complete and as.mids afterwards

opt.2:

imp1<-mice(data, m=1, pred=quicktry, maxit=15, seed=VAL_1to150)
imp2<-mice(data, m=1, pred=quicktry, maxit=15, seed=VAL_1to150)
imp...<-mice(data, m=1, pred=quicktry, maxit=15, seed=VAL_1to150)
imp150<-mice(data, m=1, pred=quicktry, maxit=15, seed=VAL_1to150)

by adding VAL_1to150; otherwise it seems to me (I may be wrong) that if they all run with the same dataset and the same seed, you will get the same result 150 times.

Are there any other alternatives?

Thanks
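The concern about identical seeds can be checked directly. The following is a minimal sketch, assuming the nhanes example data that ships with mice and a shortened maxit (both chosen here only for illustration); it compares the imputed values of two runs with the same seed and of two runs with different seeds:

```r
library(mice)  # provides mice() and the nhanes example data

# Same data, same seed: the two runs draw identical imputations
run_a <- mice(nhanes, m = 1, maxit = 5, seed = 1, printFlag = FALSE)
run_b <- mice(nhanes, m = 1, maxit = 5, seed = 1, printFlag = FALSE)
same_seed_identical <- identical(run_a$imp, run_b$imp)

# Same data, different seed: the imputations differ
run_c <- mice(nhanes, m = 1, maxit = 5, seed = 2, printFlag = FALSE)
diff_seed_identical <- identical(run_a$imp, run_c$imp)

print(c(same_seed = same_seed_identical,
        different_seed = diff_seed_identical))
```

If the first comparison is TRUE and the second FALSE, opt.1 would indeed produce 150 copies of one imputation, so each stream needs its own seed or its own random number stream.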

Recommended answer

Using `foreach` with `ibind`

Perhaps the simplest alternative is to use foreach:

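library(foreach)
library(doParallel)
# cores_2_use is not defined in this excerpt; leaving one core free
# is an assumption made here so that the snippet runs
cores_2_use <- max(1, parallel::detectCores() - 1)
cl <- makeCluster(cores_2_use)
# Give every worker its own reproducible random number stream
clusterSetRNGStream(cl, 9956)
registerDoParallel(cl)

library(mice)
# Each worker runs 30 imputations; ibind merges the resulting mids objects
imp_merged <-
  foreach(no = 1:cores_2_use, 
          .combine = ibind, 
          .export = "nhanes",
          .packages = "mice") %dopar%
{
  mice(nhanes, m = 30, printFlag = FALSE)
}
stopCluster(cl)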
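The snippets in the rest of this answer refer to a list called imp_pars that is not defined anywhere in the text. One plausible way to build such a list is sketched below, assuming it is meant to hold one mids object per parallel worker. The worker count of 2 and m = 5 are illustrative choices, not values taken from the answer.

```r
library(parallel)  # base R: makeCluster(), parLapply(), clusterSetRNGStream()
library(mice)

# Assumption: two workers, to keep the sketch portable; adjust as needed
n_workers <- 2
cl <- makeCluster(n_workers)
clusterSetRNGStream(cl, 9956)        # independent RNG stream per worker
clusterEvalQ(cl, library(mice))      # load mice (and nhanes) on every worker

# One mice() run per worker; the result is a plain list of mids objects
imp_pars <- parLapply(cl, seq_len(n_workers), function(no) {
  mice(nhanes, m = 5, printFlag = FALSE)
})
stopCluster(cl)

length(imp_pars)  # one mids object per worker
```

Because no seed is passed to mice() here, each run draws from the stream that clusterSetRNGStream assigned to its worker, which is what keeps the parallel imputations distinct.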

Using `complete`

Extracting the full datasets using complete(..., action="long"), rbind-ing these and then using as.mids to rebuild a mids object may work well, but it generates a slimmer object than the other two approaches:

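# imp_pars is assumed to be a list of mids objects, one per parallel run
# Start from the original data, stored as imputation 0
merged_df <- nhanes
merged_df <- 
  cbind(data.frame(.imp = 0,
                   .id = 1:nrow(nhanes)),
        merged_df)
for (n in seq_along(imp_pars)){
  tmp <- complete(imp_pars[[n]], action = "long")
  # Shift the imputation numbers so they continue after those already merged
  tmp$.imp <- as.numeric(tmp$.imp) + max(merged_df$.imp)
  merged_df <- 
    rbind(merged_df,
          tmp)
}

# Convert the stacked long-format data back into a mids object
imp_merged <- 
  as.mids(merged_df)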

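# Compare est and se, the two most important columns, side by side
# (merged parallel runs on the left, a single run with m = 60 on the right).
# Note: newer mice releases may name these columns "estimate" and "std.error".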
cbind(summary(pool(with(data=imp_merged,
                        exp=lm(bmi~age+hyp+chl))))[,c("est", "se")],
      summary(pool(with(data=mice(nhanes, 
                                  m = 60, 
                                  printFlag = FALSE),
                        exp=lm(bmi~age+hyp+chl))))[,c("est", "se")])

Gives the output:

                    est         se         est         se
(Intercept) 20.41921496 3.85943925 20.33952967 3.79002725
age         -3.56928102 1.35801557 -3.65568620 1.27603817
hyp          1.63952970 2.05618895  1.60216683 2.17650536
chl          0.05396451 0.02278867  0.05525561 0.02087995

Keeping a correct mids-object

My alternative approach below shows how to merge imputation objects and retain the full functionality behind the mids object. Now that the ibind solution is available, I've left this in for anyone interested in exploring how to merge complex lists.

I've looked into mice's mids-object and there are a few steps that you have to take in order to get at least a similar mids-object after running in parallel. If we examine the mids-object and compare two objects with two different setups we get:

library(mice)
# compare.list() comes from the useful package, which must be installed
imp <- list()
imp <- c(imp,
         list(mice(nhanes, m = 40)))
imp <- c(imp,
         list(mice(nhanes, m = 20)))

sapply(names(imp[[1]]),
       function(n)
         try(all(useful::compare.list(imp[[1]][[n]], 
                                      imp[[2]][[n]]))))
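The same kind of comparison can be sketched with base R alone. This is a hedged alternative, assuming that a strict identical() check per component is acceptable in place of compare.list; the object names and the small m values are chosen only for illustration.

```r
library(mice)

# Two runs on the same data that differ only in the number of imputations
imp_a <- mice(nhanes, m = 4, seed = 1, printFlag = FALSE)
imp_b <- mice(nhanes, m = 2, seed = 1, printFlag = FALSE)

# Compare every component the two mids objects have in common
shared <- intersect(names(imp_a), names(imp_b))
differs <- !vapply(shared,
                   function(n) identical(imp_a[[n]], imp_b[[n]]),
                   logical(1))

# Names of the components that are not identical between the runs
names(differs)[differs]
```

identical() is stricter than an element-wise comparison, so it may flag a few bookkeeping components in addition to the ones that matter for merging.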

Here you can see that call, m, imp, chainMean, and chainVar differ between the two runs. Of these, imp is without doubt the most important, but it seems wise to update the other components as well. We will therefore start by building a mice merger function:

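# Merge a list of mids objects into one mids object.
# Requires the abind package for joining the chain arrays.
mergeMice <- function (imp) {
  merged_imp <- NULL
  for (n in seq_along(imp)){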
    if (is.null(merged_imp)){
      merged_imp <- imp[[n]]
    }else{
      counter <- merged_imp$m
      # Update counter
      merged_imp$m <- 
        merged_imp$m + imp[[n]]$m
      # Rename chains
      dimnames(imp[[n]]$chainMean)[[3]] <-
        sprintf("Chain %d", (counter + 1):merged_imp$m)
      dimnames(imp[[n]]$chainVar)[[3]] <-
        sprintf("Chain %d", (counter + 1):merged_imp$m)
      # Merge chains
      merged_imp$chainMean <- 
        abind::abind(merged_imp$chainMean, 
                     imp[[n]]$chainMean)
      merged_imp$chainVar <- 
        abind::abind(merged_imp$chainVar, 
                     imp[[n]]$chainVar)
      for (nn in names(merged_imp$imp)){
        # Non-imputed variables are not in the
        # data.frame format but are null
        if (!is.null(imp[[n]]$imp[[nn]])){
          colnames(imp[[n]]$imp[[nn]]) <- 
            (counter + 1):merged_imp$m
          merged_imp$imp[[nn]] <- 
            cbind(merged_imp$imp[[nn]],
                  imp[[n]]$imp[[nn]])
        }
      }
    }
  }
  # TODO: The function should update the $call parameter
  return(merged_imp)
}

We can now simply merge the two imputations generated above through:

merged_imp <- mergeMice(imp)
merged_imp_pars <- mergeMice(imp_pars)
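For comparison with the hand-written merger, a list of mids objects can also be folded with mice's own ibind, which takes two mids objects at a time. The sketch below assumes all runs share the same data and default settings; imp_list and merged_ibind are illustrative names, not objects from the answer.

```r
library(mice)

# Three small, independently seeded runs on the same data
imp_list <- lapply(1:3, function(s) {
  mice(nhanes, m = 2, seed = s, printFlag = FALSE)
})

# ibind() joins two mids objects; Reduce() folds it over the whole list
merged_ibind <- Reduce(ibind, imp_list)

merged_ibind$m  # total number of imputations across the list
```

The merged object can then be passed to with() and pool() in the same way as the result of mergeMice.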

Now it seems that we get the right output:

# Compare the three alternatives
cbind(
  summary(pool(with(data=merged_imp,
                    exp=lm(bmi~age+hyp+chl))))[,c("est", "se")],
 summary(pool(with(data=merged_imp_pars,
                    exp=lm(bmi~age+hyp+chl))))[,c("est", "se")],
 summary(pool(with(data=mice(nhanes, 
                             m = merged_imp$m, 
                             printFlag = FALSE),
                   exp=lm(bmi~age+hyp+chl))))[,c("est", "se")])

Gives:

                    est         se         est        se
(Intercept) 20.16057550 3.74819873 20.31814393 3.7346252
age         -3.67906629 1.19873118 -3.64395716 1.1476377
hyp          1.72637216 2.01171565  1.71063127 1.9936347
chl          0.05590999 0.02350609  0.05476829 0.0213819
                    est         se
(Intercept) 20.14271905 3.60702992
age         -3.78345532 1.21550474
hyp          1.77361005 2.11415290
chl          0.05648672 0.02046868

Ok, that's it. Have fun.
