Parallel computation of multiple imputation by using the mice R package


Question

I want to run 150 multiple imputations by using mice in R. However, in order to save some computing time, I would like to subdivide the process into parallel streams (as suggested by Stef van Buuren in "Flexible Imputation of Missing Data").

My question is: how do I do this?

I can imagine two options:

Option 1:

imp1<-mice(data, m=1, pred=quicktry, maxit=15, seed=1)
imp2<-mice(data, m=1, pred=quicktry, maxit=15, seed=1)
imp...<-mice(data, m=1, pred=quicktry, maxit=15, seed=1)
imp150<-mice(data, m=1, pred=quicktry, maxit=15, seed=1)

and then combine the imputations together by using complete and as.mids afterwards.

Option 2:

imp1<-mice(data, m=1, pred=quicktry, maxit=15, seed=VAL_1to150)
imp2<-mice(data, m=1, pred=quicktry, maxit=15, seed=VAL_1to150)
imp...<-mice(data, m=1, pred=quicktry, maxit=15, seed=VAL_1to150)
imp150<-mice(data, m=1, pred=quicktry, maxit=15, seed=VAL_1to150)

by adding VAL_1to150, since otherwise it seems to me (I may be wrong) that if they all run with the same dataset and the same seed you will get 150 times the same result.
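For example, something along these lines (a rough sketch, reusing data and the predictor matrix quicktry from above), where each run gets its own seed:

library(mice)
# Give each of the 150 single-imputation runs its own seed so that
# the streams do not reproduce each other
imps <- lapply(seq_len(150), function(i) {
  mice(data, m = 1, pred = quicktry, maxit = 15, seed = i, printFlag = FALSE)
})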

Are there any other options?

Thanks

Answer

So the main problem is combining the imputations, and as I see it there are three options: using ibind, using complete and as.mids as described in the question, or trying to keep the mids structure. I strongly suggest the ibind solution; the others are left in the answer for the curious.

Before doing anything we need to get the parallel mice imputations. The parallel part is rather simple: all we need to do is use the parallel package and make sure that we set the seed using clusterSetRNGStream:

library(parallel)
# Using all cores can slow down the computer
# significantly, I therefore try to leave one
# core alone in order to be able to do something 
# else during the time the code runs
cores_2_use <- detectCores() - 1

cl <- makeCluster(cores_2_use)
clusterSetRNGStream(cl, 9956)
clusterExport(cl, "nhanes")
clusterEvalQ(cl, library(mice))
imp_pars <- 
  parLapply(cl = cl, X = 1:cores_2_use, fun = function(no){
    mice(nhanes, m = 30, printFlag = FALSE)
  })
stopCluster(cl)

The above will yield cores_2_use * 30 imputed datasets.
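As a quick sanity check (a sketch, assuming the cluster code above has run), the result is a plain list holding one mids object per worker:

length(imp_pars)                   # one mids object per core used
sapply(imp_pars, function(x) x$m)  # each element should report m = 30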

As @AleksanderBlekh suggested, mice::ibind is probably the best and most straightforward solution:

imp_merged <- imp_pars[[1]]
for (n in 2:length(imp_pars)){
  imp_merged <- 
    ibind(imp_merged,
          imp_pars[[n]])
}
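If you prefer, the same merge can be written more compactly by folding ibind over the list with Reduce (equivalent to the loop above):

imp_merged <- Reduce(ibind, imp_pars)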

Using foreach and ibind

Perhaps the simplest alternative is to use foreach:

library(foreach)
library(doParallel)
cl <- makeCluster(cores_2_use)
clusterSetRNGStream(cl, 9956)
registerDoParallel(cl)

library(mice)
imp_merged <-
  foreach(no = 1:cores_2_use, 
          .combine = ibind, 
          .export = "nhanes",
          .packages = "mice") %dopar%
{
  mice(nhanes, m = 30, printFlag = FALSE)
}
stopCluster(cl)
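The merged object can then be used like any other mids object; a small usage sketch (mirroring the model used further down) would be:

imp_merged$m                                   # total number of imputations
fit <- with(imp_merged, lm(bmi ~ age + hyp + chl))
summary(pool(fit))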

Using complete

Extracting the full datasets using complete(..., action="long"), rbind-ing these, and then converting back with as.mids may work well, but it generates a slimmer object than what the other two approaches produce:

merged_df <- nhanes
merged_df <- 
  cbind(data.frame(.imp = 0,
                   .id = 1:nrow(nhanes)),
        merged_df)
for (n in 1:length(imp_pars)){
  tmp <- complete(imp_pars[[n]], action = "long")
  tmp$.imp <- as.numeric(tmp$.imp) + max(merged_df$.imp)
  merged_df <- 
    rbind(merged_df,
          tmp)
}

imp_merged <- 
  as.mids(merged_df)

# Compare est and se (the most important values) for easier comparison
cbind(summary(pool(with(data=imp_merged,
                        exp=lm(bmi~age+hyp+chl))))[,c("est", "se")],
      summary(pool(with(data=mice(nhanes, 
                                  m = 60, 
                                  printFlag = FALSE),
                        exp=lm(bmi~age+hyp+chl))))[,c("est", "se")])

Gives the output:

                    est         se         est         se
(Intercept) 20.41921496 3.85943925 20.33952967 3.79002725
age         -3.56928102 1.35801557 -3.65568620 1.27603817
hyp          1.63952970 2.05618895  1.60216683 2.17650536
chl          0.05396451 0.02278867  0.05525561 0.02087995

Keeping a correct mids-object

My alternative approach below shows how to merge imputation objects and retain the full functionality behind the mids object. Since the ibind solution above is simpler, I've left this in for anyone interested in exploring how to merge complex lists.

I've looked into mice's mids-object, and there are a few steps you have to take in order to get at least a similar mids-object after running in parallel. If we examine the mids-object and compare two objects with two different setups, we get:

library(mice)
imp <- list()
imp <- c(imp,
         list(mice(nhanes, m = 40)))
imp <- c(imp,
         list(mice(nhanes, m = 20)))

sapply(names(imp[[1]]),
       function(n)
         try(all(useful::compare.list(imp[[1]][[n]], 
                                      imp[[2]][[n]]))))

Here you can see that the call, m, imp, chainMean, and chainVar differ between the two runs. Of these, imp is without doubt the most important, but it seems like a wise option to update the other components as well. We will therefore start by building a mice merger function:

mergeMice <- function (imp) {
  merged_imp <- NULL
  for (n in 1:length(imp)){
    if (is.null(merged_imp)){
      merged_imp <- imp[[n]]
    }else{
      counter <- merged_imp$m
      # Update counter
      merged_imp$m <- 
        merged_imp$m + imp[[n]]$m
      # Rename chains
      dimnames(imp[[n]]$chainMean)[[3]] <-
        sprintf("Chain %d", (counter + 1):merged_imp$m)
      dimnames(imp[[n]]$chainVar)[[3]] <-
        sprintf("Chain %d", (counter + 1):merged_imp$m)
      # Merge chains
      merged_imp$chainMean <- 
        abind::abind(merged_imp$chainMean, 
                     imp[[n]]$chainMean)
      merged_imp$chainVar <- 
        abind::abind(merged_imp$chainVar, 
                     imp[[n]]$chainVar)
      for (nn in names(merged_imp$imp)){
        # Non-imputed variables are not in the
        # data.frame format but are null
        if (!is.null(imp[[n]]$imp[[nn]])){
          colnames(imp[[n]]$imp[[nn]]) <- 
            (counter + 1):merged_imp$m
          merged_imp$imp[[nn]] <- 
            cbind(merged_imp$imp[[nn]],
                  imp[[n]]$imp[[nn]])
        }
      }
    }
  }
  # TODO: The function should update the $call parameter
  return(merged_imp)
}

We can now simply merge the two imputation sets generated above through:

merged_imp <- mergeMice(imp)
merged_imp_pars <- mergeMice(imp_pars)

Now it seems that we get the right output:

# Compare the three alternatives
cbind(
  summary(pool(with(data=merged_imp,
                    exp=lm(bmi~age+hyp+chl))))[,c("est", "se")],
 summary(pool(with(data=merged_imp_pars,
                    exp=lm(bmi~age+hyp+chl))))[,c("est", "se")],
 summary(pool(with(data=mice(nhanes, 
                             m = merged_imp$m, 
                             printFlag = FALSE),
                   exp=lm(bmi~age+hyp+chl))))[,c("est", "se")])

Giving:

                    est         se         est        se
(Intercept) 20.16057550 3.74819873 20.31814393 3.7346252
age         -3.67906629 1.19873118 -3.64395716 1.1476377
hyp          1.72637216 2.01171565  1.71063127 1.9936347
chl          0.05590999 0.02350609  0.05476829 0.0213819
                    est         se
(Intercept) 20.14271905 3.60702992
age         -3.78345532 1.21550474
hyp          1.77361005 2.11415290
chl          0.05648672 0.02046868

OK, that's it. Have fun.
