Has anyone tried to parallelize multiple imputation in 'mice' package?


Problem description


I'm aware that the Amelia R package provides some support for parallel multiple imputation (MI). However, preliminary analysis of my study's data revealed that the data are not multivariate normal, so, unfortunately, I can't use Amelia. Consequently, I've switched to the mice R package for MI, as it can perform MI on data that are not multivariate normal.


Since the MI process via mice is very slow (currently I'm using an AWS m3.large 2-core instance), I've started wondering whether it's possible to parallelize the procedure to save processing time. Based on my review of the mice documentation, the corresponding JSS paper, and mice's source code, it appears that the package doesn't currently support parallel operations. This is a pity, because IMHO the MICE algorithm is naturally parallel, so a parallel implementation should be relatively easy and would yield significant savings in both time and resources.


Question: Has anyone tried to parallelize MI in the mice package, either externally (via R's parallel facilities) or internally (by modifying the source code), and what were the results, if any? Thank you!

Recommended answer


Recently, I've tried to parallelize multiple imputation (MI) via the mice package externally, that is, by using R's multiprocessing facilities, in particular the parallel package, which ships with the base R distribution. Basically, the solution is to use the mclapply() function to distribute a pre-calculated share of the total number of needed MI imputations to each core and then combine the resulting imputed data into a single object. Performance-wise, the results of this approach exceeded my most optimistic expectations: the processing time decreased from 1.5 hours to under 7 minutes(!), and that's on only two cores. I removed one multilevel factor, but it shouldn't have much effect. Regardless, the result is unbelievable!
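
Below is a minimal sketch of that approach; the data frame my_data, the core count, and the number of imputations are placeholders. It splits the imputations across workers with mclapply() and merges the per-worker mids objects with mice's ibind():

    # Split the total number of imputations across cores, run mice() in each
    # worker, then combine the resulting 'mids' objects into a single one.
    library(mice)
    library(parallel)

    n_cores    <- 2                          # e.g. an m3.large instance has 2 cores
    m_total    <- 10                         # total number of imputations wanted
    m_per_core <- ceiling(m_total / n_cores)

    imp_list <- mclapply(seq_len(n_cores), function(i) {
      # give each worker its own seed so the imputation streams differ
      mice(my_data, m = m_per_core, seed = i, printFlag = FALSE)
    }, mc.cores = n_cores)

    # combine the per-worker 'mids' objects into one
    imp <- Reduce(ibind, imp_list)

    # downstream analysis then proceeds as usual, e.g.:
    # fit <- with(imp, lm(y ~ x1 + x2))
    # pool(fit)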
