Parallelizing the imputation process in the missForest package

Question

I am using a package called missForest to estimate the missing values in my data set. My question is: how can this process be parallelized to shorten the time it takes to get the results? Please refer to this example (from the missForest package):

library(missForest)  # provides prodNA() and missForest()
data(iris)
summary(iris)

The data contains four continuous variables and one categorical variable. Artificially produce missing values using the prodNA function:

set.seed(81)
iris.mis <- prodNA(iris, noNA = 0.2)
summary(iris.mis)

Impute the missing values, supplying the complete matrix (xtrue) for illustration. Use 'verbose' to see what happens between iterations:

iris.imp <- missForest(iris.mis, xtrue = iris, verbose = TRUE)

Recommended answer

Yesterday I submitted version 1.4 of missForest to CRAN; the Windows and Linux packages are ready, and the Mac version will follow soon.

The new missForest function has an additional argument, "parallelize", which lets you either compute each single forest in parallel (parallelize = "forests") or compute several forests on multiple variables at the same time (parallelize = "variables"). The default setting is no parallel computing (parallelize = "no").

Do not forget to register a suitable parallel backend, e.g. using the package "doParallel", before trying it for the first time. The "doParallel" vignette gives an illustrative example in Section 4.
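As a minimal sketch of the steps described above, the following registers a doParallel backend and then runs missForest on the iris example from the question with both parallel modes. The worker count of 2 is an assumption chosen for illustration, not a recommendation; adjust it to your machine.

library(missForest)
library(doParallel)

# Register a parallel backend. Two workers is an assumed value;
# it must be greater than 1 for missForest to run in parallel.
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# Same setup as in the question: iris with 20% of the values removed.
data(iris)
set.seed(81)
iris.mis <- prodNA(iris, noNA = 0.2)

# Build forests for several variables at the same time.
iris.imp <- missForest(iris.mis, xtrue = iris, parallelize = "variables")

# Alternatively, grow each single forest in parallel.
iris.imp.forests <- missForest(iris.mis, xtrue = iris, parallelize = "forests")

# Release the workers when finished.
parallel::stopCluster(cl)

The imputed data frame is returned in iris.imp$ximp. Which mode is faster depends on the data: with many variables, "variables" tends to be the natural choice, while "forests" may help when single forests are expensive to grow. Wrapping each call in system.time() is a simple way to compare them on your own data.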

Due to some other details, I had to temporarily remove the "missForest" vignette from the package. I will resolve this in due course and release it as version 1.4-1.
