Using R Parallel with other R packages


Question


I am working on a very time-intensive analysis using the LQMM package in R. I set the model to start running on Thursday; it is now Monday and it is still running. I am confident in the model itself (tested as a standard MLM), and I am confident in my LQMM code (I have run several other very similar LQMMs with the same dataset, and they all took over a day to run). But I'd really like to figure out how to make this run faster, if possible, using the parallel processing capabilities of the machines I have access to (note that all are Microsoft Windows based).


I have read through several tutorials on using parallel, but I have yet to find one that shows how to use the parallel package in concert with other R packages. Am I overthinking this, or is it not possible?


Here is the code that I am running using the R package LQMM:

install.packages("lqmm")
library(lqmm)
g1.lqmm <- lqmm(y ~ x + IEP + pm + sd + IEPZ + IEP*x + IEP*pm + IEP*sd +
                  IEP*IEPZ + x*pm + x*sd + x*IEPZ,
                random = ~ 1 + x + IEP + pm + sd + IEPZ,
                group = peers,
                tau = c(.1, .2, .3, .4, .5, .6, .7, .8, .9),
                na.action = na.omit,
                data = g1data)


The dataset has 122433 observations on 58 variables. All variables are z-scored or dummy coded.

Answer


The dependent libraries will need to be loaded on all of your nodes; the parallel package provides the function clusterEvalQ for this purpose. You might also need to export some of your data to the global environments of your subnodes: for this you can use the clusterExport function. Also view this page for more info on other relevant functions that might be useful to you.


In general, to speed up your application by using multiple cores you will have to split your problem into multiple subpieces that can be processed in parallel on different cores. To achieve this in R, you will first need to create a cluster and assign a particular number of cores to it. Next, you will have to register the cluster, export the required variables to the nodes, and then evaluate the necessary libraries on each of your subnodes. The exact way that you set up your cluster and launch the nodes will depend on the type of sublibraries and functions that you will use. As an example, your cluster setup might look like this when you choose to utilize the doParallel package (and most of the other parallelisation sublibraries/functions):

library(doParallel)
nrCores <- detectCores()
cl <- makeCluster(nrCores)
registerDoParallel(cl)
clusterExport(cl, c("g1data"), envir = environment())
clusterEvalQ(cl, library("lqmm"))


The cluster is now prepared. You can now assign subparts of the global task to each individual node in your cluster. In the general example below, each node in your cluster will process subpart i of the global task. In the example we will use the foreach %dopar% functionality provided by the doParallel package:


The doParallel package provides a parallel backend for the foreach/%dopar% function using the parallel package of R 2.14.0 and later.


Subresults will automatically be added to the resultList. Finally, when all subprocesses are finished we merge the results:

resultList <- foreach(i = 1:nrCores) %dopar%
{
  # process part i of your data
}
stopCluster(cl)
# merge data
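One natural way to split this particular problem is by quantile: lqmm accepts a vector of tau values, so each worker can fit a single quantile and the nine fits can run side by side. Below is a minimal sketch of that idea using only the base parallel package. The real lqmm call is shown in a comment but replaced here with a stand-in function, so the mechanics can run anywhere; g1data and the model formula are the ones from the question.

```r
library(parallel)

# The nine quantiles from the original lqmm call, one fit per element.
taus <- c(.1, .2, .3, .4, .5, .6, .7, .8, .9)

# One worker per quantile (capped at the number of physical cores).
cl <- makeCluster(min(length(taus), detectCores()))

# In the real analysis you would also prepare each node:
# clusterEvalQ(cl, library(lqmm))   # load lqmm on every node
# clusterExport(cl, "g1data")       # export the dataset to every node

fit_one_tau <- function(tau) {
  # In the real analysis this body would be the lqmm call with a single tau:
  # lqmm(y ~ x + IEP + pm + sd + IEPZ + ..., random = ~ 1 + x + IEP + pm + sd + IEPZ,
  #      group = peers, tau = tau, na.action = na.omit, data = g1data)
  tau  # stand-in: return the quantile this worker handled
}

# parLapply distributes the elements of taus across the cluster nodes.
resultList <- parLapply(cl, taus, fit_one_tau)
stopCluster(cl)
```

Each element of resultList then holds the fit for one quantile, which keeps the merge step trivial: the list is already ordered by tau.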


Since your question was not specifically on how to split up your data I will let you figure out the details of this part for yourself. However, you can find a more detailed example using the doParallel package in my answer to this post.
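For completeness, here is one common approach to that splitting step: chunking the rows of a data frame evenly across cores with base R. This is a hypothetical sketch only; df and nrCores below are made-up illustration values, not objects from the question.

```r
# Hypothetical sketch: divide a data frame into one chunk per core,
# so each chunk can later be handed to a different node.
nrCores <- 4
df <- data.frame(id = 1:122, x = rnorm(122))  # stand-in for a real dataset

# cut() assigns each row index to one of nrCores roughly equal intervals;
# split() then groups the rows by that assignment.
chunkIndex <- cut(seq_len(nrow(df)), breaks = nrCores, labels = FALSE)
chunks <- split(df, chunkIndex)

length(chunks)             # one chunk per core
sum(sapply(chunks, nrow))  # all rows accounted for
```

The resulting list of data frames can be passed straight to parLapply or iterated over with foreach.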

