Nesting parallel functions in R

Problem Description

I'm familiar with foreach, %dopar% and the like. I am also familiar with the parallel option for cv.glmnet. But how do you set up the nested parallelisation as below?

library(glmnet)
library(foreach)
library(parallel)
library(doSNOW)
Npar <- 1000                                      # number of predictors
Nobs <- 200                                       # number of observations
Xdat <- matrix(rnorm(Nobs * Npar), ncol = Npar)   # predictor matrix
Xclass <- rep(1:2, each = Nobs/2)                 # two groups of observations
Ydat <- rnorm(Nobs)                               # response vector

Parallel cross-validation:

cl <- makeCluster(8, type = "SOCK")
registerDoSNOW(cl)
system.time(mods <- foreach(x = 1:2, .packages = "glmnet") %dopar% {
    idx <- Xclass == x
    cv.glmnet(Xdat[idx,], Ydat[idx], nfolds = 4, parallel = TRUE)
})
stopCluster(cl)

Not parallel cross-validation:

cl <- makeCluster(8, type = "SOCK")
registerDoSNOW(cl)
system.time(mods <- foreach(x = 1:2, .packages = "glmnet") %dopar% {
    idx <- Xclass == x
    cv.glmnet(Xdat[idx,], Ydat[idx], nfolds = 4, parallel = FALSE)
})
stopCluster(cl)

For the two system times I am only getting a very marginal difference.

Is parallelisation taken care of? Or do I need to use the nested operator explicitly?

Side-question: If 8 cores are available in a cluster object and the foreach loop contains two tasks, will each task be given 1 core (and the other 6 cores left idle) or will each task be given four cores (using up all 8 cores in total)? What's the way to query how many cores are being used at a given time?

Solution

In your parallel cross-validation example, cv.glmnet itself will not run in parallel because there is no foreach parallel backend registered in the cluster workers. The outer foreach loop will run in parallel, but not the foreach loop in the cv.glmnet function.
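
One quick way to see this (a small sketch of my own, not part of the original answer) is to query the foreach backend from inside the outer loop; getDoParRegistered() and getDoParWorkers() are standard foreach functions:

cl <- makeCluster(2, type = "SOCK")
registerDoSNOW(cl)
foreach(x = 1:2, .packages = "foreach") %dopar% {
    # this runs on a snow worker where no backend has been registered,
    # so a nested %dopar% would fall back to sequential execution there
    c(registered = getDoParRegistered(), workers = getDoParWorkers())
}
stopCluster(cl)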

To use doSNOW for the outer and inner foreach loops, you could initialize the snow cluster workers using clusterCall:

cl <- makeCluster(2, type = "SOCK")
clusterCall(cl, function() {
  library(doSNOW)
  registerDoSNOW(makeCluster(2, type = "SOCK"))
  NULL
})
registerDoSNOW(cl)

This registers doSNOW for both the master and the workers so that each call to cv.glmnet will execute on a two-worker cluster when parallel=TRUE is specified.
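
Putting it together, a minimal end-to-end sketch (it assumes the data from the question; storing the inner cluster in a variable, here called innercl, is my addition so the inner clusters can be shut down cleanly afterwards):

cl <- makeCluster(2, type = "SOCK")
clusterCall(cl, function() {
  library(doSNOW)
  # keep a handle on the inner cluster so it can be stopped later
  assign("innercl", makeCluster(2, type = "SOCK"), envir = .GlobalEnv)
  registerDoSNOW(innercl)
  NULL
})
registerDoSNOW(cl)

system.time(mods <- foreach(x = 1:2, .packages = "glmnet") %dopar% {
    idx <- Xclass == x
    cv.glmnet(Xdat[idx,], Ydat[idx], nfolds = 4, parallel = TRUE)
})

# stop the inner clusters first, then the outer one
clusterEvalQ(cl, stopCluster(innercl))
stopCluster(cl)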

The trick with nested parallelism is to avoid creating too many processes and oversubscribing the CPU (or CPUs), so you need to be careful when registering the parallel backends. My example makes sense for a CPU with four cores even though a total of six workers are created, since the "outer" workers don't do much while the inner foreach loops execute. It is common when running on a cluster to use doSNOW to start one worker per node, and then use doMC to start one worker per core on each of those nodes.
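
For reference, that per-node/per-core pattern might look roughly like the sketch below; the hostnames and the core count are placeholders rather than anything from the answer, and doMC forks so it only works on Unix-like systems:

library(doSNOW)
cl <- makeCluster(c("node1", "node2"), type = "SOCK")  # one snow worker per node
clusterCall(cl, function() {
  library(doMC)
  registerDoMC(parallel::detectCores())  # one forked worker per core on that node
  NULL
})
registerDoSNOW(cl)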

Note that your example doesn't use much compute time, so it's not really worthwhile to use two levels of parallelism. I would use a much bigger problem in order to determine the benefits of the different approaches.
