Problems using foreach parallelization


Problem description


I'm trying to compare parallelization options. Specifically, I'm comparing the standard SNOW and multicore implementations to those using doSNOW or doMC with foreach. As a sample problem, I'm illustrating the central limit theorem by repeatedly computing the mean of a sample drawn from a standard normal distribution. Here's the standard code:

CltSim <- function(nSims=1000, size=100, mu=0, sigma=1){
  sapply(1:nSims, function(x){
    mean(rnorm(n=size, mean=mu, sd=sigma))
  })
}

Here's the SNOW implementation:

library(snow)
cl <- makeCluster(2)

ParCltSim <- function(cluster, nSims=1000, size=100, mu=0, sigma=1){
  parSapply(cluster, 1:nSims, function(x){
    mean(rnorm(n=size, mean=mu, sd=sigma))
  })
}

Next, the doSNOW method:

library(foreach)
library(doSNOW)
registerDoSNOW(cl)

FECltSim <- function(nSims=1000, size=100, mu=0, sigma=1) {
  x <- numeric(nSims)
  foreach(i=1:nSims, .combine=cbind) %dopar% {
    x[i] <- mean(rnorm(n=size, mean=mu, sd=sigma))
  }
}

I get the following results:

> system.time(CltSim(nSims=10000, size=100))
   user  system elapsed 
  0.476   0.008   0.484 
> system.time(ParCltSim(cluster=cl, nSims=10000, size=100))
   user  system elapsed 
  0.028   0.004   0.375 
> system.time(FECltSim(nSims=10000, size=100))
   user  system elapsed 
  8.865   0.408  11.309 

The SNOW implementation shaves off about 23% of computing time relative to an unparallelized run (time savings get bigger as the number of simulations increases, as we would expect). The foreach attempt actually increases run time by a factor of more than 20. Additionally, if I change %dopar% to %do% and check the unparallelized version of the loop, it takes over 7 seconds.

Additionally, we can consider the multicore package. The simulation written for multicore is

library(multicore)
MCCltSim <- function(nSims=1000, size=100, mu=0, sigma=1){
  unlist(mclapply(1:nSims, function(x){
    mean(rnorm(n=size, mean=mu, sd=sigma))
  }))
}

We get an even better speed improvement than SNOW:

> system.time(MCCltSim(nSims=10000, size=100))
   user  system elapsed 
  0.924   0.032   0.307 

Starting a new R session, we can attempt the foreach implementation using doMC instead of doSNOW, calling

library(doMC)
registerDoMC()

then running FECltSim() as above, still finding

> system.time(FECltSim(nSims=10000, size=100))
   user  system elapsed 
  6.800   0.024   6.887 

This is "only" a 14-fold increase over the non-parallelized runtime.
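
As a quick check on the percentages quoted above, the elapsed times reported by system.time() can be compared directly. This is only a small sketch: the numbers are copied from the outputs in this post, and the names elapsed and ratio are made up for illustration.

```r
# Elapsed times (seconds) copied from the system.time() outputs above
elapsed <- c(serial = 0.484, snow = 0.375, multicore = 0.307,
             foreach_doSNOW = 11.309, foreach_doMC = 6.887)

# Ratio of each elapsed time to the serial run
# (below 1 means faster than serial, above 1 means slower)
ratio <- elapsed / elapsed[["serial"]]
print(round(ratio, 2))
```

A ratio of roughly 0.77 for snow corresponds to the 23% saving mentioned earlier, while both foreach runs come out well above 1.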

Conclusion: My foreach code is not running efficiently under either doSNOW or doMC. Any idea why?

Thanks, Charlie

Solution

To start with, you could write your foreach code a bit more concisely:

FECltSim <- function(nSims=1000, size=100, mu=0, sigma=1) {
  foreach(i=1:nSims, .combine=c) %dopar% {
    mean(rnorm(n=size, mean=mu, sd=sigma))
  }
}

This already gives you a vector, so there is no need to build one explicitly inside the loop. There is also no need to use cbind, as each iteration returns just a single number, so .combine=c will do.

The thing with foreach is that it creates quite a lot of overhead to communicate between the cores and to fit the results from the different cores together. A quick look at the profile shows this pretty clearly:

$by.self
                         self.time self.pct total.time total.pct
$                             5.46    41.30       5.46     41.30
$<-                           0.76     5.75       0.76      5.75
.Call                         0.76     5.75       0.76      5.75
...

More than 40% of the time is spent in $, that is, selecting elements out of objects. foreach also calls a lot of other functions for the whole operation. In practice, foreach is only advisable if you have relatively few rounds through very time-consuming functions.
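
For anyone who wants to reproduce a profile like the one above, here is a minimal sketch using R's built-in sampling profiler (Rprof and summaryRprof). It assumes the sequential backend registered by registerDoSEQ(), so it runs without a cluster; the backend behind the profile shown above is not stated, and the name profFile is made up for illustration. The exact numbers will differ from machine to machine.

```r
library(foreach)

# Assumption: sequential backend, so the sketch runs without a cluster.
# Register doSNOW or doMC instead to profile the master side of a parallel run.
registerDoSEQ()

FECltSim <- function(nSims = 1000, size = 100, mu = 0, sigma = 1) {
  foreach(i = 1:nSims, .combine = c) %dopar% {
    mean(rnorm(n = size, mean = mu, sd = sigma))
  }
}

profFile <- tempfile()   # hypothetical name for the profiler output file
Rprof(profFile)          # start the sampling profiler
res <- FECltSim(nSims = 5000, size = 100)
Rprof(NULL)              # stop profiling

prof <- summaryRprof(profFile)
head(prof$by.self)       # functions ranked by time spent in their own code
```

The by.self table has the same columns as the excerpt above (self.time, self.pct, total.time, total.pct).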

The other two solutions are built on a different technology and do far less work in R itself. As a side note, snow was originally developed to work on clusters rather than on single workstations, which is what multicore is for.
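
Following the point that foreach pays off with few, expensive iterations, one possible workaround is to hand each worker a whole chunk of simulations instead of a single one. The sketch below is an illustration rather than a tuned implementation: it assumes the doSNOW backend from the question with a two-worker cluster, and the function name FECltSimChunked and its nChunks argument are made up for this example.

```r
library(foreach)
library(doSNOW)

cl <- makeCluster(2)   # assumption: two local workers, as in the question
registerDoSNOW(cl)

# Hypothetical chunked variant: one foreach task per chunk,
# each task running many simulations
FECltSimChunked <- function(nSims = 1000, size = 100, mu = 0, sigma = 1,
                            nChunks = getDoParWorkers()) {
  # Spread nSims as evenly as possible over the chunks,
  # e.g. 10 simulations over 3 chunks gives sizes 4, 3, 3
  chunkSizes <- tabulate(rep_len(seq_len(nChunks), nSims))
  foreach(k = chunkSizes, .combine = c) %dopar% {
    # k simulations per task, so foreach only combines nChunks results
    replicate(k, mean(rnorm(n = size, mean = mu, sd = sigma)))
  }
}

res <- FECltSimChunked(nSims = 10000, size = 100)
length(res)

stopCluster(cl)
```

With this layout foreach combines only as many results as there are chunks, so its per-iteration bookkeeping is paid a handful of times rather than 10000 times. Whether that beats parSapply or mclapply on a given machine would need to be timed.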
