Problems using foreach parallelization

Question

I'm trying to compare parallelization options. Specifically, I'm comparing the standard SNOW and multicore implementations to those using doSNOW or doMC with foreach. As a sample problem, I'm illustrating the central limit theorem by computing the means of samples drawn from a standard normal distribution many times. Here's the standard code:

CltSim <- function(nSims=1000, size=100, mu=0, sigma=1){
  sapply(1:nSims, function(x){
    mean(rnorm(n=size, mean=mu, sd=sigma))
  })
}

Here is the SNOW implementation:

library(snow)
cl <- makeCluster(2)

ParCltSim <- function(cluster, nSims=1000, size=100, mu=0, sigma=1){
  parSapply(cluster, 1:nSims, function(x){
    mean(rnorm(n=size, mean=mu, sd=sigma))
  })
}

Followed by the doSNOW approach:

library(foreach)
library(doSNOW)
registerDoSNOW(cl)

FECltSim <- function(nSims=1000, size=100, mu=0, sigma=1) {
  x <- numeric(nSims)
  foreach(i=1:nSims, .combine=cbind) %dopar% {
    x[i] <- mean(rnorm(n=size, mean=mu, sd=sigma))
  }
}

I get the following results:

> system.time(CltSim(nSims=10000, size=100))
   user  system elapsed 
  0.476   0.008   0.484 
> system.time(ParCltSim(cluster=cl, nSims=10000, size=100))
   user  system elapsed 
  0.028   0.004   0.375 
> system.time(FECltSim(nSims=10000, size=100))
   user  system elapsed 
  8.865   0.408  11.309 

The SNOW implementation shaves off about 23% of computing time relative to an unparallelized run (time savings get bigger as the number of simulations increases, as we would expect). The foreach attempt actually increases run time by a factor of 20. Additionally, if I change %dopar% to %do% and check the unparallelized version of the loop, it takes over 7 seconds.

Additionally, we can consider the multicore package. The simulation written for multicore is:

library(multicore)
MCCltSim <- function(nSims=1000, size=100, mu=0, sigma=1){
  unlist(mclapply(1:nSims, function(x){
    mean(rnorm(n=size, mean=mu, sd=sigma))
  }))
}

We get an even better speed improvement than with SNOW:

> system.time(MCCltSim(nSims=10000, size=100))
   user  system elapsed 
  0.924   0.032   0.307 

Starting a new R session, we can attempt the foreach implementation using doMC instead of doSNOW, calling

library(doMC)
registerDoMC()

then running FECltSim() as above, still finding

> system.time(FECltSim(nSims=10000, size=100))
   user  system elapsed 
  6.800   0.024   6.887 

This is "only" a 14-fold increase over the non-parallelized runtime.

Conclusion: My foreach code is not running efficiently under either doSNOW or doMC. Any idea why?

Thanks, Charlie

Recommended answer

To start with, you could write your foreach code a bit more concisely:

FECltSim <- function(nSims=1000, size=100, mu=0, sigma=1) {
  foreach(i=1:nSims, .combine=c) %dopar% {
    mean(rnorm(n=size, mean=mu, sd=sigma))
  }
}

This gives you a vector, with no need to build it explicitly inside the loop. There is also no need to use cbind, as each result is just a single number, so .combine=c will do.
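To make the difference between the two combine functions concrete, here is a small sketch that runs the same loop sequentially with %do%, so no parallel backend is needed. The names asMatrix and asVector are illustrative only and do not appear in the original code.

```r
library(foreach)

# Same loop body, two different combine functions (run sequentially with %do%).
asMatrix <- foreach(i = 1:5, .combine = cbind) %do% {
  mean(rnorm(n = 10))
}
asVector <- foreach(i = 1:5, .combine = c) %do% {
  mean(rnorm(n = 10))
}

# cbind binds the scalars into columns of a one-row matrix; c returns a plain vector.
dim(asMatrix)
dim(asVector)
```

The cbind version yields a matrix with one row and one column per iteration, while the c version yields an ordinary numeric vector, which is the shape sapply() returns in the original CltSim().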

The thing with foreach is that it creates quite a lot of overhead to communicate between the cores and to fit the results from the different cores together. A quick look at the profile shows this pretty clearly:

$by.self
                         self.time self.pct total.time total.pct
$                             5.46    41.30       5.46     41.30
$<-                           0.76     5.75       0.76      5.75
.Call                         0.76     5.75       0.76      5.75
...
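A profile of this kind can be collected with base R's Rprof() and summaryRprof(). The sketch below is an assumption about how such a profile might be produced, not the exact commands used for the output above; it profiles a sequential %do% loop, and the timings and function ranking will differ by machine and foreach version.

```r
library(foreach)

# Write profiling samples to a temporary file.
profFile <- tempfile(fileext = ".out")

Rprof(profFile)                      # start the sampling profiler
res <- foreach(i = 1:10000, .combine = c) %do% {
  mean(rnorm(n = 100))
}
Rprof(NULL)                          # stop profiling

# by.self ranks functions by the time spent in the function itself.
prof <- summaryRprof(profFile)
head(prof$by.self)
```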

More than 40% of the time is spent in $, that is, selecting things. foreach also calls a lot of other functions over the course of the whole operation. In practice, foreach is only advisable if you have relatively few iterations of a very time-consuming function.
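One way to get "few rounds through a time-consuming function" for this problem is to hand each foreach task a whole chunk of simulations instead of a single one. The following is a minimal sketch under that assumption; ChunkedFECltSim and its nChunks argument are hypothetical names, not part of the original code. It registers the sequential backend so it runs anywhere; with registerDoSNOW(cl) or registerDoMC() the same function would distribute the chunks across workers.

```r
library(foreach)

# Sequential backend so the sketch runs without a cluster;
# swap in registerDoSNOW(cl) or registerDoMC() for real parallel runs.
registerDoSEQ()

# Hypothetical chunked variant: one foreach task per chunk of simulations.
ChunkedFECltSim <- function(nSims = 1000, size = 100, mu = 0, sigma = 1,
                            nChunks = 2) {
  # Split nSims as evenly as possible; the chunk sizes always sum to nSims.
  chunkSizes <- diff(round(seq(0, nSims, length.out = nChunks + 1)))
  foreach(n = chunkSizes, .combine = c) %dopar% {
    # Each task computes a whole vector of sample means in ordinary R code.
    vapply(seq_len(n),
           function(j) mean(rnorm(n = size, mean = mu, sd = sigma)),
           numeric(1))
  }
}

res <- ChunkedFECltSim(nSims = 1001, size = 10, nChunks = 4)
length(res)
```

With this layout the foreach machinery is invoked nChunks times rather than nSims times, so its per-task overhead is paid only a handful of times. Setting nChunks to the number of registered workers, which getDoParWorkers() reports, is a reasonable starting point.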

The other two solutions are built on a different technology and do far less work in R itself. As a side note, snow was originally developed to work on clusters rather than on single workstations, which is what multicore is for.
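In current R versions, both approaches are available through the parallel package that ships with R: it provides snow-style cluster functions such as makeCluster() and parSapply(), and the fork-based mclapply() that came from multicore. The sketch below reworks the SNOW version on top of parallel; ParCltSim2 is a hypothetical name, and the worker count of 2 is an arbitrary choice.

```r
library(parallel)

# Snow-style version using the parallel package; arguments are passed
# to the workers explicitly through parSapply's ... argument.
ParCltSim2 <- function(cluster, nSims = 1000, size = 100, mu = 0, sigma = 1) {
  parSapply(cluster, seq_len(nSims), function(x, size, mu, sigma) {
    mean(rnorm(n = size, mean = mu, sd = sigma))
  }, size = size, mu = mu, sigma = sigma)
}

cl <- makeCluster(2)                 # two local worker processes
res <- ParCltSim2(cl, nSims = 200, size = 50)
stopCluster(cl)                      # always release the workers

length(res)
```

The fork-based mclapply() from the same package takes the place of the multicore call in the question; forking is not available on Windows, where mclapply() can only run with a single core.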
