Problems using foreach parallelization
Question
I'm trying to compare parallelization options. Specifically, I'm comparing the standard SNOW and multicore implementations to those using doSNOW or doMC with foreach. As a sample problem, I'm illustrating the central limit theorem by computing the means of samples drawn from a standard normal distribution many times. Here's the standard code:
CltSim <- function(nSims=1000, size=100, mu=0, sigma=1){
  sapply(1:nSims, function(x){
    mean(rnorm(n=size, mean=mu, sd=sigma))
  })
}
Here's the SNOW implementation:
library(snow)
cl <- makeCluster(2)
ParCltSim <- function(cluster, nSims=1000, size=100, mu=0, sigma=1){
  parSapply(cluster, 1:nSims, function(x){
    mean(rnorm(n=size, mean=mu, sd=sigma))
  })
}
Next, the doSNOW approach:
library(foreach)
library(doSNOW)
registerDoSNOW(cl)
FECltSim <- function(nSims=1000, size=100, mu=0, sigma=1) {
  x <- numeric(nSims)
  foreach(i=1:nSims, .combine=cbind) %dopar% {
    x[i] <- mean(rnorm(n=size, mean=mu, sd=sigma))
  }
}
I get the following results:
> system.time(CltSim(nSims=10000, size=100))
user system elapsed
0.476 0.008 0.484
> system.time(ParCltSim(cluster=cl, nSims=10000, size=100))
user system elapsed
0.028 0.004 0.375
> system.time(FECltSim(nSims=10000, size=100))
user system elapsed
8.865 0.408 11.309
The SNOW implementation shaves off about 23% of computing time relative to an unparallelized run (time savings get bigger as the number of simulations increases, as we would expect). The foreach attempt actually increases run time by a factor of 20. Additionally, if I change %dopar% to %do% and check the unparallelized version of the loop, it takes over 7 seconds.
Additionally, we can consider the multicore package. The simulation written for multicore is
library(multicore)
MCCltSim <- function(nSims=1000, size=100, mu=0, sigma=1){
  unlist(mclapply(1:nSims, function(x){
    mean(rnorm(n=size, mean=mu, sd=sigma))
  }))
}
We get an even better speed improvement than SNOW:
> system.time(MCCltSim(nSims=10000, size=100))
user system elapsed
0.924 0.032 0.307
Starting a new R session, we can attempt the foreach implementation using doMC instead of doSNOW, calling
library(doMC)
registerDoMC()
then running FECltSim() as above, still finding
> system.time(FECltSim(nSims=10000, size=100))
user system elapsed
6.800 0.024 6.887
This is "only" a 14-fold increase over the non-parallelized runtime.
Conclusion: My foreach code is not running efficiently under either doSNOW or doMC. Any idea why?
Thanks, Charlie
Answer
To start with, you could write your foreach code a bit more concisely:
FECltSim <- function(nSims=1000, size=100, mu=0, sigma=1) {
  foreach(i=1:nSims, .combine=c) %dopar% {
    mean(rnorm(n=size, mean=mu, sd=sigma))
  }
}
This gives you a vector directly; there is no need to build one explicitly within the loop. There is also no need for cbind, since each result is just a single number, so .combine=c will do.
The thing with foreach is that it creates quite a lot of overhead to communicate between the cores and to combine the results from the different cores. A quick look at the profile shows this pretty clearly:
$by.self
self.time self.pct total.time total.pct
$ 5.46 41.30 5.46 41.30
$<- 0.76 5.75 0.76 5.75
.Call 0.76 5.75 0.76 5.75
...
More than 40% of the time it is busy selecting things, and it uses many other functions for the whole operation. In practice, foreach is only advisable when you have relatively few iterations, each running a very time-consuming function.
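One common workaround for that overhead (a sketch, not part of the original answer; ChunkedFECltSim and nChunks are hypothetical names) is to hand each foreach iteration a large chunk of simulations instead of a single one, so the communication cost is paid once per chunk rather than once per simulation:

```r
library(foreach)
library(doMC)   # assumes the multicore-style backend used above is available
registerDoMC(2)

# Hypothetical chunked variant: each foreach iteration runs a whole
# block of simulations with sapply, so the per-iteration overhead is
# amortized over chunkSize simulations instead of paid for each one.
# (If nSims is not divisible by nChunks, slightly more than nSims
# simulations are run.)
ChunkedFECltSim <- function(nSims=1000, nChunks=2, size=100, mu=0, sigma=1){
  chunkSize <- ceiling(nSims / nChunks)
  foreach(chunk=1:nChunks, .combine=c) %dopar% {
    sapply(1:chunkSize, function(x){
      mean(rnorm(n=size, mean=mu, sd=sigma))
    })
  }
}
```

With only nChunks iterations instead of nSims, the foreach bookkeeping happens a handful of times rather than ten thousand.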
The other two solutions are built on different technology and do far less work in R. As a side note, snow was actually developed initially to work on clusters rather than on single workstations, the way multicore was.