Parallelization in R: %dopar% vs %do%. Why does using a single core yield better performance?
Problem description
I'm experiencing weird behaviour on my computer when distributing work across its cores using doMC and foreach. Does anyone know why I get better performance with a single core than with 2 cores? As you can see, running the same code without registering any cores (which supposedly uses only 1 core) is much more time-efficient. While %do% seems to perform better than %dopar%, registering 2 of the 4 cores makes the run far more time-consuming.
require(foreach)
require(doMC)

# 1-core
> system.time(m <- foreach(i=1:100) %dopar%
+   matrix(rnorm(1000*1000), ncol=5000))
   user  system elapsed
  9.285   1.895  11.083
> system.time(m <- foreach(i=1:100) %do%
+   matrix(rnorm(1000*1000), ncol=5000))
   user  system elapsed
  9.139   1.879  10.979

# 2-core
> registerDoMC(cores=2)
> system.time(m <- foreach(i=1:100) %dopar%
+   matrix(rnorm(1000*1000), ncol=5000))
   user  system elapsed
  3.322   3.737 132.027
> system.time(m <- foreach(i=1:100) %do%
+   matrix(rnorm(1000*1000), ncol=5000))
   user  system elapsed
  9.744   2.054  11.740
Using 4 cores, a few trials yield very different outcomes:
> registerDoMC(cores=4)
> system.time(m <- foreach(i=1:100) %dopar%
+   { matrix(rnorm(1000*1000), ncol=5000) })
   user  system elapsed
 11.522   4.082  24.444
> system.time(m <- foreach(i=1:100) %dopar%
+   { matrix(rnorm(1000*1000), ncol=5000) })
   user  system elapsed
 21.388   6.299  25.437
> system.time(m <- foreach(i=1:100) %dopar%
+   { matrix(rnorm(1000*1000), ncol=5000) })
   user  system elapsed
 17.439   5.250   9.300
> system.time(m <- foreach(i=1:100) %dopar%
+   { matrix(rnorm(1000*1000), ncol=5000) })
   user  system elapsed
 17.480   5.264   9.170
Solution

It's the combining of the results that eats all the processing time. These are the timings on my machine for the cores=2 scenario if no results are returned. It's essentially the same code, only the created matrices are discarded instead of being returned:

> system.time(m <- foreach(i=1:100) %do%
+   { matrix(rnorm(1000*1000), ncol=5000); NULL })
   user  system elapsed
 13.793   0.376  14.197
> system.time(m <- foreach(i=1:100) %dopar%
+   { matrix(rnorm(1000*1000), ncol=5000); NULL })
   user  system elapsed
  8.057   5.236   9.970
Still not optimal, but at least the parallel version is now faster.
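To get a feel for how much data the parent process has to collect in the original version, consider the size of a single iteration's result. A quick sketch in plain base R (the ~7.6 MB figure assumes the usual 8 bytes per double):

```r
# Each iteration of the benchmark returns a matrix of 1000*1000 doubles.
# At 8 bytes per double that is roughly 7.6 MB per iteration, so 100
# iterations ship on the order of 760 MB from the workers to the parent.
m <- matrix(rnorm(1000 * 1000), ncol = 5000)
print(object.size(m), units = "MB")
```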
This is from the documentation of doMC:

    The doMC package provides a parallel backend for the foreach/%dopar% function using the multicore functionality of the parallel package.

Now, parallel uses a fork mechanism to spawn identical copies of the R process. Collecting results from separate processes is an expensive task, and this is what you see in your time measurements.
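One way to keep the parallel speedup is therefore to shrink what each worker sends back, for example by returning a small summary instead of the full matrix. Here is a minimal sketch using parallel::mclapply, the fork-based machinery that doMC builds on; the choice of colMeans as the summary is just an illustration, and since forking is unavailable on Windows the sketch falls back to one core there:

```r
library(parallel)

# Fork-based workers, as used by doMC; forking is unavailable on
# Windows, so fall back to a single core there.
n.cores <- if (.Platform$OS.type == "unix") 2L else 1L

# Each worker returns only 1000 column means instead of the full
# 1000x1000 matrix, so very little data crosses the process boundary.
res <- mclapply(1:10, function(i) {
  colMeans(matrix(rnorm(1000 * 1000), ncol = 1000))
}, mc.cores = n.cores)

length(res)       # 10 results collected
length(res[[1]])  # 1000 numbers each, instead of 1,000,000
```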