Parallelization in R: %dopar% vs %do%. Why does using a single core yield better performance?


Question

I'm seeing strange behaviour on my computer when distributing work across its cores using doMC and foreach. Does anyone know why I get better performance with a single core than with 2 cores? As you can see, running the same code without registering any cores (which supposedly uses only 1 core) is far more time-efficient. %do% seems to perform better than %dopar%, and registering 2 of the 4 cores makes the %dopar% run take much longer.

require(foreach)
require(doMC)
# 1-core
> system.time(m <- foreach(i=1:100) %dopar% 
+ matrix(rnorm(1000*1000), ncol=5000) )
   user  system elapsed 
  9.285   1.895  11.083 
> system.time(m <- foreach(i=1:100) %do% 
+ matrix(rnorm(1000*1000), ncol=5000) )
   user  system elapsed 
  9.139   1.879  10.979 

# 2-core
> registerDoMC(cores=2)
> system.time(m <- foreach(i=1:100) %dopar% 
+ matrix(rnorm(1000*1000), ncol=5000) )
   user  system elapsed 
  3.322   3.737 132.027
> system.time(m <- foreach(i=1:100) %do% 
+ matrix(rnorm(1000*1000), ncol=5000) )
   user  system elapsed 
  9.744   2.054  11.740 

Using 4 cores, a few trials yield very different outcomes:

> registerDoMC(cores=4)
> system.time(m <- foreach(i=1:100) %dopar% 
{ matrix(rnorm(1000*1000), ncol=5000) } )
   user  system elapsed 
 11.522   4.082  24.444 
> system.time(m <- foreach(i=1:100) %dopar% 
{ matrix(rnorm(1000*1000), ncol=5000) } )
   user  system elapsed 
 21.388   6.299  25.437 
> system.time(m <- foreach(i=1:100) %dopar% 
{ matrix(rnorm(1000*1000), ncol=5000) } )
   user  system elapsed 
 17.439   5.250   9.300 
> system.time(m <- foreach(i=1:100) %dopar% 
{ matrix(rnorm(1000*1000), ncol=5000) } )
   user  system elapsed 
 17.480   5.264   9.170

Solution

It's the combination of results that eats all the processing time. These are the timings on my machine for the cores=2 scenario if no results are returned. It's essentially the same code, only the created matrices are discarded instead of being returned:

> system.time(m <- foreach(i=1:100) %do% 
+ { matrix(rnorm(1000*1000), ncol=5000); NULL } )
   user  system elapsed 
 13.793   0.376  14.197 
> system.time(m <- foreach(i=1:100) %dopar% 
+ { matrix(rnorm(1000*1000), ncol=5000); NULL } )
   user  system elapsed 
  8.057   5.236   9.970 

Still not optimal, but at least the parallel version is now faster.
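
If the full matrices are not actually needed afterwards, one way to act on this is to reduce each result inside the worker and return only the small summary. The sketch below is an illustration, not the asker's code: it uses mclapply() from the parallel package as a stand-in for foreach/%dopar% with doMC, and the helper col_means_task and the choice of column means as the summary are assumptions made for the example.

```r
library(parallel)  # ships with R; mclapply() forks on Unix-alikes

# Hypothetical helper: build the matrix inside the worker and
# return only its column means instead of the matrix itself.
col_means_task <- function(i) {
  m <- matrix(rnorm(1000 * 1000), ncol = 5000)
  colMeans(m)  # 5000 doubles travel back to the parent, not 1e6
}

# Forking is not available on Windows, so fall back to one core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# 8 tasks keeps the sketch quick; the question used 100.
res <- mclapply(1:8, col_means_task, mc.cores = n_cores)

length(res)       # one list element per task
length(res[[1]])  # one mean per matrix column
```

Each task still does the same amount of computation, but the amount of data collected from the child processes is 200 times smaller, which is the part the timings above suggest is expensive.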

This is from the doMC documentation:

The doMC package provides a parallel backend for the foreach/%dopar% function using the multicore functionality of the parallel package.

Now, parallel uses a fork mechanism to spawn identical copies of the R process. Collecting results from separate processes is an expensive task, and this is what you see in your time measurements.
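
To get a rough feel for how much data has to be collected, one can measure the serialized size of a single result in base R. This is a back-of-the-envelope sketch: it assumes the forked workers hand their results back to the parent in serialized form, and the variable names are made up for the example.

```r
# One result of the loop body from the question: a 200 x 5000 numeric matrix.
m <- matrix(rnorm(1000 * 1000), ncol = 5000)

# Size in bytes of the serialized matrix (connection = NULL returns a raw vector).
bytes_per_result <- length(serialize(m, connection = NULL))
bytes_per_result  # slightly more than 8e6: 1e6 doubles at 8 bytes each, plus a small header

# Approximate volume for the 100 iterations in the question, in megabytes.
total_mb <- 100 * bytes_per_result / 1e6
total_mb
```

Under that assumption, roughly 800 MB has to cross process boundaries for 100 iterations, which is far more work than generating the random numbers in the first place.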
