Why is foreach() %do% sometimes slower than for?

Question

I'm playing around with parallelization in R for the first time. As a first toy example, I tried

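library(doMC)     # parallel backend for foreach; attaching doMC also attaches foreach
registerDoMC()    # register the backend so that %dopar% runs in parallel

B <- 10000

# plain for loop: computes sqrt(i) B times; the loop itself returns NULL
myFunc <- function()
{
    for(i in 1:B) sqrt(i)
}

# foreach run sequentially: returns a list of B results
myFunc2 <- function()
{
    foreach(i = 1:B) %do% sqrt(i)
}

# foreach run on the registered parallel backend
myParFunc <- function()
{
    foreach(i = 1:B) %dopar% sqrt(i)
}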

I know that sqrt() executes too fast for parallelization to matter, but what I didn't expect was that foreach() %do% would be slower than for():

> system.time(myFunc())
   user  system elapsed 
  0.004   0.000   0.005 
> system.time(myFunc2())
   user  system elapsed 
  6.756   0.000   6.759 
> system.time(myParFunc())
   user  system elapsed 
  6.140   0.524   6.096 

In most examples that I've seen, foreach() %dopar% is compared to foreach() %do% rather than to for(). Since foreach() %do% was much slower than for() in my toy example, I'm now a bit confused. I had assumed that these were equivalent ways of writing a for-loop. What is the difference? Are they ever equivalent? Is foreach() %do% always slower?

UPDATE: Following @Peter Fines' answer, I updated myFunc as follows:

 a<-rep(NA,B)
 myFunc<-function()
 {
     for(i in 1:B) a[i]<-sqrt(i)
 }

This makes for() a bit slower, but not much:

> system.time(myFunc())
   user  system elapsed 
  0.036   0.000   0.035 
> system.time(myFunc2())
   user  system elapsed 
  6.380   0.000   6.385 
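The updated myFunc has a catch. Because a is created outside the function, the assignment a[i] <- sqrt(i) inside myFunc creates and fills a local copy of a. That copy is discarded when the function returns, so the global a stays all NA.

The base-R sketch below demonstrates this. It also shows a version that preallocates its result inside the function and returns it, plus the vectorized form sqrt(seq_len(B)). The names forWithResult and vectorized are illustrative and are not from the original post.

```r
B <- 10000

# The updated myFunc from the question: a lives in the global environment.
a <- rep(NA, B)
myFunc <- function() {
  for (i in 1:B) a[i] <- sqrt(i)   # this creates a local copy of a inside myFunc
}
myFunc()
print(all(is.na(a)))               # TRUE: the global a was never modified

# Hypothetical fairer version: preallocate inside the function and return it.
forWithResult <- function(B) {
  out <- numeric(B)                # preallocated double vector
  for (i in seq_len(B)) out[i] <- sqrt(i)
  out
}

# sqrt() is already vectorized, so no explicit loop is needed at all.
vectorized <- function(B) sqrt(seq_len(B))

res_for <- forWithResult(B)
res_vec <- vectorized(B)
stopifnot(isTRUE(all.equal(res_for, res_vec)))

# To compare timings on your own machine:
# system.time(forWithResult(B)); system.time(vectorized(B))
```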

Answer

for will run sqrt B times, presumably discarding the answer each time. foreach, however, returns a list containing the result of each execution of the loop body. This would contribute considerable extra overhead, regardless of whether it's running in parallel or sequential mode (%dopar% or %do%).

I based my answer on running the following code. It appears to be confirmed by the foreach vignette, which states: "foreach differs from a for loop in that its return is a list of values, whereas a for loop has no value and uses side effects to convey its result."

> print(for(i in 1:10) sqrt(i))
NULL

> print(foreach(i = 1:10) %do% sqrt(i))
[[1]]
[1] 1

[[2]]
[1] 1.414214

[[3]]
... etc
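If the list return value were the whole explanation, a for loop that also builds a list should be about as slow as foreach() %do%. The base-R sketch below lets you check that. The name forAsList is made up for illustration. lapply() is shown alongside because it returns the same shape of result as foreach() with its default settings: an unnamed list with one element per iteration.

```r
B <- 10000

# A for loop that, like foreach(), returns a list with one element per iteration.
forAsList <- function(B) {
  out <- vector("list", B)             # preallocated list, one slot per iteration
  for (i in seq_len(B)) out[[i]] <- sqrt(i)
  out
}

res_for   <- forAsList(B)
res_apply <- lapply(seq_len(B), sqrt)  # base-R equivalent that also returns a list

stopifnot(identical(res_for, res_apply))
str(res_for[1:3])

# Time these against foreach(i = 1:B) %do% sqrt(i) on your own machine:
# system.time(forAsList(B)); system.time(lapply(seq_len(B), sqrt))
```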

UPDATE: I see from your updated question that the above answer isn't nearly sufficient to account for the performance difference. So I looked at the source code for foreach and can see that there is a LOT going on! I haven't tried to understand exactly how it works. However, do.R and foreach.R show that even when %do% is run, large parts of the foreach configuration are still run. That would make sense if the %do% option is largely provided to let you test foreach code without having to configure and load a parallel backend. It also needs to support the more advanced nesting and iteration facilities that foreach provides.
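The sketch below is a deliberately simplified, hypothetical evaluator that shows why per-iteration bookkeeping dominates when the loop body is as cheap as sqrt(i). It is not foreach's implementation, and the name toyDo is made up.

It only mimics the general kind of work involved:

- capturing the loop body unevaluated,
- creating a fresh environment for each iteration,
- evaluating the body there with error trapping,
- collecting the results in a list.

```r
# Hypothetical toy evaluator, NOT the foreach implementation.
# For simplicity it always binds the loop variable as `i`.
toyDo <- function(values, expr, envir = parent.frame()) {
  expr <- substitute(expr)                 # capture the loop body unevaluated
  results <- vector("list", length(values))
  for (k in seq_along(values)) {
    env <- new.env(parent = envir)         # fresh environment per iteration
    assign("i", values[[k]], envir = env)  # bind the loop variable in it
    results[k] <- list(tryCatch(           # trap errors iteration by iteration;
      eval(expr, envir = env),             # list() keeps NULL results as elements
      error = function(e) e
    ))
  }
  results
}

res <- toyDo(1:5, sqrt(i))
str(res)

# Errors are captured as condition objects instead of aborting the loop.
err <- toyDo(1:2, stop("boom"))
print(inherits(err[[1]], "error"))         # TRUE

# Compare on your own machine:
# system.time(toyDo(1:10000, sqrt(i)))
# system.time(for (i in 1:10000) sqrt(i))
```

Each of these steps is cheap on its own. With 10,000 iterations, though, they are repeated 10,000 times around a body that does almost nothing.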

There are references in the code to results caching, error checking, debugging and the creation of local environment variables for the arguments of each iteration (see the function doSEQ in do.R for example). I'd imagine this is what creates the difference that you've observed. Of course, if you were running much more complicated code inside your loop (that would actually benefit from a parallelisation framework like foreach), this overhead would become irrelevant compared with the benefits it provides.
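When the loop body really is this small, there are two usual remedies:

- Vectorize, as in sqrt(1:B).
- Give each foreach iteration a larger block of work, so the fixed per-iteration cost is paid far fewer times.

The sketch below shows the chunking approach and assumes the foreach package is installed. The chunk count of 4 is an arbitrary illustrative choice.

```r
library(foreach)

B <- 10000
n_chunks <- 4   # illustrative choice, e.g. one chunk per worker

# Split 1:B into n_chunks contiguous blocks of indices.
chunks <- split(seq_len(B), cut(seq_len(B), n_chunks, labels = FALSE))

# Each iteration now handles a whole block with a vectorized sqrt(),
# so foreach's per-iteration overhead is paid n_chunks times instead of B times.
res <- foreach(idx = chunks, .combine = c) %do% sqrt(idx)

stopifnot(isTRUE(all.equal(res, sqrt(seq_len(B)))))
```

With a parallel backend registered (for example registerDoMC(), as in the question), you can replace %do% with %dopar% to distribute the chunks across workers.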
