Why is foreach() %do% sometimes slower than for?

Question

I'm playing around with parallelization in R for the first time. As a first toy example, I tried

library(doMC)
registerDoMC()

B<-10000

myFunc<-function()
{
    for(i in 1:B) sqrt(i)
}

myFunc2<-function()
{
    foreach(i = 1:B)  %do% sqrt(i)
}

myParFunc<-function()
{
    foreach(i = 1:B) %dopar% sqrt(i)
}

I know that sqrt() executes too fast for parallelization to matter, but what I didn't expect was that foreach() %do% would be slower than for():

> system.time(myFunc())
   user  system elapsed 
  0.004   0.000   0.005 
> system.time(myFunc2())
   user  system elapsed 
  6.756   0.000   6.759 
> system.time(myParFunc())
   user  system elapsed 
  6.140   0.524   6.096 

In most examples that I've seen, foreach() %dopar% is compared to foreach() %do% rather than for(). Since foreach() %do% was much slower than for() in my toy example, I'm now a bit confused. Somehow, I thought that these were equivalent ways of constructing for-loops. What is the difference? Are they ever equivalent? Is foreach() %do% always slower?

UPDATE: Following @Peter Fines' answer, I updated myFunc as follows:

 a<-rep(NA,B)
 myFunc<-function()
 {
     for(i in 1:B) a[i]<-sqrt(i)
 }

This makes for() a bit slower, but not much:

> system.time(myFunc())
   user  system elapsed 
  0.036   0.000   0.035 
> system.time(myFunc2())
   user  system elapsed 
  6.380   0.000   6.385 

Recommended answer

for will run sqrt B times, presumably discarding the answer each time. foreach, however, returns a list containing the result of each execution of the loop body. This would contribute considerable extra overhead, regardless of whether it's running in parallel or sequential mode (%dopar% or %do%).

I based my answer on running the following code. The result appears to be confirmed by the foreach vignette, which states: "foreach differs from a for loop in that its return is a list of values, whereas a for loop has no value and uses side effects to convey its result."

> print(for(i in 1:10) sqrt(i))
NULL

> print(foreach(i = 1:10) %do% sqrt(i))
[[1]]
[1] 1

[[2]]
[1] 1.414214

[[3]]
... etc
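
As a small sketch of the same difference (assuming the foreach package is installed; the variable names here are illustrative and not from the original post), the value of each construct can be captured and inspected directly. The last line uses foreach's `.combine` argument to concatenate the per-iteration results into a plain vector instead of a list:

```r
library(foreach)

# A for loop evaluates to NULL; results have to be stored via side effects.
for_value <- for (i in 1:3) sqrt(i)

# foreach() %do% returns a list with one element per iteration by default.
list_value <- foreach(i = 1:3) %do% sqrt(i)

# With .combine = c the per-iteration results are concatenated into a vector.
vec_value <- foreach(i = 1:3, .combine = c) %do% sqrt(i)

print(is.null(for_value))
print(is.list(list_value))
print(vec_value)
```

Collecting the results is only part of the cost, which is why the update below looks further into what foreach does on each iteration.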

UPDATE: I see from your updated question that the above answer isn't nearly sufficient to account for the performance difference. So I looked at the source code for foreach and can see that there is a LOT going on! I haven't tried to understand exactly how it works, but do.R and foreach.R show that even when %do% is used, large parts of the foreach configuration are still run. That would make sense if the %do% option is provided largely to let you test foreach code without having a parallel backend configured and loaded. It also needs to support the more advanced nesting and iteration facilities that foreach provides.

There are references in the code to results caching, error checking, debugging and the creation of local environment variables for the arguments of each iteration (see the function doSEQ in do.R for example). I'd imagine this is what creates the difference that you've observed. Of course, if you were running much more complicated code inside your loop (that would actually benefit from a parallelisation framework like foreach), this overhead would become irrelevant compared with the benefits it provides.
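
One way to see the per-iteration overhead shrink, sketched here under the assumption that the work can be split into a few larger pieces, is to hand foreach chunks of indices rather than single values. The names `per_element`, `chunked`, `chunks` and `n_chunks` are made up for this illustration, and the timings will vary by machine, so none are shown:

```r
library(foreach)

B <- 10000

# Per-element version: foreach's bookkeeping is paid B times.
per_element <- function() {
  foreach(i = 1:B) %do% sqrt(i)
}

# Chunked version: split the indices into a few contiguous blocks so the
# bookkeeping is paid only n_chunks times; sqrt() is vectorised over a block.
n_chunks <- 4
chunks <- split(1:B, cut(1:B, n_chunks, labels = FALSE))

chunked <- function() {
  foreach(idx = chunks, .combine = c) %do% sqrt(idx)
}

# Both approaches compute the same values.
res <- chunked()
stopifnot(isTRUE(all.equal(unname(res), sqrt(1:B))))

print(system.time(per_element()))
print(system.time(chunked()))
```

The same idea carries over to %dopar%: the more work each iteration does, the less the fixed cost per iteration matters.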
