R mclapply vs foreach
Problem description
I use mclapply for all my "embarrassingly parallel" computations. I find it clean and easy to use, and when I set the arguments mc.cores = 1 and mc.preschedule = TRUE, I can insert browser() in the function inside mclapply and debug it line by line, just as in regular R. This is a huge help in getting code into production quicker.
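A minimal sketch of that debug-then-parallelize workflow, using only the base `parallel` package. The worker function `square_it` is hypothetical. With `mc.cores = 1` and `mc.preschedule = TRUE`, `mclapply` effectively runs as a plain `lapply` in the current R session, which is why `browser()` works there. Forking (`mc.cores > 1`) is not supported on Windows, so the sketch assumes a Unix-alike for the parallel run and falls back to one core otherwise.

```r
library(parallel)

# Hypothetical worker function, used only for illustration.
square_it <- function(x) {
  # browser()  # uncomment to step through line by line when mc.cores = 1
  x^2
}

# Debug mode: one core with prescheduling runs in the current R process,
# so browser() inside square_it() behaves as in ordinary R code.
res_debug <- mclapply(1:4, square_it, mc.cores = 1, mc.preschedule = TRUE)

# Production mode: fork worker processes. mc.cores > 1 is an error on
# Windows, so stay on one core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
res_par <- mclapply(1:4, square_it, mc.cores = n_cores)

print(unlist(res_par))
```

Only `mc.cores` changes between the two calls, so the function you stepped through is exactly the one that later runs in parallel.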
What does foreach offer that mclapply does not? Is there a reason I should consider writing foreach code going forward?
If I understand correctly, both can use the multicore approach to parallel computation (i.e. forking), which I prefer for performance reasons.
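A sketch of the same computation written both ways, assuming the CRAN packages `foreach` and `doParallel` are installed (`square_it` is hypothetical). On Unix-alikes, `registerDoParallel(cores = ...)` gives `%dopar%` a fork-based backend, so both versions use multicore parallelism there; on Windows the sketch simply registers the sequential backend. The `.combine` argument is one thing foreach adds over `mclapply`: it collapses the per-iteration results for you.

```r
library(foreach)
library(doParallel)

# Hypothetical worker function, used only for illustration.
square_it <- function(x) x^2

# mclapply version (fork-based on Unix-alikes, one core on Windows).
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
res_mc <- parallel::mclapply(1:4, square_it, mc.cores = n_cores)

# foreach version: on Unix-alikes, registering a number of cores makes
# %dopar% fork worker processes. On Windows, fall back to sequential.
if (.Platform$OS.type == "windows") {
  registerDoSEQ()
} else {
  registerDoParallel(cores = 2)
}

# By default foreach returns a list in input order, like mclapply.
res_fe <- foreach(x = 1:4) %dopar% square_it(x)
stopifnot(identical(res_fe, res_mc))

# .combine collapses the results, here into a numeric vector.
res_vec <- foreach(x = 1:4, .combine = c) %dopar% square_it(x)
print(res_vec)
```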
I have seen foreach used in various packages and have read the basics of it, but frankly I don't find it as easy to use. I also can't figure out how to get browser() to work inside foreach function calls. (Yes, I have read the thread "browser mode with foreach %dopar%", but it didn't help me get browser() to work properly.)
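One common way to debug foreach code, sketched under the assumption that the `foreach` package is installed (`square_it` is hypothetical): run the loop in the current R process, either by switching `%dopar%` to `%do%` or by registering the sequential backend with `registerDoSEQ()`. Because no worker process is involved, `browser()` should then behave as it does with `mclapply(..., mc.cores = 1)` in an interactive session.

```r
library(foreach)

# Hypothetical worker function, used only for illustration.
square_it <- function(x) {
  # browser()  # uncomment in an interactive session to step through
  x^2
}

# Option 1: replace %dopar% with %do%, which evaluates every iteration
# sequentially in the current R process.
res_do <- foreach(x = 1:4, .combine = c) %do% square_it(x)

# Option 2: keep %dopar% but register the sequential backend, so the
# unchanged production code runs in the current process while debugging.
registerDoSEQ()
res_seq <- foreach(x = 1:4, .combine = c) %dopar% square_it(x)

stopifnot(identical(res_do, res_seq))
```

Option 2 mirrors the `mc.cores = 1` trick: the loop itself is untouched, and only the registered backend decides whether it runs sequentially or in parallel.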
Recommended answer
The problem is almost the same as the one described in "Understanding the differences between mclapply and parLapply in R".
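mclapply forks the master process, creating a clone of it for each worker process (one per core) at the moment mclapply is called. Every worker therefore starts with exactly the master's state (loaded packages, data and functions), which guarantees that the workers reproduce the master's environment without any extra setup. Unfortunately, forking isn't possible on Windows: there, in contrast to multicore, foreach and parLapply always use multisession parallelism, in which each worker is a separate, fresh R session.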
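A small sketch of what forking provides, using only base `parallel`. The object `lookup` and the function `get_value` are hypothetical. The worker function reads a global object that is never passed in or exported, which works because each forked worker starts as a copy of the master process. On Windows the sketch stays on one core, which runs in the current process.

```r
library(parallel)

# Hypothetical lookup table created in the master process.
lookup <- c(a = 1, b = 2, c = 3)

# Worker function that reads the global 'lookup' directly; nothing is
# passed in or exported.
get_value <- function(key) lookup[[key]]

# On Unix-alikes each forked worker starts with a copy of 'lookup' and
# 'get_value'. Windows cannot fork, so use a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
res <- mclapply(c("a", "b", "c"), get_value, mc.cores = n_cores)

print(unlist(res))
```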
When using parLapply, or foreach with %dopar%, in multisession mode you generally have to perform these additional steps: create a PSOCK cluster; register the cluster as the %dopar% backend if you use foreach; load the necessary packages on the cluster workers; and export the necessary data and functions to the global environment of the cluster workers.
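The multisession steps above, sketched with base `parallel` and `parLapply` (the registration step applies only to foreach and is omitted here). `lookup` and `get_value` are hypothetical, and loading `stats` is only a placeholder for whatever packages your workers actually need. Because PSOCK workers are fresh R sessions, `lookup` has to be exported explicitly.

```r
library(parallel)

# Hypothetical data and helper defined in the master session.
lookup <- c(a = 1, b = 2, c = 3)
get_value <- function(key) lookup[[key]]

# Create a PSOCK cluster: new R sessions that share nothing with the
# master (the only option on Windows).
cl <- makeCluster(2, type = "PSOCK")

# Load the packages the workers need ('stats' is just a placeholder).
invisible(clusterEvalQ(cl, library(stats)))

# Export data and functions to the workers' global environments.
# get_value is also passed to parLapply directly, but without exporting
# 'lookup' the workers would fail with "object 'lookup' not found".
clusterExport(cl, varlist = c("lookup", "get_value"))

res <- parLapply(cl, c("a", "b", "c"), get_value)

# Shut the worker sessions down when finished.
stopCluster(cl)

print(unlist(res))
```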
That is why foreach has parameters such as .packages and .export, which let us distribute everything the workers need across sessions.
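A sketch of `.packages` and `.export` on a PSOCK cluster, assuming `foreach` and `doParallel` are installed. The global `suffix` and the wrapper `shout_all` are hypothetical. foreach automatically exports variables it finds in the environment where the loop is called. Here the loop runs inside a function while `suffix` lives in the global environment, so it must be listed in `.export`. `.packages = "tools"` attaches the base `tools` package on each worker so that `toTitleCase()` is found.

```r
library(foreach)
library(doParallel)

# Hypothetical global setting defined in the master session.
suffix <- "!"

# 'suffix' is not in this function's frame, so foreach's automatic export
# does not pick it up; it is sent to the workers via .export instead.
shout_all <- function(words) {
  foreach(w = words,
          .combine = c,
          .packages = "tools",   # attach 'tools' on every worker
          .export = "suffix") %dopar% {
    paste0(toTitleCase(w), suffix)
  }
}

cl <- makeCluster(2)       # create a PSOCK cluster
registerDoParallel(cl)     # register it as the %dopar% backend

res <- shout_all(c("alpha", "beta"))

stopCluster(cl)
print(res)
```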
The future package's overview vignette describes the differences between multicore and multisession processing in detail: https://cran.r-project.org/web/packages/future/vignettes/future-1-overview.html
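A short sketch of the two strategies in the `future` package, assuming it is installed from CRAN (`offset` is a hypothetical global). The same `future()` expression runs under `plan(multisession)`, which uses background R sessions on any OS, and under `plan(multicore)`, which uses forked processes. future detects globals such as `offset` and ships them to the workers automatically. Per the package documentation, multicore is unavailable on Windows and, by default, in RStudio, and in those cases it falls back to sequential processing.

```r
library(future)

# Hypothetical global object defined in the master session.
offset <- 100

# multisession: background R sessions; globals such as 'offset' are
# detected and exported automatically.
plan(multisession, workers = 2)
f1 <- future({ sum(1:10) + offset })
v1 <- value(f1)

# multicore: forked processes on Unix-alikes; falls back to sequential
# processing where forking is unsupported.
plan(multicore, workers = 2)
f2 <- future({ sum(1:10) + offset })
v2 <- value(f2)

# Reset to the default strategy.
plan(sequential)
print(c(v1, v2))
```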