Should mclapply calls be nested?

Question

Is nesting parallel::mclapply calls a good idea?

require(parallel)
ans <- mclapply(1:3, function(x) mclapply(1:3, function(y) y * x))
unlist(ans)

Output:

[1] 1 2 3 2 4 6 3 6 9

So it's "working". But is it recommended for real compute-intensive workloads where the number of tasks exceeds the number of cores? What is going on when this is executed? Are the multiple forks involved potentially more wasteful? What are the considerations for mc.cores and mc.preschedule?

Edit: Just to clarify the motivation, it often seems natural to parallelize by splitting along one dimension (e.g., using different cores to handle data from n different years), and then a second natural split appears within the first (e.g., using different cores to calculate each of m different functions). When m times n is smaller than the total number of available cores, the nesting above looks sensible, at least on the face of it.

Recommended answer

In the following experiment, a single-level parallel execution of the test function testfn() was faster than the nested parallel execution:

library(parallel)
library(microbenchmark)
testfn <- function(x) rnorm(10000000)

microbenchmark('parallel'= o <- mclapply(1:8, testfn, mc.cores=4),
               'nested'  = o <- mclapply(1:2, function(x) mclapply(1:4, testfn, mc.cores=2), 
                                         mc.cores=2),
               times=10)

Output:

Unit: seconds
     expr      min       lq     mean   median       uq      max neval
 parallel 3.727131 3.756445 3.802470 3.815977 3.834144 3.890128    10
   nested 4.355846 4.372996 4.508291 4.453881 4.578837 4.863664    10

Explanation:
Communication between the R session and four R workers seems to be more efficient than communication between the R session and two workers, each of which in turn forks and communicates with two further workers.
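
Since a single level of workers came out ahead in this experiment, one option is to flatten the two dimensions into one task list and make a single mclapply() call over it. The following is a minimal sketch, not a definitive recipe: `years`, `funs`, and `get_data()` are hypothetical stand-ins for the question's n years and m functions, and mc.cores = 2 is an arbitrary choice. It assumes a Unix-alike system, because mclapply() relies on forking.

```r
library(parallel)

# Hypothetical stand-ins for the two dimensions in the question.
years <- 1:3
funs  <- list(mean = mean, median = median, max = max)

# One row per (year, function) combination.
grid <- expand.grid(year = years, fun = names(funs), stringsAsFactors = FALSE)

# Toy per-year data; a real workload would load or compute this instead.
get_data <- function(year) year * c(1, 2, 6)

# A single, flat mclapply() call: only one level of forking.
res <- mclapply(seq_len(nrow(grid)), function(i) {
  funs[[grid$fun[i]]](get_data(grid$year[i]))
}, mc.cores = 2)

grid$value <- unlist(res)
print(grid)
```

With this layout, mc.cores is set once for the whole grid, so the number of worker processes is no longer the product of an outer and an inner setting.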

Alternative:
The foreach package can handle nested loops, which comes close to nested mclapply() calls; see the vignette at https://cran.r-project.org/web/packages/foreach/vignettes/nested.pdf.
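
As a rough illustration of the foreach approach, the question's toy computation can be written with the %:% nesting operator. This sketch assumes the foreach and doParallel packages are installed; doParallel is only one of several possible backends and is not named in the original answer, and the choice of 2 workers is arbitrary.

```r
library(foreach)
library(doParallel)

# Register a parallel backend; 2 workers is an arbitrary choice.
registerDoParallel(cores = 2)

# %:% merges the two loops into one stream of tasks for the backend,
# so the inner loop does not start its own set of workers.
ans <- foreach(x = 1:3, .combine = c) %:%
  foreach(y = 1:3, .combine = c) %dopar% {
    y * x
  }

print(ans)

# Release any implicitly created cluster (harmless if none was created).
stopImplicitCluster()
```

Here `ans` holds the same nine products as the nested mclapply() call in the question, but the tasks are scheduled on one pool of workers.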

(The optimal setting of the argument mc.preschedule depends on the specific problem; see the help page ?mclapply.)
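
To see how mc.preschedule can matter, the sketch below runs the same uneven workload with both settings. `slow_task()` and its sleep times are hypothetical and chosen only to make task durations vary; actual timings depend on the machine, so none are shown. According to ?mclapply, prescheduling divides the inputs among the workers up front, while mc.preschedule = FALSE forks one job per input value, which the help page suggests for jobs with a high variance in completion time.

```r
library(parallel)

# Hypothetical task: sleep for a task-dependent time, then return the input.
slow_task <- function(i) {
  Sys.sleep(if (i %% 4 == 1) 0.4 else 0.05)  # tasks 1 and 5 are slow
  i
}

# Default: inputs are split among the workers before anything starts.
t_pre <- system.time(
  r_pre <- mclapply(1:8, slow_task, mc.cores = 2, mc.preschedule = TRUE)
)

# One fork per input value; at most mc.cores jobs run at a time.
t_dyn <- system.time(
  r_dyn <- mclapply(1:8, slow_task, mc.cores = 2, mc.preschedule = FALSE)
)

# Both settings return the same results; only the scheduling differs.
print(c(prescheduled = t_pre[["elapsed"]], dynamic = t_dyn[["elapsed"]]))
```

For many short tasks the per-task forking cost of mc.preschedule = FALSE may outweigh the better load balance, so it is worth timing both settings on the real workload.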
