R function scope and parallelism
Problem description
Consider the following function definitions:
library(doParallel)
f_print <- function(x)
{
print(x)
}
f_foreach <- function(l)
{
foreach (i=l) %do%
{
f_print(i)
}
}
f_foreach_parallel <- function(l)
{
doParallel::registerDoParallel(1)
foreach (i=l) %dopar%
{
f_print(i)
}
}
Function usage:
> f_foreach(c(1,2))
[1] 1
[1] 2
[[1]]
[1] 1
[[2]]
[1] 2
> f_foreach_parallel(c(1,2))
Error in { :
task 1 failed - "impossible de trouver la fonction "f_print""
[Error: could not find function f_print]
>
Can you help explain why f_print() is not visible when parallelism is involved in foreach? How can we use f_print() in this parallel foreach? Is there any documentation related to this point?
Solution
In addition to what has already been said in the comments on the top post, especially the one about specifying .export, your code will indeed work when you use the doFuture package, regardless of parallel backend, operating system, and .export. Here's an adapted version of your setup:
f_print <- function(x) {
print(x)
}
f_foreach <- function(l) {
foreach(i=l) %do% {
f_print(i)
}
}
f_foreach_dopar <- function(l) {
foreach(i=l) %dopar% {
f_print(i)
}
}
Instead of doing:
library("doParallel")
## Setup PSOCK workers (just as on Windows)
workers <- parallel::makeCluster(1L, outfile = "")
registerDoParallel(workers)
f_foreach_dopar(c(1,2))
## Error in { : task 1 failed - "could not find function "f_print""
you can do:
library("doFuture")
registerDoFuture()
## As above
workers <- parallel::makeCluster(1L, outfile = "")
plan(cluster, workers = workers)
f_foreach_dopar(c(1,2))
## [1] 1
## [1] 2
## [[1]]
## [1] 1
##
## [[2]]
## [1] 2
This works because doFuture does a more thorough search to identify global variables (here f_print()).
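If you prefer to stay with doParallel, the .export route mentioned above can be sketched as follows. This is only a minimal sketch, assuming the same PSOCK cluster setup as above; f_foreach_export is a hypothetical name introduced here for illustration, and the only change from f_foreach_dopar is the .export argument, which names the objects to send to the workers.

```r
library(doParallel)

f_print <- function(x) {
  print(x)
}

# Hypothetical variant of f_foreach_dopar: f_print is exported explicitly,
# so the workers receive a copy of it before running the loop body.
f_foreach_export <- function(l) {
  foreach(i = l, .export = "f_print") %dopar% {
    f_print(i)
  }
}

# PSOCK worker, as in the setup above; outfile = "" keeps print() output visible
workers <- parallel::makeCluster(1L, outfile = "")
registerDoParallel(workers)

res <- f_foreach_export(c(1, 2))

# Release the worker when done
parallel::stopCluster(workers)
```

Since print() returns its argument invisibly, res should hold the two input values as a list, just like the %do% version.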
PS. The reason for outfile = "" is so that stdout/stderr output (e.g. from print()) is actually displayed. Redirecting stdout/stderr in parallel processing, which I don't recommend, is a whole different discussion, but I'll assume you used print() just for your example.
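If the printed output matters outside of a toy example, one option is to have each task return a value and do the printing in the main session. The following is a rough sketch of that idea rather than a recommendation from the answer above; the message format and the name msgs are made up for illustration.

```r
library(doParallel)

# No outfile needed here: nothing is printed on the worker
workers <- parallel::makeCluster(1L)
registerDoParallel(workers)

# Each task returns a string instead of printing it on the worker
msgs <- foreach(i = c(1, 2)) %dopar% {
  sprintf("processed %g", i)
}

parallel::stopCluster(workers)

# Print in the main session, where the output is always visible
for (m in msgs) print(m)
```

This way the results do not depend on how the workers' stdout/stderr is handled.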