R function scope and parallelism


Problem description

Consider the following function definitions:

library(doParallel)
f_print <- function(x)
{
  print(x)
}
f_foreach <- function(l)
{
  foreach (i=l) %do%
  {
    f_print(i)
  }
}

f_foreach_parallel <- function(l)
{
  doParallel::registerDoParallel(1)
  foreach (i=l) %dopar%
  {
    f_print(i)
  }
}

Function usage:

> f_foreach(c(1,2))
[1] 1
[1] 2
[[1]]
[1] 1

[[2]]
[1] 2

> f_foreach_parallel(c(1,2))
 Error in { : 
  task 1 failed - "impossible de trouver la fonction "f_print"" 
  [Error: could not find function f_print]
> 

Can you help explain why f_print() is not visible when parallelism is involved in foreach? How can we use f_print() in this parallel foreach? Is there any documentation related to this point?

Solution

In addition to what has already been said in the comments on the question, especially the one about specifying .export: if you use the doFuture package, your code will work regardless of parallel backend, operating system, and .export. Here's an adapted version of your setup:

f_print <- function(x) {
  print(x)
}

f_foreach <- function(l) {
  foreach(i=l) %do% {
    f_print(i)
  }
}

f_foreach_dopar <- function(l) {
  foreach(i=l) %dopar% {
    f_print(i)
  }
}

Instead of doing:

library("doParallel")

## Setup PSOCK workers (just as on Windows)
workers <- parallel::makeCluster(1L, outfile = "")
registerDoParallel(workers)

f_foreach_dopar(c(1,2))
## Error in { : task 1 failed - "could not find function "f_print""

you can do:

library("doFuture")
registerDoFuture()

## As above
workers <- parallel::makeCluster(1L, outfile = "")
plan(cluster, workers = workers)

f_foreach_dopar(c(1,2))
## [1] 1
## [1] 2
## [[1]]
## [1] 1
## 
## [[2]]
## [1] 2

The reason why this works is that doFuture does a more thorough search to identify global variables (here f_print()).
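To give a rough idea of what "searching for global variables" means, here is a minimal sketch that uses codetools::findGlobals() to list the free names in a loop body. This only illustrates static code inspection in general and is not a claim about how doFuture is implemented internally; the helper name loop_body is hypothetical.

```r
library(codetools)

# Hypothetical stand-in for the body of the foreach loop.
# Neither f_print nor i is defined inside it, so both are "globals"
# from the point of view of static code inspection.
loop_body <- function() {
  f_print(i)
}

# List every name the function refers to but does not define itself
found <- codetools::findGlobals(loop_body, merge = TRUE)
print(found)
```

A backend that inspects the code this way knows it has to send f_print to the workers even when the function is defined outside the environment where foreach() is called.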

PS. The reason for outfile = "" is so that stdout/stderr output (e.g. as from print()) is actually displayed. Redirecting stdout/stderr in parallel processing, which I don't recommend, is a whole different discussion, but I'll assume you used print() just for your example.
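For completeness, the .export route mentioned in the comments can be sketched as follows, staying with doParallel and a PSOCK cluster. The function name f_foreach_export is hypothetical; .export is the documented foreach argument for shipping variables that are not defined in the current environment.

```r
library(doParallel)

f_print <- function(x) {
  print(x)
}

# Hypothetical variant of f_foreach_dopar that names f_print explicitly,
# so it is exported to the workers even though it is not defined
# inside this function's own environment.
f_foreach_export <- function(l) {
  foreach(i = l, .export = "f_print") %dopar% {
    f_print(i)
  }
}

# PSOCK workers, as in the setup above
workers <- parallel::makeCluster(1L, outfile = "")
registerDoParallel(workers)

res <- f_foreach_export(c(1, 2))

# Shut the workers down when done
parallel::stopCluster(workers)
```

Here res is a list holding the value returned by each f_print(i) call. The drawback compared with doFuture is that every helper function used inside the loop has to be listed by hand.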
