doParallel "foreach" inconsistently inherits objects from parent environment: "Error in { : task 1 failed - "could not find function..."

Problem description

I have a problem with foreach that I just can't figure out. The following code fails on two Windows computers I've tried, but succeeds on three Linux computers, all running the same versions of R and doParallel:

library("doParallel")
registerDoParallel(cl=2,cores=2)

f <- function(){return(10)}
g <- function(){
    r = foreach(x = 1:4) %dopar% {
        return(x + f())
    }
    return(r)
}
g()

On these two Windows computers, the following error is returned:

Error in { : task 1 failed - "could not find function "f""

However, this works just fine on the Linux computers, and also works just fine with %do% instead of %dopar%, and works fine for a regular for loop.

The same is true with variables, e.g. setting i <- 10 and replacing return(x + f()) with return(x + i)

For others with the same problem, two workarounds are:

1) explicitly export the needed functions and variables to the workers with .export:

r = foreach(x=1:4, .export="f") %dopar% 

2) export all global objects:

r = foreach(x=1:4, .export=ls(.GlobalEnv)) %dopar% 

The problem with these workarounds is that they are fragile for a big, actively developed package, because every new helper has to be added to the export list. In any case, foreach is supposed to behave like for.

Any ideas of what's causing this and if there's a fix?


Version info of the computer that the function works on:

R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.5 (Final)

other attached packages:
[1] doParallel_1.0.10 iterators_1.0.8   foreach_1.4.3

The computer the function doesn't work on:

R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

other attached packages:
[1] doParallel_1.0.10 iterators_1.0.8   foreach_1.4.3  

Solution

@Tensibai is right. When trying to use doParallel on Windows, you have to "export" the functions that you want to use that are not in the current scope. In my experience, the way I've made this work is with the following (redacted) example.

format_number <- function(data) {
  # do stuff that requires stringr
}

format_date_time <- function(data) {
  # do stuff that requires stringr
}

add_direction_data <- function(data) {
  # do stuff that requires dplyr
}

parse_data <- function(data) {
  voice_start <- # vector of values
  voice_end <- # vector of values
  target_phone_numbers <- # vector of values
  parse_voice_block <- function(block_start, block_end, number) {
    # do stuff
  }

  number_of_cores <- parallel::detectCores() - 1
  clusters <- parallel::makeCluster(number_of_cores)
  doParallel::registerDoParallel(clusters)
  # foreach() and %dopar% come from the foreach package, which must be attached
  data_list <- foreach(i = seq_along(voice_start), .combine = list,
                       .multicombine = TRUE,
                       .export = c("format_number", "format_date_time", "add_direction_data"),
                       .packages = c("dplyr", "stringr")) %dopar%
                       parse_voice_block(voice_start[i], voice_end[i], target_phone_numbers[i])
  # stopCluster() is exported by parallel, not doParallel
  parallel::stopCluster(clusters)
  output <- plyr::rbind.fill(data_list)
}

Since the first three functions are defined outside the function that calls foreach, they are not exported automatically to the new instances of R that doParallel starts, whereas parse_voice_block is found because it is defined within the current scope. In addition, you need to specify which packages should be loaded in each new instance of R. As Tensibai stated, this is because on Windows you are not forking the process, but instead starting multiple fresh instances of R and running the commands in them simultaneously.
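
To see the same behaviour on any platform, here is a minimal sketch based on the question's f and g. It assumes an explicit PSOCK cluster (fresh R processes, as on Windows) instead of the implicit registerDoParallel(cores=2) setup; the export argument of g is a hypothetical addition for illustration and is not part of the original code.

```r
library(doParallel)  # also attaches foreach and parallel

f <- function() 10

# 'export' is a hypothetical argument added for this sketch; it is passed
# straight through to foreach's .export.
g <- function(export = NULL) {
  foreach(x = 1:4, .export = export) %dopar% {
    x + f()
  }
}

# An explicit PSOCK cluster starts fresh R processes on every platform,
# so Linux and macOS workers do not inherit the parent session by forking.
cl <- parallel::makeCluster(2, type = "PSOCK")
registerDoParallel(cl)

# Without .export, f is not in g's own frame, so the workers are expected
# to fail; the error message is captured instead of stopping the script.
without_export <- tryCatch(g(), error = function(e) conditionMessage(e))

# With .export, f is copied to each worker before the loop body runs.
with_export <- g(export = "f")

parallel::stopCluster(cl)
```

Registering an explicit cluster like this during development is one way to catch missing exports on Linux before the code reaches a Windows machine.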
