doParallel "foreach" inconsistently inherits objects from parent environment: "Error in { : task 1 failed - "could not find function..."

Problem description

I have a problem with foreach that I just can't figure out. The following code fails on two Windows computers I've tried, but succeeds on three Linux computers, all running the same versions of R and doParallel:

library("doParallel")
registerDoParallel(cl=2,cores=2)

f <- function(){return(10)}
g <- function(){
    r = foreach(x = 1:4) %dopar% {
        return(x + f())
    }
    return(r)
}
g()

On these two Windows computers, the following error is returned:

Error in { : task 1 failed - "could not find function "f""

However, this works just fine on the Linux computers, and also works just fine with %do% instead of %dopar%, and works fine for a regular for loop.

The same is true with variables, e.g. setting i <- 10 and replacing return(x + f()) with return(x + i)

For others with the same problem, two workarounds are:

1) explicitly import the needed functions and variables with .export:

r = foreach(x=1:4, .export="f") %dopar% 

2) import all global objects:

r = foreach(x=1:4, .export=ls(.GlobalEnv)) %dopar% 
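
For reference, a complete, runnable version of workaround 1 might look like the sketch below. It assumes an explicit two-worker cluster created with parallel::makeCluster (not part of the original code) so that fresh R processes are used on every operating system, which is what Windows always gets.

```r
library(doParallel)  # also attaches foreach, iterators and parallel

# Assumption: an explicit cluster of fresh R processes, so the example
# behaves the same way on Linux, macOS and Windows.
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

f <- function() 10

g <- function() {
  # f is defined in the global environment, not inside g,
  # so it is named explicitly in .export.
  foreach(x = 1:4, .export = "f") %dopar% {
    x + f()
  }
}

result <- g()
parallel::stopCluster(cl)
print(unlist(result))
```

Creating the cluster explicitly also makes it possible to shut the workers down with parallel::stopCluster once the loop has finished.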

The problem with these workarounds is that they aren't the most stable for a big, actively developed package. In any case, foreach is supposed to behave like for.

Any ideas of what's causing this and if there's a fix?


Version info of the computer that the function works on:

R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.5 (Final)

other attached packages:
[1] doParallel_1.0.10 iterators_1.0.8   foreach_1.4.3

The computer the function doesn't work on:

R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

other attached packages:
[1] doParallel_1.0.10 iterators_1.0.8   foreach_1.4.3  

Solution

@Tensibai is right. When trying to use doParallel on Windows, you have to "export" the functions that you want to use that are not in the current scope. In my experience, the way I've made this work is with the following (redacted) example.

format_number <- function(data) {
  # do stuff that requires stringr
}

format_date_time <- function(data) {
  # do stuff that requires stringr
}

add_direction_data <- function(data) {
  # do stuff that requires dplyr
}

parse_data <- function(data) {
  voice_start <- # vector of values
  voice_end <- # vector of values
  target_phone_numbers <- # vector of values
  parse_voice_block <- function(block_start, block_end, number) {
    # do stuff
  }

  # Leave one core free for the rest of the system.
  number_of_cores <- parallel::detectCores() - 1
  clusters <- parallel::makeCluster(number_of_cores)
  doParallel::registerDoParallel(clusters)
  # Assumes foreach is attached (e.g. via library(doParallel)) so that
  # foreach() and %dopar% are visible here.
  data_list <- foreach(i = seq_along(voice_start), .combine = list,
                       .multicombine = TRUE,
                       .export = c("format_number", "format_date_time", "add_direction_data"),
                       .packages = c("dplyr", "stringr")) %dopar%
                       parse_voice_block(voice_start[i], voice_end[i], target_phone_numbers[i])
  # stopCluster() is exported by the parallel package, not by doParallel.
  parallel::stopCluster(clusters)
  output <- plyr::rbind.fill(data_list)
}

Since the first three functions aren't defined in my current environment (the body of parse_data), doParallel would not send them to the new instances of R it fires up, but it would know where to find parse_voice_block since it's within the current scope. In addition, you need to specify which packages should be loaded in each new instance of R. As Tensibai stated, this is because on Windows you're not forking the process, but instead firing up multiple fresh instances of R and running the commands in them simultaneously.
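
If listing everything in .export on each foreach call gets unwieldy, one alternative that may be worth trying is to copy the objects to the workers once with parallel::clusterExport and to load packages once with parallel::clusterEvalQ. The sketch below is only an illustration: scale_value, offset and run_all are hypothetical names, and the tools package (which ships with R) stands in for dplyr or stringr.

```r
library(doParallel)  # also attaches foreach, iterators and parallel

cl <- parallel::makeCluster(2)
registerDoParallel(cl)

scale_value <- function(x) x * 2   # hypothetical helper function
offset <- 10                       # hypothetical global variable

# Copy the objects into each worker's global environment, once.
parallel::clusterExport(cl, c("scale_value", "offset"), envir = environment())
# Load a package on each worker, once (tools stands in for dplyr etc.).
invisible(parallel::clusterEvalQ(cl, library(tools)))

run_all <- function(values) {
  # No .export needed here: the workers already hold both objects.
  foreach(v = values, .combine = c) %dopar% {
    scale_value(v) + offset
  }
}

out <- run_all(1:3)
parallel::stopCluster(cl)
print(out)
```

The trade-off is that the exported copies are snapshots: if scale_value or offset changes in the main session, clusterExport has to be called again before the next loop.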
