Strange environment behavior in parallel plyr


Question

Recently, I created an object factor=1 in my workspace, not knowing that the base package already contains a function named factor.

What I intended to do was to use the variable factor within a parallel loop, e.g.,

library(plyr)
library(foreach)
library(doParallel)

workers <- makeCluster(2)
registerDoParallel(workers,cores=2)

factor=1

llply(
  as.list(1:2),
  function(x) factor*x,
  .parallel = TRUE,
  .paropts=list(.export=c("factor"))
     )

This, however, results in an error that took me some time to understand. It seems that plyr creates the object factor in its environment exportEnv, but then uses base::factor instead of the user-provided object. See the following example:

llply(
  as.list(1:2),
  function(x) {
    function_env=environment();
    global_env=parent.env(function_env);
    export_env=parent.env(global_env);
    list(
      function_env=function_env,
      global_env=global_env,
      export_env=export_env,
      objects_in_exportenv=unlist(ls(envir=export_env)),
      factor_found_in_envs=find("factor"),
      factor_in_exportenv=get("factor",envir=export_env)
      )
    },
  .parallel = TRUE,
  .paropts=list(.export=c("factor"))
  )

stopCluster(workers)

If we inspect the output of llply, we see that the line factor_in_exportenv=get("factor",envir=export_env) does not return 1 (the user-provided object) but the function definition of base::factor.

Question 1) How can I understand this behavior? I would have expected the output to be 1.

Question 2) Is there a way to get a warning from R if I assign a new value to a name that is already defined in another package (such as factor in my case)?

Recommended answer

First, I should note that the error goes away if one uses a variable name that is not used in base, for instance a instead of factor. This clearly indicates that llply finds base::factor (a function) before factor (the variable with value 1) along its search path. I have tried to replicate the issue with a simplified version of llply, i.e.,

library(plyr)
library(foreach)
library(doParallel)

workers <- makeCluster(2)
registerDoParallel(workers,cores=2)

factor=1

llply_simple=function(.x,.fun,.paropts) {
  #give current environment a name
  tmpEnv=environment()
  attr(tmpEnv,"name")="llply_simple_body"
  #print all enclosing envirs of llply_simple_body (see def of allEnv below)
  print(allEnv(tmpEnv))
  cat("------\nResults:\n")
  do.ply=function(i) {
    .fun(i)
  }
  fe_call <- as.call(c(list(quote(foreach::foreach), i = .x), .paropts))
  fe <- eval(fe_call)
  foreach::`%dopar%`(fe, do.ply(i))
}

llply_simple uses a recursive helper function (allEnv) that walks through all enclosing environments and returns a vector with their names:

allEnv=function(x) {
  if (environmentName(x)=="R_EmptyEnv") {
    return(environmentName(x))
  } else {
    c(environmentName(x),allEnv(parent.env(x)))
  }
}

Interestingly, the simplified function actually works as expected (i.e., it gives 1 and 2 as results):

llply_simple(1:2,function(x) x*factor,list(.export="factor"))
#[1] "llply_simple_body"  "R_GlobalEnv"        "package:doParallel" "package:parallel"  
#[5] "package:iterators"  "package:foreach"    "package:plyr"       "tools:rstudio"     
#[9] "package:stats"      "package:graphics"   "package:grDevices"  "package:utils"     
#[13] "package:datasets"   "package:methods"    "Autoloads"          "base"              
#[17] "R_EmptyEnv"
#--------
#Results:        
#[[1]]
#[1] 1
#
#[[2]]
#[1] 2

So the only significant difference between llply_simple and the full plyr::llply function is that the latter belongs to a package. Let's try moving llply_simple into a package.

package.skeleton(list=c("llply_simple","allEnv"),name="llplyTest")
unlink("./llplyTest/DESCRIPTION")
devtools::create_description("./llplyTest",
                             extra=list("devtools.desc.author"='"T <t@t.com>"'))
tmp=readLines("./llplyTest/man/llply_simple.Rd")
tmp[which(grepl("\\\\title",tmp))+1]="Test1"
writeLines(tmp,"./llplyTest/man/llply_simple.Rd")
tmp=readLines("./llplyTest/man/allEnv.Rd")
tmp[which(grepl("\\\\title",tmp))+1]="Test2"
writeLines(tmp,"./llplyTest/man/allEnv.Rd")
devtools::install("./llplyTest")

Now try to execute llplyTest::llply_simple from our new package llplyTest:

library(llplyTest)
llplyTest::llply_simple(1:2,function(x) x*factor,list(.export="factor"))
#[1] "llply_simple_body"  "llplyTest"          "imports:llplyTest"  "base"              
#[5] "R_GlobalEnv"        "package:doParallel" "package:parallel"   "package:iterators" 
#[9] "package:foreach"    "package:plyr"       "tools:rstudio"      "package:stats"     
#[13] "package:graphics"   "package:grDevices"  "package:utils"      "package:datasets"  
#[17] "package:methods"    "Autoloads"          "base"               "R_EmptyEnv"
#------
#Results:
#Error in do.ply(i) : 
#  task 1 failed - "non-numeric argument to binary operator"

All of a sudden we get the same error as in my original question from 2013. So the issue is clearly connected to calling the function from a package. Let's have a look at the output of allEnv: it gives us the sequence of environments that llply_simple and llplyTest::llply_simple search when looking for the variables to export. It is actually foreach that does the exporting. If you want to see why foreach starts with the environment that we named llply_simple_body, look at the source code of foreach::%dopar%, foreach:::getDoPar and foreach:::.foreachGlobals$fun, and follow the path of the envir argument.
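The effect of that environment sequence can be reproduced without plyr or foreach. The following is only a minimal sketch in base R, not what foreach does internally: it compares a lookup that starts in the global environment with one that starts in a package namespace. The stats namespace is used purely as a stand-in for a package such as llplyTest, on the assumption that it neither defines nor imports an object called factor; the names from_global, from_namespace and scale_x are made up for this illustration.

```r
# Put a numeric `factor` into the global environment, as in the question.
assign("factor", 1, envir = globalenv())

# A lookup that starts in the global environment finds the user's object first.
from_global <- get("factor", envir = globalenv())

# A lookup that starts in a package namespace walks
# namespace -> imports -> base namespace -> global environment,
# so it reaches base::factor before it ever sees the user's object.
from_namespace <- get("factor", envir = asNamespace("stats"))

print(from_global)
print(is.function(from_namespace))
print(identical(from_namespace, base::factor))

# The same lookup order makes a function fail when it expects a number
# but is evaluated with a namespace as its enclosing environment.
scale_x <- function(x) x * factor
environment(scale_x) <- asNamespace("stats")
err <- tryCatch(scale_x(2), error = function(e) conditionMessage(e))
print(err)

# Clean up the global environment again.
rm(list = "factor", envir = globalenv())
```

The tryCatch call captures the error message instead of stopping the script, so the failure can be compared with the one reported by llplyTest::llply_simple.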

We can now clearly see that the non-package version has a different search sequence than llplyTest::llply_simple: in the package version, base comes before R_GlobalEnv, so the package version finds factor in base first!
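Question 2 is not addressed above. As far as I know, R does not warn when a plain assignment reuses a name from base, but such names can be checked after the fact. The helper below is a hypothetical sketch (masks_base is not part of any package) that lists the objects in an environment that share a name with an object in base. It is demonstrated on a throwaway environment instead of the real workspace.

```r
# Hypothetical helper: return the names in `env` that also exist in base.
masks_base <- function(env = globalenv()) {
  nms <- ls(envir = env)
  # inherits = FALSE restricts the check to the base environment itself.
  in_base <- vapply(nms, exists, logical(1),
                    envir = baseenv(), inherits = FALSE)
  nms[in_base]
}

# Demonstration on a scratch environment standing in for the workspace.
demo_env <- new.env()
assign("factor", 1, envir = demo_env)     # clashes with base::factor
assign("my_value", 2, envir = demo_env)   # no clash

masked <- masks_base(demo_env)
if (length(masked) > 0) {
  message("Names that also exist in base: ",
          paste(masked, collapse = ", "))
}
```

Calling masks_base() with no argument checks the global environment. Base R also provides conflicts(), which reports names that occur more than once on the search path.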
