Strange environment behavior in parallel plyr
Problem description
Recently, I created an object factor=1 in my workspace, not knowing that there is a function factor in the base package.
What I intended to do was to use the variable factor within a parallel loop, e.g.,
library(plyr)
library(foreach)
library(doParallel)
workers <- makeCluster(2)
registerDoParallel(workers,cores=2)
factor=1
llply(
  as.list(1:2),
  function(x) factor*x,
  .parallel = TRUE,
  .paropts=list(.export=c("factor"))
)
This, however, results in an error that took me some time to understand. It seems that plyr creates the object factor in its environment exportEnv, but uses base::factor instead of the user-provided object. See the following example:
llply(
  as.list(1:2),
  function(x) {
    function_env=environment()
    global_env=parent.env(function_env)
    export_env=parent.env(global_env)
    list(
      function_env=function_env,
      global_env=global_env,
      export_env=export_env,
      objects_in_exportenv=unlist(ls(envir=export_env)),
      factor_found_in_envs=find("factor"),
      factor_in_exportenv=get("factor",envir=export_env)
    )
  },
  .parallel = TRUE,
  .paropts=list(.export=c("factor"))
)
stopCluster(workers)
If we inspect the output of llply, we see that the line factor_in_exportenv=get("factor",envir=export_env) does not return 1 (the user-provided object) but the function definition of base::factor.
Question 1) How can I understand this behavior? I would have expected the output to be 1.
Question 2) Is there a way to get a warning from R if I assign a new value to an object that is already defined in another package (such as factor in my case)?
Recommended answer
First, I should note that the error goes away if one uses a variable name that is not used in base, for instance a instead of factor. This clearly indicates that llply finds base::factor (a function) before factor (the variable with value 1) along its search path. I have tried to replicate this issue with a simplified version of llply, i.e.,
library(plyr)
library(foreach)
library(doParallel)
workers <- makeCluster(2)
registerDoParallel(workers,cores=2)
factor=1
llply_simple=function(.x,.fun,.paropts) {
  #give current environment a name
  tmpEnv=environment()
  attr(tmpEnv,"name")="llply_simple_body"
  #print all enclosing envirs of llply_simple_body (see def of allEnv below)
  print(allEnv(tmpEnv))
  cat("------\nResults:\n")
  do.ply=function(i) {
    .fun(i)
  }
  fe_call <- as.call(c(list(quote(foreach::foreach), i = .x), .paropts))
  fe <- eval(fe_call)
  foreach::`%dopar%`(fe, do.ply(i))
}
llply_simple uses a recursive helper function (allEnv) that walks through all enclosing environments. It returns a vector with all environment names:
allEnv=function(x) {
  if (environmentName(x)=="R_EmptyEnv") {
    return(environmentName(x))
  } else {
    c(environmentName(x),allEnv(parent.env(x)))
  }
}
Interestingly, the simplified function actually works as expected (i.e., it gives 1 and 2 as results):
llply_simple(1:2,function(x) x*factor,list(.export="factor"))
#[1] "llply_simple_body" "R_GlobalEnv" "package:doParallel" "package:parallel"
#[5] "package:iterators" "package:foreach" "package:plyr" "tools:rstudio"
#[9] "package:stats" "package:graphics" "package:grDevices" "package:utils"
#[13] "package:datasets" "package:methods" "Autoloads" "base"
#[17] "R_EmptyEnv"
#--------
#Results:
#[[1]]
#[1] 1
#
#[[2]]
#[1] 2
So the only significant difference between llply_simple and the full plyr::llply function is that the latter belongs to a package. Let's try to move llply_simple into a package.
package.skeleton(list=c("llply_simple","allEnv"),name="llplyTest")
unlink("./llplyTest/DESCRIPTION")
devtools::create_description("./llplyTest",
extra=list("devtools.desc.author"='"T <t@t.com>"'))
tmp=readLines("./llplyTest/man/llply_simple.Rd")
tmp[which(grepl("\\\\title",tmp))+1]="Test1"
writeLines(tmp,"./llplyTest/man/llply_simple.Rd")
tmp=readLines("./llplyTest/man/allEnv.Rd")
tmp[which(grepl("\\\\title",tmp))+1]="Test2"
writeLines(tmp,"./llplyTest/man/allEnv.Rd")
devtools::install("./llplyTest")
Now try to execute llplyTest::llply_simple from our new package llplyTest:
library(llplyTest)
llplyTest::llply_simple(1:2,function(x) x*factor,list(.export="factor"))
#[1] "llply_simple_body" "llplyTest" "imports:llplyTest" "base"
#[5] "R_GlobalEnv" "package:doParallel" "package:parallel" "package:iterators"
#[9] "package:foreach" "package:plyr" "tools:rstudio" "package:stats"
#[13] "package:graphics" "package:grDevices" "package:utils" "package:datasets"
#[17] "package:methods" "Autoloads" "base" "R_EmptyEnv"
#------
#Results:
#Error in do.ply(i) :
# task 1 failed - "non-numeric argument to binary operator"
All of a sudden we get the same error as in my original question from 2013. So the issue is clearly connected to calling the function from a package. Let's have a look at the output of allEnv: it gives us the sequence of environments that llply_simple and llplyTest::llply_simple search for variables that should get exported. It is actually foreach that does the exporting; if you want to see why foreach starts with the environment that we named llply_simple_body, look at the source code of foreach::`%dopar%`, foreach:::getDoPar and foreach:::.foreachGlobals$fun and follow the path of the envir argument.
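The environment sequence of a packaged function can also be inspected without building a package. The following is a minimal sketch, assuming the namespace of the stats package as a stand-in for llplyTest; env_chain is a hypothetical helper name, an iterative variant of allEnv from above.

```r
# Hypothetical helper: collect the names of all enclosing environments,
# starting at `env` and ending at the empty environment.
env_chain <- function(env) {
  chain_names <- character(0)
  repeat {
    chain_names <- c(chain_names, environmentName(env))
    if (identical(env, emptyenv())) break
    env <- parent.env(env)
  }
  chain_names
}

# Assumption: the stats namespace stands in for the llplyTest namespace.
chain <- env_chain(asNamespace("stats"))
print(head(chain, 4))
# [1] "stats"         "imports:stats" "base"          "R_GlobalEnv"
```

As in the llplyTest output, the base namespace shows up before R_GlobalEnv; the attached packages from the search path only follow after the global environment.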
We can now clearly see that the non-package version has a different search sequence than llplyTest::llply_simple, and that the package version finds factor in base first: in its chain, base comes before R_GlobalEnv.
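The effect of this ordering on a plain variable lookup can be reproduced in isolation. This is only a sketch under the assumption that a lookup starting in a package namespace (here stats, as a stand-in) behaves like the lookup that foreach performs for the packaged function; it does not use plyr or foreach at all.

```r
# The user's object, created explicitly in the global environment.
assign("factor", 1, envir = globalenv())

# Lookup starting in the global environment: the user's value is found.
from_global <- get("factor", envir = globalenv())

# Lookup starting in a package namespace (stats is only a stand-in):
# the base namespace is searched before the global environment.
from_namespace <- get("factor", envir = asNamespace("stats"))

print(from_global)                  # the user's value, 1
print(is.function(from_namespace))  # TRUE, i.e. base::factor was found

# Clean up so the example leaves the workspace unchanged.
rm("factor", envir = globalenv())
```

Choosing a name that does not exist in base (such as a, as noted above) sidesteps the problem, because the lookup then continues past base and reaches the global environment.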