R foreach parallel processing with unexported functions (with C50 example)
Question
I am trying to extract rules from a C50 model while processing in parallel. This answer helped me extract the rules from the model object. However, because I need the models to be processed in parallel, I am using foreach. This seems to cause a problem with the unexported function, because it does not see the data object. Here is some reproducible code:
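```r
library(foreach)
library(doMC)
registerDoMC(2)  # register 2 forked workers

j = c(1,2)
result = foreach(i = j) %dopar% {
  library(C50)
  d = iris  # in the real code, a different sample is generated here on each iteration
  model <- C5.0(Species ~ ., data = d)
  modParty <- C50:::as.party.C5.0(model)  # this line throws the error
  return(modParty)
}
```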
In this case it just calculates the model twice. In my real code, `d` is an always-changing sample that is also generated inside the foreach body.
My debugging showed that the offending line is `modParty <- C50:::as.party.C5.0(model)`. It throws the error
Error in { : task 1 failed - "Object 'd' not found"
even though `d` is definitely available to each worker in the cluster. I checked that by writing `loginfo(ls())` to a log file with the logging package.
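A note on that check: `ls()` called inside the loop body lists the environment the body is evaluated in, which (as the error suggests) is not the global environment. A minimal base-R sketch of the difference, using a hypothetical function `check_scope()` and assuming a fresh session with no global `d`:

```r
# Hypothetical helper: d is created locally, as it is inside the foreach body.
check_scope <- function() {
  d <- iris
  c(
    local  = exists("d", envir = environment(), inherits = FALSE), # what ls() here would list
    global = exists("d", envir = globalenv(), inherits = FALSE)    # the global workspace
  )
}

scope <- check_scope()
print(scope)  # local TRUE, global FALSE in a fresh session
```

So a logging check from inside the body can succeed while a lookup that only reaches the global workspace still fails.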
Why does the function not see the object `d`? Any help is greatly appreciated.
As additional information, here is the `traceback()` output:
```r
> traceback()
3: stop(simpleError(msg, call = expr))
2: e$fun(obj, substitute(ex), parent.frame(), e$data)
1: foreach(i = j) %dopar% {
library(C50)
d = iris
model <- C5.0(Species ~ ., data = d)
modParty <- C50:::as.party.C5.0(model)
return(modParty)
}
```
Edit
Just to clarify: this has nothing to do with foreach. The same error occurs with an ordinary function:
```r
library(C50)
d = iris
getC50Party = function(dat){
  model <- C5.0(Species ~ ., data = dat)
  modParty <- C50:::as.party.C5.0(model)
  return(modParty)
}
c50Party = getC50Party(d)
```
Error in { : task 1 failed - "Object 'dat' not found"
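Answer
The problem is that `as.party.C5.0` tries to access the data object from the overall (global) workspace, so it cannot see a `d` or `dat` that exists only locally, inside the foreach body or inside a function.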
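The failure can be reproduced without C50. The sketch below is an illustration, not C50's code: `fit_model()` and `refit_from_call()` are hypothetical stand-ins for a fitting function that stores its call (as many R modelling functions do) and a converter that re-evaluates the stored `data` argument. `eval()` defaults to the converter's own frame, and lookup then follows its enclosing environments; for a function in a package namespace that chain also ends at the global environment, never at the caller's local variables.

```r
# Hypothetical fitting function: stores the unevaluated call in the result.
fit_model <- function(formula, data) {
  list(call = match.call(), n = nrow(data))
}

# Hypothetical converter: re-evaluates the stored `data` argument in its own
# frame, so lookup falls through to the global environment.
refit_from_call <- function(obj) {
  eval(obj$call$data)
}

d <- iris
ok <- nrow(refit_from_call(fit_model(Species ~ ., data = d)))  # 150: d is global

get_party <- function(dat) {
  m <- fit_model(Species ~ ., data = dat)
  refit_from_call(m)  # `dat` exists only in get_party()'s frame
}
err <- tryCatch(get_party(iris), error = function(e) conditionMessage(e))
print(err)  # an "object 'dat' not found" message (assuming no global `dat`)
```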
This is a bug. We do follow Achim's advice and use the `terms` object, except when we get the case wrong.
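One way to avoid the global lookup, sketched under the assumption that the fitted object keeps its terms: a terms object carries the environment in which the formula was written, so the stored `data` argument can be evaluated there instead. The names `fit_model()`, `data_from_terms()` and `get_data()` are hypothetical, and this is not the code in C50.

```r
# Hypothetical fitting function that keeps the terms of the model frame.
# The terms inherit the environment in which the formula was written.
fit_model <- function(formula, data) {
  mf <- model.frame(formula, data = data)
  list(call = match.call(), terms = attr(mf, "terms"))
}

# Hypothetical converter: evaluate the stored `data` argument in the
# formula's environment rather than in this function's own frame.
data_from_terms <- function(obj) {
  eval(obj$call$data, envir = environment(obj$terms))
}

get_data <- function(dat) {
  m <- fit_model(Species ~ ., data = dat)  # formula created in get_data()'s frame
  data_from_terms(m)
}

recovered <- get_data(iris)
nrow(recovered)  # 150, even though `dat` is local to get_data()
```

In the foreach case the formula `Species ~ .` is written inside the loop body, so its environment is the one that holds `d`.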
Try installing from GitHub via `devtools::install_github("topepo/C5.0/pkg/C50")`. Your examples work with this version.