R foreach parallel processing with unexported functions (with C50 example)


Question

I am trying to extract rules from a C50 model while processing in parallel. This answer helped me to extract the rules from the model object. However, as I need the models to be processed in parallel, I am using foreach. This seems to have a problem with the unexported function, as it does not see the data object. Here is some reproducible code:

library(foreach)
library(doMC)
registerDoMC(2)

j = c(1,2)
result = foreach(i = j) %dopar% {
  library(C50)
  d = iris
  model <- C5.0(Species ~ ., data = d)
  modParty <- C50:::as.party.C5.0(model)
  return(modParty)
}

In this case it just calculates the model twice. In my real code, d is a constantly changing sample that is also generated inside the foreach loop.

My debugging showed that the problematic line is modParty <- C50:::as.party.C5.0(model). It throws the error

Error in { : task 1 failed - "Object 'd' not found"

even though d is definitely available to each worker in the cluster. I checked that by logging to a file via loginfo(ls()) from the logging package.

Why does the function not see the object d? Any help greatly appreciated.

As additional info here is the traceback()

> traceback()
3: stop(simpleError(msg, call = expr))
2: e$fun(obj, substitute(ex), parent.frame(), e$data)
1: foreach(i = j) %dopar% {
       library(C50)
       d = iris
       model <- C5.0(Species ~ ., data = d)
       modParty <- C50:::as.party.C5.0(model)
       return(modParty)
   }

Edit

Just for clarification: it doesn't have anything to do with foreach. The same error occurs with a normal function:

library(C50)

d = iris

getC50Party = function(dat){
  model <- C5.0(Species ~ ., data = dat)
  modParty <- C50:::as.party.C5.0(model)
  return(modParty)
}

c50Party = getC50Party(d)

Error in { : task 1 failed - "Object 'dat' not found"

The problem is that as.party.C5.0 tries to access the data object from the overall workspace.
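The lookup failure described above can be reproduced in base R without C50. The sketch below is only a toy model of the mechanism, not the actual C50 source: fit_toy, data_from_global and wrapper are hypothetical names, and it assumes the conversion step re-evaluates the data argument of the stored call in the global workspace.

```r
# Toy stand-ins with hypothetical names; this is NOT the C50 implementation.
fit_toy <- function(formula, data) {
  # Keep the unevaluated call, as many R modelling functions do.
  structure(list(call = match.call(), formula = formula), class = "toy_fit")
}

# Assumed behaviour: re-evaluate the `data` argument of the stored call,
# but only look in the global workspace.
data_from_global <- function(fit) {
  eval(fit$call$data, envir = globalenv())
}

wrapper <- function(dat) {
  fit <- fit_toy(Species ~ ., data = dat)
  data_from_global(fit)  # `dat` exists only inside wrapper(), not globally
}

# Capture the error message instead of stopping the script.
res <- tryCatch(wrapper(iris), error = function(e) conditionMessage(e))
print(res)
```

Because dat lives only in the frame of wrapper(), the global lookup fails with an object-not-found error, which matches the symptom seen both inside foreach and in the plain function.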

Solution

This is a bug. We do follow Achim's advice and use the terms object except when we get the case wrong.

Try installing from github via

devtools::install_github("topepo/C5.0/pkg/C50")

Your example works on this version.
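For readers wondering what "use the terms object" can look like in practice, here is a minimal base R sketch. It is one plausible reading of that advice, not the fix that was actually applied in C50, and fit_with_terms, data_from_terms and wrapper2 are hypothetical names. The idea is that a terms object remembers the environment in which the formula was created, so the data argument can be evaluated there instead of in the global workspace.

```r
# Hypothetical helpers; a sketch of the idea, not the C50 code.
fit_with_terms <- function(formula, data) {
  # terms() keeps the formula's environment in the ".Environment" attribute.
  list(call = match.call(), terms = terms(formula, data = data))
}

data_from_terms <- function(fit) {
  # Evaluate the stored `data` argument where the formula was created.
  eval(fit$call$data, envir = attr(fit$terms, ".Environment"))
}

wrapper2 <- function(dat) {
  fit <- fit_with_terms(Species ~ ., data = dat)
  data_from_terms(fit)
}

out <- wrapper2(iris)
print(identical(out, iris))
```

This works here because the formula is written inside wrapper2(), so its environment is the frame that holds dat. If the formula were built elsewhere, the lookup could still miss the data, which is why the sketch should be read as an illustration only.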
