parRF on caret not working for more than one core

Question

parRF from the caret R package is not working for me with more than one core, which is quite ironic, given that the par in parRF stands for parallel. I'm on a Windows machine, if that is a relevant piece of information. I checked that I'm using the latest and greatest versions of caret and doParallel.

I made a minimal example and give the results below. Any ideas?

Source code

library(caret)
library(doParallel)

trCtrl <- trainControl(
  method = "repeatedcv"
  , number = 2
  , repeats = 5
  , allowParallel = TRUE
)

# WORKS
registerDoParallel(1)
train(form = Species~., data=iris, trControl = trCtrl, method="parRF")
closeAllConnections()

# FAILS
registerDoParallel(2)
train(form = Species~., data=iris, trControl = trCtrl, method="parRF")
closeAllConnections()

Output

> library(caret)
> library(doParallel)
> 
> trCtrl <- trainControl(
+   method = "repeatedcv"
+   , number = 2
+   , repeats = 5
+   , allowParallel = TRUE
+ )
> 
> 
> # WORKS
> registerDoParallel(1)
> train(form = Species~., data=iris, trControl = trCtrl, method="parRF")
Parallel Random Forest 

150 samples
  4 predictors
  3 classes: 'setosa', 'versicolor', 'virginica' 

... some more model output, works fine!
> closeAllConnections()
> 
> # FAILS
> registerDoParallel(2)
> train(form = Species~., data=iris, trControl = trCtrl, method="parRF")
Error in train.default(x, y, weights = w, ...) : 
  final tuning parameters could not be determined
In addition: Warning messages:
1: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,  :
  There were missing values in resampled performance measures.
2: In train.default(x, y, weights = w, ...) :
  missing values found in aggregated results
> closeAllConnections()

Session info

> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252    LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                   
[5] LC_TIME=German_Germany.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] doParallel_1.0.8   iterators_1.0.7    foreach_1.4.2      e1071_1.6-3        randomForest_4.6-7 caret_6.0-30       ggplot2_1.0.0     
[8] lattice_0.20-29   

loaded via a namespace (and not attached):
 [1] BradleyTerry2_1.0-4 brglm_0.5-9         car_2.0-20          class_7.3-10        codetools_0.2-8     colorspace_1.2-4   
 [7] compiler_3.1.0      digest_0.6.4        gnm_1.0-7           grid_3.1.0          gtable_0.1.2        gtools_3.4.1       
[13] lme4_1.1-6          MASS_7.3-31         Matrix_1.1-3        minqa_1.2.3         munsell_0.4.2       nlme_3.1-117       
[19] nnet_7.3-8          plyr_1.8.1          proto_0.3-10        qvcalc_0.8-8        Rcpp_0.11.2         RcppEigen_0.3.2.1.2
[25] relimp_1.0-3        reshape2_1.4        scales_0.2.4        splines_3.1.0       stringr_0.6.2       tcltk_3.1.0        
[31] tools_3.1.0   

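Update

  • Tried it with R 3.1.1 (same package versions): same result.
  • Tried it with R 3.0.2 and older versions of caret and doParallel, and it works fine (see the second session info below).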

Session info 2:

R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252    LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C                    LC_TIME=German_Germany.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] e1071_1.6-1        class_7.3-9        randomForest_4.6-7 doParallel_1.0.6   iterators_1.0.6   
 [6] caret_5.17-7       reshape2_1.2.2     plyr_1.8           lattice_0.20-24    foreach_1.4.1     
[11] cluster_1.14.4    

loaded via a namespace (and not attached):
[1] codetools_0.2-8 compiler_3.0.2  grid_3.0.2      stringr_0.6.2   tools_3.0.2    

Answer

This is clearly a bug in caret 6.0-30 that was introduced sometime after version 5.17-7. It is also another problem that is more likely to hit Windows users: doParallel's "mclapply mode" works, while its "clusterApplyLB mode" fails, and on Windows only the cluster mode is available.
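The two doParallel modes mentioned above can be told apart from R itself. The following is a minimal sketch, assuming a reasonably current doParallel installation: it registers an explicit PSOCK cluster (the cluster/clusterApplyLB path, which is also what `registerDoParallel(2)` builds implicitly on Windows) and, on non-Windows systems only, a forked multicore backend, then asks foreach which backend is active. The backend names `"doParallelSNOW"` and `"doParallelMC"` in the comments are assumed to be what doParallel reports; treat them as an assumption rather than documented API.

```r
library(doParallel)  # also attaches foreach and parallel

# Cluster mode: an explicit PSOCK cluster. On Windows,
# registerDoParallel(2) builds a cluster like this one implicitly.
cl <- makePSOCKcluster(2)
registerDoParallel(cl)
snow_name    <- getDoParName()     # assumed: "doParallelSNOW"
snow_workers <- getDoParWorkers()  # number of workers in the cluster
stopCluster(cl)

# Multicore mode: forked workers via mclapply, not available on Windows.
mc_name <- NA_character_
if (.Platform$OS.type != "windows") {
  registerDoParallel(cores = 2)
  mc_name <- getDoParName()        # assumed: "doParallelMC"
}

registerDoSEQ()  # reset foreach to sequential execution

print(c(cluster = snow_name, multicore = mc_name))
```

Forked workers start as copies of the master R session, so they plausibly inherit packages such as foreach that the master has already attached, whereas PSOCK workers are fresh R processes. That difference is consistent with only the cluster mode failing, though it is an inference, not something checked against caret's source.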

I've run some tests, and it appears that the problem is due to the cluster workers not being properly initialized to perform nested parallel computations. You can work around the bug by loading the foreach package on the cluster workers before calling "train". To do this, you need to create the cluster object explicitly, rather than letting "registerDoParallel" create it for you (which it does on Windows). For example:

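library(doParallel)                  # also attaches foreach and parallel
cl <- makePSOCKcluster(2)            # create the cluster explicitly
clusterEvalQ(cl, library(foreach))   # attach foreach on every worker
registerDoParallel(cl)               # use this cluster as the foreach backend
train(form = Species~., data=iris, trControl = trCtrl, method="parRF")
stopCluster(cl)                      # shut the workers down afterwards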
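The workaround can also be checked without caret. The sketch below, assuming doParallel is installed, looks at each worker's search path before and after `clusterEvalQ(cl, library(foreach))` and runs a nested loop: an outer `%dopar%` over the cluster whose body calls foreach again. That nesting is assumed to resemble what parRF does inside each resample; the helper `nested()` and its arithmetic are hypothetical and purely illustrative. With the doParallel versions discussed here, the first run is expected to stop with a "could not find function" error from the workers, but newer versions may behave differently, so the sketch only records whatever comes back.

```r
library(doParallel)  # also attaches foreach and parallel

# Hypothetical helper: an outer %dopar% loop whose body calls foreach
# again, standing in for the nesting parRF is assumed to perform.
nested <- function() {
  foreach(i = 1:2, .combine = c) %dopar% {
    # No backend is registered inside a worker, so this inner %dopar%
    # runs sequentially there (foreach emits a warning on the worker).
    sum(foreach(j = 1:3, .combine = c) %dopar% (i * j))
  }
}

# 1. Without the workaround: fresh PSOCK workers.
cl <- makePSOCKcluster(2)
registerDoParallel(cl)
before <- unlist(clusterEvalQ(cl, "package:foreach" %in% search()))
failed <- tryCatch(nested(), error = function(e) conditionMessage(e))
stopCluster(cl)

# 2. With the workaround: attach foreach on every worker first.
cl <- makePSOCKcluster(2)
invisible(clusterEvalQ(cl, library(foreach)))
registerDoParallel(cl)
after  <- unlist(clusterEvalQ(cl, "package:foreach" %in% search()))
result <- nested()  # sums of i * (1:3) for i = 1, 2
stopCluster(cl)
registerDoSEQ()     # reset foreach to sequential execution

print(before); print(failed); print(after); print(result)
```

Attaching foreach with `clusterEvalQ` is the answer's approach; passing `.packages = "foreach"` to the outer loop would be an alternative for code you control, but it does not help when the outer loop lives inside `train`.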

I'll contact the author of caret to discuss a solution to the problem.