Caret - Setting the seeds inside gafsControl()

Question

I am trying to set the seeds inside caret's gafsControl(), but I am getting this error:

Error in { : task 1 failed - "supplied seed is not a valid integer"

I understand that seeds for trainControl() is a list whose length equals the number of resamples plus one: each resample entry holds one seed per combination of the model's tuning parameters (36 in my case: an SVM with 6 sigma and 6 cost values), and the last entry holds a single seed for the final model. However, I couldn't figure out what I should use for gafsControl(). I've tried iters*popSize (100*10), iters (100), and popSize (10), but none of them worked.
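For reference, here is a minimal base-R sketch of the trainControl() seed structure described above, assuming 5 resamples and the 6 x 6 sigma/C grid used later in the code. make_train_seeds() is a hypothetical helper for illustration only, not part of caret.

```r
# Hypothetical helper (not part of caret): builds a trainControl()-style
# seeds list with B entries of M seeds each, plus one final seed.
make_train_seeds <- function(B, M, max_seed = 1000L) {
  seeds <- vector(mode = "list", length = B + 1)
  for (i in seq_len(B)) seeds[[i]] <- sample.int(max_seed, M)
  seeds[[B + 1]] <- sample.int(max_seed, 1)  # seed for the final model fit
  seeds
}

# 6 sigma values x 6 cost values = 36 tuning-parameter combinations
svmGrid <- expand.grid(sigma = 2^c(-25, -20, -15, -10, -5, 0), C = 2^(0:5))
M <- nrow(svmGrid)

set.seed(1056)
tr_seeds <- make_train_seeds(B = 5, M = M)

length(tr_seeds)   # 6
lengths(tr_seeds)  # 36 36 36 36 36 1
```

This is the nested list shape that trainControl() accepts; it is exactly the shape that does not carry over to gafsControl().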

Thanks in advance.

Here is my code (with simulated data):

library(caret)
library(doMC)
library(kernlab)

registerDoMC(cores=32)

set.seed(1234)
train.set <- twoClassSim(300, noiseVars = 100, corrVar = 100, corrValue = 0.75)

mylogGA <- caretGA
mylogGA$fitness_extern <- mnLogLoss

#Index for gafsControl
set.seed(1045481)
ga_index <- createFolds(train.set$Class, k=3)

#Seed for the gafsControl()
set.seed(1056)
ga_seeds <- vector(mode = "list", length = 4)
for(i in 1:3) ga_seeds[[i]] <- sample.int(1500, 1000)

## For the last model:
ga_seeds[[4]] <- sample.int(1000, 1)

#Index for the trainControl()
set.seed(1045481)
tr_index <- createFolds(train.set$Class, k=5)

#Seeds for the trainControl()
set.seed(1056)
tr_seeds <- vector(mode = "list", length = 6)
for(i in 1:5) tr_seeds[[i]] <- sample.int(1000, 36) # 36 = 6 sigma x 6 C combinations

## For the last model:
tr_seeds[[6]] <- sample.int(1000, 1)


gaCtrl <- gafsControl(functions = mylogGA,
                      method = "cv",
                      number = 3,
                      metric = c(internal = "logLoss",
                                 external = "logLoss"),
                      verbose = TRUE,
                      maximize = c(internal = FALSE,
                                   external = FALSE),
                      index = ga_index,
                      seeds = ga_seeds,
                      allowParallel = TRUE)

tCtrl = trainControl(method = "cv", 
                     number = 5,
                     classProbs = TRUE,
                     summaryFunction = mnLogLoss,
                     index = tr_index,
                     seeds = tr_seeds,
                     allowParallel = FALSE)


svmGrid <- expand.grid(sigma= 2^c(-25, -20, -15,-10, -5, 0), C= 2^c(0:5))

t1 <- Sys.time()
set.seed(1234235)
svmFuser.gafs <- gafs(x = train.set[, names(train.set) != "Class"],
                      y = train.set$Class,
                      gafsControl = gaCtrl,
                      trControl = tCtrl,
                      popSize = 10,
                      iters = 100,
                      method = "svmRadial",
                      preProc = c("center", "scale"),
                      tuneGrid = svmGrid,
                      metric="logLoss",
                      maximize = FALSE)

t2 <- Sys.time()
svmFuser.gafs.time <- difftime(t2, t1)

save(svmFuser.gafs, file ="svmFuser.gafs.rda")
save(svmFuser.gafs.time, file ="svmFuser.gafs.time.rda")

Session info:

> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C               LC_TIME=en_CA.UTF-8       
 [4] LC_COLLATE=en_CA.UTF-8     LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8   
 [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
 [10] LC_TELEPHONE=C            LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] kernlab_0.9-22  doMC_1.3.3      iterators_1.0.7 foreach_1.4.2   caret_6.0-52    ggplot2_1.0.1   lattice_0.20-33

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.0         magrittr_1.5        splines_3.2.2        MASS_7.3-43         munsell_0.4.2      
 [6] colorspace_1.2-6    foreach_1.4.2       minqa_1.2.4         car_2.0-26          stringr_1.0.0      
 [11] plyr_1.8.3          tools_3.2.2         parallel_3.2.2      pbkrtest_0.4-2      nnet_7.3-10        
 [16] grid_3.2.2          gtable_0.1.2        nlme_3.1-122        mgcv_1.8-7          quantreg_5.18      
 [21] MatrixModels_0.4-1  iterators_1.0.7     gtools_3.5.0        lme4_1.1-9          digest_0.6.8       
 [26] Matrix_1.2-2        nloptr_1.0.4        reshape2_1.4.1      codetools_0.2-11    stringi_0.5-5      
 [31] compiler_3.2.2      BradleyTerry2_1.0-6 scales_0.3.0        stats4_3.2.2        SparseM_1.7        
 [36] brglm_0.5-9         proto_0.3-10       
> 

Answer

I was able to figure out my mistake by inspecting gafs.default. The seeds argument of gafsControl() takes a vector of length (n_repeats*n_resamples)+1, not a list (as trainControl$seeds does). The documentation of ?gafsControl actually states that seeds is a vector of integers that can be used to set the seed during each search, and that the number of seeds must be equal to the number of resamples plus one. I figured it out the hard way; this is a reminder to read the documentation carefully :D.

    if (!is.null(gafsControl$seeds)) {
        if (length(gafsControl$seeds) < length(gafsControl$index) + 1)
            stop(paste("There must be at least", length(gafsControl$index) + 1,
                       "random number seeds passed to gafsControl"))
    } else {
        gafsControl$seeds <- sample.int(1e+05, length(gafsControl$index) + 1)
    }
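The check in gafs.default only tests length(), so the original list of 4 elements passes it (4 >= 3 + 1), and the failure only shows up later. The sketch below is a plausible explanation rather than a confirmed reading of caret's internals: it assumes each resample's seed is taken with single brackets (seeds[i]). In base R, single-bracket indexing of a list returns another list, and set.seed() rejects a list with the same message as the reported error.

```r
# Re-creation of the length check from gafs.default, applied to the
# original list-based seeds (3 folds -> 3 resamples).
n_resamples <- 3
list_seeds <- vector(mode = "list", length = 4)
for (i in 1:3) list_seeds[[i]] <- sample.int(1500, 1000)
list_seeds[[4]] <- sample.int(1000, 1)

length(list_seeds) >= n_resamples + 1  # TRUE: the length check passes

# Assumption: each resample's seed is taken with single brackets, seeds[i].
# For a list, seeds[1] is still a list, and set.seed() refuses it.
msg <- tryCatch(set.seed(list_seeds[1]),
                error = function(e) conditionMessage(e))
msg  # "supplied seed is not a valid integer"

# With a plain integer vector, seeds[1] is a single integer and works.
vec_seeds <- sample.int(1500, n_resamples + 1)
set.seed(vec_seeds[1])  # no error
```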

So, the proper way to set my ga_seeds is:

#Index for gafsControl
set.seed(1045481)
ga_index <- createFolds(train.set$Class, k=3)

#Seed for the gafsControl()
set.seed(1056)
ga_seeds <- sample.int(1500, 4)  # 3 resamples + 1 = 4 seeds, as a plain vector
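The same rule can be wrapped in a small base-R helper so the seed count follows the resampling setup. make_gafs_seeds() is hypothetical (not part of caret); n_repeats stays at 1 for method = "cv" and would be set to the number of repeats for repeated cross-validation, following the (n_repeats*n_resamples)+1 formula from the answer.

```r
# Hypothetical helper (not part of caret): builds a gafsControl()-style
# seeds vector, one integer per resample plus one extra seed,
# i.e. n_repeats * n_resamples + 1 values in a plain integer vector.
make_gafs_seeds <- function(n_resamples, n_repeats = 1, max_seed = 1500) {
  sample.int(max_seed, n_repeats * n_resamples + 1)
}

set.seed(1056)
ga_seeds <- make_gafs_seeds(n_resamples = 3)  # 3-fold CV, no repeats

is.list(ga_seeds)  # FALSE: a plain integer vector, unlike trainControl()
length(ga_seeds)   # 4
```

Keeping the count tied to the number of folds avoids silently passing too few seeds if k changes.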
