Pause and resume caret training in R

Question

Let's assume I am training a model with caret in R, but I want to split this training into two run sessions.

library(mlbench)
data(Sonar)
library(caret)
set.seed(998)
inTraining <- createDataPartition(Sonar$Class, p = .75, list = FALSE)
training <- Sonar[ inTraining,]
testing  <- Sonar[-inTraining,]

# First run session
nn.partial <- train(Class ~ ., data = training,
                    method = "nnet",
                    max.turns.of.iteration = 5) # Non-existent parameter. But represents my goal

Let's assume that, instead of the full nn object, I have only a partial object (nn.partial) that holds the training information up to turn 5. Then, in a later session, I could run the code below to finish the training job:

library(mlbench)
data(Sonar)
library(caret)
set.seed(998)
inTraining <- createDataPartition(Sonar$Class, p = .75, list = FALSE)
training <- Sonar[ inTraining,]
testing  <- Sonar[-inTraining,]

nn <- train(Class ~ ., data = training, 
                 method = "nnet",
                 previous.training=nn.partial) # Non-existent parameter. But represents my goal

I am aware that neither max.turns.of.iteration nor previous.training exists in the train function. I am just trying my best to express in code what the ideal world would look like if this were already implemented in train. Since these parameters do not exist, is there a way to achieve this goal (i.e., run the caret training in more than one session) by tricking the function in some way?

I have tried playing with the trainControl function, without success:

t.control <- trainControl(repeats=5)
nn <- train(Class ~ ., data = training,
            method = "nnet",
            trControl = t.control)

By doing that, the number of iteration turns is still much higher than the 5 I would like to obtain in my example.

Recommended answer

I am almost certain that this would be very complicated to implement in caret's current infrastructure. However, I will show you how to achieve this sort of thing out of the box with mlr3.

Packages required for the example:

library(mlr3)
library(mlr3tuning)
library(paradox)

Get an example task and define a learner to be tuned:

task_sonar <- tsk('sonar')
learner <- lrn('classif.rpart', predict_type = 'prob')

Define the hyperparameters to be tuned:

ps <- ParamSet$new(list(
  ParamDbl$new("cp", lower = 0.001, upper = 0.1),
  ParamInt$new("minsplit", lower = 1, upper = 10)
))
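Random search draws each candidate configuration from the ranges declared in ps. The base R sketch below imitates uniform sampling for cp and minsplit; draw_config() is a hypothetical helper for illustration only, not part of mlr3, and mlr3's own sampler may differ in its details.

```r
# Hypothetical helper: draw one configuration uniformly from the search space
# declared above (cp in [0.001, 0.1], minsplit an integer in [1, 10]).
draw_config <- function() {
  list(
    cp = runif(1, min = 0.001, max = 0.1),
    minsplit = sample(1:10, size = 1)
  )
}

set.seed(1)
# Five candidate configurations, as a random search would propose them
configs <- replicate(5, draw_config(), simplify = FALSE)
str(configs[[1]])
```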

Define the tuner and the resampling strategy:

tuner <- tnr("random_search")
cv3 <- rsmp("cv", folds = 3)
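With rsmp("cv", folds = 3), the rows are split into three folds, and each fold serves once as the test set while the other two are used for training. A minimal base R sketch of that idea, assuming 208 rows (the size of the Sonar data) and a simple random fold assignment; mlr3 may assign rows to folds differently.

```r
n <- 208  # number of rows in Sonar
k <- 3    # number of folds, as in rsmp("cv", folds = 3)

set.seed(42)
# Assign each row to one of k folds of (almost) equal size, in random order
fold_id <- sample(rep_len(seq_len(k), n))

# Each iteration holds out one fold for testing and trains on the rest
splits <- lapply(seq_len(k), function(i) {
  list(train = which(fold_id != i), test = which(fold_id == i))
})

table(fold_id)  # fold sizes 70, 69, 69
```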

Define the tuning instance:

instance <- TuningInstance$new(
  task = task_sonar,
  learner = learner,
  resampling = cv3,
  measures = msr("classif.auc"),
  param_set = ps,
  terminator = term("evals", n_evals = 100) # One can combine multiple terminators, such as clock time, number of evaluations, early stopping (stagnation) or performance reached; see ?Terminator
)
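The terminator decides when tuning stops, and the comment above mentions combining several criteria. The base R sketch below shows that logic with a hypothetical should_stop() helper that stops as soon as either an evaluation budget or a wall-clock budget is used up. It illustrates the idea only and is not mlr3's Terminator API.

```r
# Hypothetical combined terminator: stop when ANY criterion is met
should_stop <- function(n_done, start_time, n_evals = 100, max_secs = 60) {
  evals_reached <- n_done >= n_evals
  elapsed <- as.numeric(difftime(Sys.time(), start_time, units = "secs"))
  time_reached <- elapsed >= max_secs
  evals_reached || time_reached
}

t0 <- Sys.time()
should_stop(10, t0)        # FALSE: neither budget is used up
should_stop(100, t0)       # TRUE: evaluation budget reached
should_stop(0, t0 - 120)   # TRUE: time budget exceeded
```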

Tune:

tuner$tune(instance)

Now press Stop in RStudio after a second to interrupt the task, and inspect the archive:

instance$archive()

    nr batch_nr  resample_result task_id    learner_id resampling_id iters params tune_x warnings errors classif.auc
 1:  1        1 <ResampleResult>   sonar classif.rpart            cv     3 <list> <list>        0      0   0.7105586
 2:  2        2 <ResampleResult>   sonar classif.rpart            cv     3 <list> <list>        0      0   0.7372720
 3:  3        3 <ResampleResult>   sonar classif.rpart            cv     3 <list> <list>        0      0   0.7335368
 4:  4        4 <ResampleResult>   sonar classif.rpart            cv     3 <list> <list>        0      0   0.7335368
 5:  5        5 <ResampleResult>   sonar classif.rpart            cv     3 <list> <list>        0      0   0.7276246
 6:  6        6 <ResampleResult>   sonar classif.rpart            cv     3 <list> <list>        0      0   0.7111217
 7:  7        7 <ResampleResult>   sonar classif.rpart            cv     3 <list> <list>        0      0   0.6915560
 8:  8        8 <ResampleResult>   sonar classif.rpart            cv     3 <list> <list>        0      0   0.7452875
 9:  9        9 <ResampleResult>   sonar classif.rpart            cv     3 <list> <list>        0      0   0.7372720
10: 10       10 <ResampleResult>   sonar classif.rpart            cv     3 <list> <list>        0      0   0.7172328
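The best configuration so far is simply the row with the highest classif.auc. The base R check below uses only the ten AUC values printed in the table above and confirms that evaluation 8 is currently the best.

```r
# AUC values copied from the archive printed above (rows 1 to 10)
auc <- c(0.7105586, 0.7372720, 0.7335368, 0.7335368, 0.7276246,
         0.7111217, 0.6915560, 0.7452875, 0.7372720, 0.7172328)

which.max(auc)  # 8
max(auc)        # 0.7452875
```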

In my case it finished 10 iterations of random search. You can now, for instance, call

save.image()

then close RStudio and reopen the same project,

or use saveRDS/readRDS on the objects you wish to keep:

saveRDS(instance, "i.rds")
instance <- readRDS("i.rds")
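This works because the tuning instance keeps its own archive of completed evaluations, so saving the instance preserves progress, and the tuner only has to keep going until the terminator is satisfied. The base R sketch below mimics that pattern with a hypothetical run_search() function, a toy objective standing in for cross-validated AUC, and a budget of 100 evaluations. It illustrates the idea and is not how mlr3 is implemented internally.

```r
# Toy objective standing in for cross-validated AUC (hypothetical)
objective <- function(cp, minsplit) -((cp - 0.05)^2) - ((minsplit - 5)^2) / 100

# Hypothetical resumable random search: the state carries its own archive,
# so progress survives saveRDS()/readRDS() between R sessions
run_search <- function(state, max_this_session = Inf) {
  done_now <- 0
  while (nrow(state$archive) < state$n_evals && done_now < max_this_session) {
    cp <- runif(1, 0.001, 0.1)
    minsplit <- sample(1:10, 1)
    state$archive <- rbind(state$archive,
                           data.frame(cp = cp, minsplit = minsplit,
                                      score = objective(cp, minsplit)))
    done_now <- done_now + 1
  }
  state
}

state <- list(n_evals = 100,
              archive = data.frame(cp = numeric(0), minsplit = integer(0),
                                   score = numeric(0)))

set.seed(1)
state <- run_search(state, max_this_session = 10)  # "session 1": stop early
path <- tempfile(fileext = ".rds")
saveRDS(state, path)

state <- readRDS(path)      # "session 2": restore the saved state
state <- run_search(state)  # continue until the budget is reached
nrow(state$archive)         # 100
```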

After loading the required packages, resume training with:

tuner$tune(instance)

Stop it again after a few seconds.

In my case it finished an additional 12 iterations:

instance$archive()

    nr batch_nr  resample_result task_id    learner_id resampling_id iters params tune_x warnings errors classif.auc
 1:  1        1 <ResampleResult>   sonar classif.rpart            cv     3 <list> <list>        0      0   0.7105586
 2:  2        2 <ResampleResult>   sonar classif.rpart            cv     3 <list> <list>        0      0   0.7372720
 3:  3        3 <ResampleResult>   sonar classif.rpart            cv     3 <list> <list>        0      0   0.7335368
 4:  4        4 <ResampleResult>   sonar classif.rpart            cv     3 <list> <list>        0      0   0.7335368
 5:  5        5 <ResampleResult>   sonar classif.rpart            cv     3 <list> <list>        0      0   0.7276246
 6:  6        6 <ResampleResult>   sonar classif.rpart            cv     3 <list> <list>        0      0   0.7111217
 7:  7        7 <ResampleResult>   sonar classif.rpart            cv     3 <list> <list>        0      0   0.6915560
 8:  8        8 <ResampleResult>   sonar classif.rpart            cv     3 <list> <list>        0      0   0.7452875
 9:  9        9 <ResampleResult>   sonar classif.rpart            cv     3 <list> <list>        0      0   0.7372720
10: 10       10 <ResampleResult>   sonar classif.rpart            cv     3 <list> <list>        0      0   0.7172328
11: 11       11 <ResampleResult>   sonar classif.rpart            cv     3 <list> <list>        0      0   0.7325289
12: 12       12 <ResampleResult>   sonar classif.rpart            cv     3 <list> <list>        0      0   0.7105586
13: 13       13 <ResampleResult>   sonar classif.rpart            cv     3 <list> <list>        0      0   0.7215133
14: 14       14 <ResampleResult>   sonar classif.rpart            cv     3 <list> <list>        0      0   0.6915560
15: 15       15 <ResampleResult>   sonar classif.rpart            cv     3 <list> <list>        0      0   0.6915560
16: 16       16 <ResampleResult>   sonar classif.rpart            cv     3 <list> <list>        0      0   0.7335368
17: 17       17 <ResampleResult>   sonar classif.rpart            cv     3 <list> <list>        0      0   0.7276246
18: 18       18 <ResampleResult>   sonar classif.rpart            cv     3 <list> <list>        0      0   0.7111217
19: 19       19 <ResampleResult>   sonar classif.rpart            cv     3 <list> <list>        0      0   0.7172328
20: 20       20 <ResampleResult>   sonar classif.rpart            cv     3 <list> <list>        0      0   0.7276246
21: 21       21 <ResampleResult>   sonar classif.rpart            cv     3 <list> <list>        0      0   0.7105586
22: 22       22 <ResampleResult>   sonar classif.rpart            cv     3 <list> <list>        0      0   0.7276246

Run it again without pressing Stop:

tuner$tune(instance)

and it will finish all 100 evaluations.

Limitation: The above example splits the tuning (evaluation of hyperparameters) across multiple sessions. However, it does not split a single training run into multiple sessions. Very few packages support this kind of thing in R; keras/tensorflow are the only ones I know of.
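Splitting one training run requires saving and restoring the model's internal state (its weights plus an iteration counter), which is what checkpointing provides. As a hypothetical illustration of that idea, unrelated to any caret or mlr3 API, the base R sketch below runs plain gradient descent for a linear model, checkpoints after 5 iterations, resumes from the saved file, and gets the same weights as an uninterrupted run.

```r
set.seed(7)
X <- cbind(1, matrix(rnorm(200), ncol = 2))  # intercept + 2 features
y <- X %*% c(0.5, -1, 2) + rnorm(100, sd = 0.1)

# One gradient-descent step on the mean squared error (uses X and y above)
gd_step <- function(w, lr = 0.1) {
  w - lr * (2 / nrow(X)) * t(X) %*% (X %*% w - y)
}

# Continue training from a checkpoint (weights + iteration count) to max_iter
train_from <- function(ckpt, max_iter) {
  while (ckpt$iter < max_iter) {
    ckpt$w <- gd_step(ckpt$w)
    ckpt$iter <- ckpt$iter + 1
  }
  ckpt
}

init <- list(w = matrix(0, nrow = 3), iter = 0)

# Session 1: stop after 5 iterations and save the checkpoint
path <- tempfile(fileext = ".rds")
saveRDS(train_from(init, 5), path)

# Session 2: restore the checkpoint and continue to 200 iterations
resumed <- train_from(readRDS(path), 200)

# Uninterrupted reference run for comparison
full <- train_from(init, 200)
all.equal(resumed$w, full$w)  # TRUE
```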

However, regardless of how long a single training run of an algorithm takes, tuning that algorithm (evaluating its hyperparameters) takes much more time, so being able to pause and resume the tuning, as in the above example, is the more valuable capability.
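A rough count shows why: every evaluated configuration is cross-validated, so the number of model fits is the number of evaluations times the number of folds. With the settings used above (100 evaluations, 3-fold CV):

```r
n_evals <- 100  # budget from term("evals", n_evals = 100)
folds <- 3      # from rsmp("cv", folds = 3)

# Each fit trains on about two thirds of the data
fits <- n_evals * folds
fits  # 300
```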

If you find this interesting, here are some resources for learning mlr3:

https://mlr3book.mlr-org.com/
https://mlr3gallery.mlr-org.com/

Also take a look at mlr3pipelines: https://mlr3pipelines.mlr-org.com/articles/introduction.html
