Caret: obtain train & CV predictions from a model to plot

Question

I've trained a simple model:

mySim <- train(Event ~ .,
               method = 'nnet',   # tg is a grid over nnet's size and decay
               data = train,
               tuneGrid = tg)

I'm optimising two nnet tuning parameters: the weight decay (decay in caret's grid) and the size of the hidden layer (size). I'm new to caret, so what I would usually do is plot the training error and the CV error for each model built. To do this, I'd need the predicted values from my training and validation passes.

This is the first time I've used cross-validation, so I'm a little unsure how to get the predictions from the train and hold-out sets at each tuneGrid iteration.

If I have a grid search of length 3 (3 models to build) and 5-fold cross-validation, I assume I'm going to end up with 15 sets of train and hold-out predictions (5 folds for each of the 3 models).
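Here is a minimal base-R sketch that makes the count concrete. The size values below are hypothetical. A 3-value grid crossed with 5 folds gives 15 fits in total, which is five train/hold-out pairs per grid value.

```r
# Hypothetical 3-value grid for the hidden-layer size, crossed with 5 CV folds
fits <- expand.grid(size = c(1, 3, 5), fold = paste0("Fold", 1:5))
nrow(fits)        # 15 fits in total
table(fits$size)  # 5 train/hold-out pairs per grid value
```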

The plot I'm essentially looking to build has a performance metric on the y-axis and the size grid-search values on the x-axis, increasing from 0 to the maximum. For the metric, let's say entropy loss, since this is classification with nnet.
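For reference, the entropy loss mentioned here is usually the binary cross-entropy (log loss):

-(1/n) Σ [y log p + (1 - y) log(1 - p)]

Here y is the 0/1 outcome and p is the predicted probability of class 1. Below is a small R helper written for illustration. The name log_loss and the clipping constant eps are assumptions; neither is part of caret.

```r
# Binary cross-entropy (log loss), averaged over observations.
# obs: 0/1 outcomes; p: predicted probabilities of class 1.
log_loss <- function(obs, p, eps = 1e-15) {
  p <- pmin(pmax(p, eps), 1 - eps)   # clip so log(0) never occurs
  -mean(obs * log(p) + (1 - obs) * log(1 - p))
}

log_loss(c(1, 0, 1), c(0.9, 0.2, 0.6))
```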

Is there a way to extract the predicted values from the train / hold-out sets during trainControl cross-validation?

I've looked through some of the attributes that train returns, but I'm not sure whether I'm missing something.

I know I lack code in this question, but hopefully I've explained myself.

Update: am I correct in assuming that setting the following parameters in trainControl will return the predictions I need to create this plot?

  • returnResamp
  • savePredictions

Answer

caret::train keeps only the hold-out predictions. If you specify savePredictions = "all", it will save the hold-out predictions for all hyperparameter combinations. However, it does not save the training-set predictions.

You could generate those afterwards, since you know which indexes were used for the hold-outs. This information is in the pred element of the object returned by train (model$pred, in its rowIndex and Resample columns).

The mlr package has an option to keep both hold-out and train predictions and metrics.
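The idea of regenerating train predictions from the fold indexes can be sketched without caret. The example below is a stand-in, not caret's code:

- It fits a polynomial lm on mtcars, using the degree as the tuning parameter.
- It builds 5 folds by hand. train_idx and hold_idx play the role of the training and hold-out rows that caret records for each resample.
- It predicts on both sets, so train and CV error can be compared per grid value.

```r
# Stand-in for caret: manual 5-fold CV over a 3-value "tuning grid"
# (polynomial degree), predicting on BOTH training and held-out rows.
set.seed(1)
k <- 5
folds <- sample(rep(seq_len(k), length.out = nrow(mtcars)))  # fold id per row
degrees <- 1:3                                                 # the tuning grid

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

results <- do.call(rbind, lapply(degrees, function(d) {
  do.call(rbind, lapply(seq_len(k), function(f) {
    train_idx <- which(folds != f)  # fitting rows (roughly like fit$control$index)
    hold_idx  <- which(folds == f)  # held-out rows (roughly like fit$control$indexOut)
    m <- lm(mpg ~ poly(hp, d), data = mtcars[train_idx, ])
    data.frame(
      degree = d, fold = f,
      train  = rmse(mtcars$mpg[train_idx], predict(m, mtcars[train_idx, ])),
      hold   = rmse(mtcars$mpg[hold_idx],  predict(m, mtcars[hold_idx, ]))
    )
  }))
}))

# Average over folds: one train error and one CV error per grid value
summary_df <- aggregate(cbind(train, hold) ~ degree, data = results, FUN = mean)
print(summary_df)
```

Here results has 3 grid values × 5 folds = 15 rows, matching the count in the question. Plotting summary_df$train and summary_df$hold against degree gives the train-vs-CV curve.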

Here is an example of how to perform the requested operation with the mlr library:

library(mlr)
library(mlbench) #for the data set

I will use the Sonar data set:

data(Sonar)

Create a task:

task <- makeClassifTask(data = Sonar, target = "Class")

Create a learner:

lrn <- makeLearner("classif.nnet", predict.type = "prob")

Get all tunable parameters for a learner:

getParamSet("classif.nnet")

Set which ones you would like to tune, and their ranges:

ps <- makeParamSet(
  makeIntegerParam("size", lower = 3, upper = 5),
  makeNumericParam("decay", lower = 0.1, upper = 0.2))

Define the resampling:

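# Repeated 5-fold CV (2 repetitions), stratified by class;
# predict = "both" keeps predictions on the training and the test folds
cross_val <- makeResampleDesc("RepCV",
                              reps = 2, folds = 5, stratify = TRUE, predict = "both")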
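Below is a rough base-R sketch of what stratify = TRUE with repeated CV means. It is not mlr's implementation, and stratified_folds is a hypothetical helper. Folds are drawn within each class, so each fold keeps roughly the original class balance. A fresh shuffle per repetition gives reps × folds = 10 resampling iterations here.

```r
# Hypothetical helper: assign each observation a fold id (1..k), drawing the
# folds separately within each class so class proportions are preserved.
stratified_folds <- function(y, k) {
  folds <- integer(length(y))
  for (cls in unique(y)) {
    idx <- which(y == cls)
    folds[idx] <- sample(rep(seq_len(k), length.out = length(idx)))
  }
  folds
}

set.seed(1)
y <- factor(rep(c("M", "R"), times = c(111, 97)))  # toy labels, 208 rows
reps <- replicate(2, stratified_folds(y, k = 5))   # one column per repetition
table(reps[, 1], y)                                # class counts per fold, rep 1
```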

How the search will be performed (a grid in this case):

ctrl <- mlr::makeTuneControlGrid(resolution = 4L)

Put it all together:

res.mbo <- tuneParams(lrn, task, cross_val, par.set = ps, control = ctrl,
                      show.info = FALSE,
                      measures = list(auc,
                                      setAggregation(auc, test.sd),
                                      setAggregation(auc, train.mean),
                                      setAggregation(auc, train.sd)))

You can define many measures in a list. The first one is used to select the hyperparameters; all the others are just for show.

Extract the results:

res <- mlr::generateHyperParsEffectData(res.mbo)$data

Plot:

library(tidyverse)

res %>%
  # with this measure order, columns 3 and 5 should be the mean test and mean train AUC
  gather(key, value, c(3, 5)) %>%
  mutate(key = as.factor(key)) %>%
  ggplot() +
  geom_point(aes(x = size, y = value, color = key)) +
  geom_smooth(aes(x = size, y = value, color = key)) +
  facet_wrap(~decay)

This produces a bunch of warnings from geom_smooth, since there are only 3 points per fit.

And here is an example of how to do it in caret, using only the hold-out samples:

library(caret)

Create the train control:

ctrl <- trainControl(
  method = "repeatedcv",
  number = 5,
  repeats = 2, 
  classProbs = TRUE,
  savePredictions = "all",
  returnResamp = "all",
  summaryFunction = twoClassSummary
)

Create the hyperparameter grid:

grid <- expand.grid(size = c(4, 5, 6), decay = seq(from = 0.1, to =  0.2, length.out = 4))

Tune:

fit <- caret::train(Sonar[,1:60], Sonar$Class, 
                 method = 'nnet',
                 tuneGrid = grid, 
                 metric = 'ROC', 
                 trControl = ctrl) 

Plot:

fit$results %>%
  ggplot()+
  geom_point(aes(x = size, y = ROC))+
  geom_smooth(aes(x = size, y = ROC))+
  facet_wrap(~decay)
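fit$results holds only the resample-averaged metrics. However, savePredictions = "all" and classProbs = TRUE keep the hold-out class probabilities in fit$pred. A per-fold entropy loss, like the one asked about in the question, can therefore be computed from them.

Below is a minimal sketch. holdout_logloss is a hypothetical helper. The mock table is made up; it only stands in for fit$pred so the function can run without fitting anything. For Sonar, the class probability columns are named M and R.

```r
# Per-fold hold-out log loss for every (size, decay) combination, computed from
# a caret-style prediction table with columns obs, a probability column for the
# positive class, the tuning parameters size and decay, and Resample.
holdout_logloss <- function(pred, positive = "M", eps = 1e-15) {
  p <- pmin(pmax(pred[[positive]], eps), 1 - eps)  # clip to avoid log(0)
  y <- as.numeric(pred$obs == positive)            # 1 for the positive class
  pred$loss <- -(y * log(p) + (1 - y) * log(1 - p))
  aggregate(loss ~ size + decay + Resample, data = pred, FUN = mean)
}

# Mock table standing in for fit$pred (values are made up for illustration)
mock <- data.frame(
  obs      = factor(c("M", "R", "M", "R"), levels = c("M", "R")),
  M        = c(0.9, 0.2, 0.6, 0.5),
  size     = 4,
  decay    = 0.1,
  Resample = c("Fold1.Rep1", "Fold1.Rep1", "Fold2.Rep1", "Fold2.Rep1")
)
holdout_logloss(mock)
```

With the fitted model, holdout_logloss(fit$pred) gives one row per size, decay and resample. Averaging over Resample then gives one hold-out loss per grid point to plot against size. This still covers hold-out predictions only.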
