How to get predictions for each fold in 10-fold cross-validation with the best-tuned hyperparameters using the caret package in R?


Problem description

I am trying to fit an SVM model with 10-fold cross-validation repeated 3 times, using the caret package in R. I want the prediction results for each fold under the best-tuned hyperparameters. I am using the following code:

# Load packages
library(mlbench)
library(caret)

# Load data
data(BostonHousing)

#Dividing the data into train and test set
set.seed(101)
sample <- createDataPartition(BostonHousing$medv, p=0.80, list = FALSE)
train <- BostonHousing[sample,]
test <- BostonHousing[-sample,]

control <- trainControl(method='repeatedcv', number=10, repeats=3, savePredictions=TRUE)
metric <- 'RMSE'

# Support Vector Machines (SVM) 
set.seed(101)
fit.svm <- train(medv~., data=train, method='svmRadial', metric=metric,
                 preProc=c('center', 'scale'), trControl=control)
fit.svm$bestTune
fit.svm$pred 

fit.svm$pred gives me the predictions for every combination of hyperparameters. I only want the predictions made with the best-tuned hyperparameters, summarised over the 10 folds within each repeat.

Recommended answer

One way to achieve this is to subset fit.svm$pred to the hyperparameters in fit.svm$bestTune, and then aggregate the desired measure by CV repeat. I will do this with dplyr:

library(tidyverse)
library(caret)
fit.svm$pred %>%
  filter(sigma == fit.svm$bestTune$sigma & C == fit.svm$bestTune$C) %>% #subset 
  mutate(fold = gsub("\\..*", "", Resample), #extract fold info from resample info
         rep = gsub(".*\\.(.*)", "\\1", Resample)) %>% #extract replicate info from resample info
  group_by(rep) %>% #group by replicate
  summarise(rmse = RMSE(pred, obs)) #aggregate the desired measure

Output:

# A tibble: 3 x 2
  rep    rmse
  <chr> <dbl>
1 Rep1   4.02
2 Rep2   3.96
3 Rep3   4.06

If you dislike regular expressions, or just want to save a bit of typing, you can use tidyr::separate (loaded with the tidyverse):

fit.svm$pred %>%
  filter(sigma == fit.svm$bestTune$sigma & C == fit.svm$bestTune$C) %>%
  separate(Resample, c("fold", "rep"), "\\.") %>%
  group_by(rep) %>%
  summarise(rmse = RMSE(pred, obs)) #same argument order as in the first version
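
Since the question asks about individual folds, the same pipeline can also be grouped by both repeat and fold, or by training row to average the hold-out predictions across repeats. The sketch below is only an illustration: toy_pred and best are hypothetical stand-ins for fit.svm$pred and fit.svm$bestTune, built with made-up values so the example runs without fitting a model, and RMSE is computed by hand instead of with caret's RMSE().

```r
library(dplyr)
library(tidyr)

# Toy stand-in for fit.svm$pred (made-up values), with the columns caret
# stores for svmRadial: pred, obs, rowIndex, sigma, C, Resample.
set.seed(1)
toy_pred <- expand.grid(rowIndex = 1:6, rep = c("Rep1", "Rep2"),
                        C = c(0.5, 1), stringsAsFactors = FALSE) %>%
  mutate(sigma = 0.1,
         fold = ifelse(rowIndex <= 3, "Fold1", "Fold2"),
         Resample = paste(fold, rep, sep = "."),
         obs = rowIndex * 10,
         pred = obs + rnorm(n())) %>%
  select(pred, obs, rowIndex, sigma, C, Resample)

best <- data.frame(sigma = 0.1, C = 1)  # stand-in for fit.svm$bestTune

# Keep only the rows predicted with the best hyperparameters
best_pred <- toy_pred %>%
  filter(sigma == best$sigma, C == best$C) %>%
  separate(Resample, c("fold", "rep"), sep = "\\.")

# RMSE for every fold within every repeat
per_fold <- best_pred %>%
  group_by(rep, fold) %>%
  summarise(rmse = sqrt(mean((pred - obs)^2)), .groups = "drop")

# One averaged hold-out prediction per training row, across repeats
per_row <- best_pred %>%
  group_by(rowIndex) %>%
  summarise(obs = first(obs), pred = mean(pred), n_reps = n())
```

With the real model, replacing toy_pred with fit.svm$pred and best with fit.svm$bestTune should give 30 rows in per_fold (10 folds x 3 repeats) and one row per training observation in per_row.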

In response to a comment: to write the observed and predicted values to a CSV file:

fit.svm$pred %>%
  filter(sigma == fit.svm$bestTune$sigma & C == fit.svm$bestTune$C) %>%
  write.csv("predictions.csv")
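
An alternative that avoids the filtering step is to ask caret to keep only the hold-out predictions for the selected hyperparameters, by setting savePredictions = "final" in trainControl. The sketch below refits the model from the question under that setting; the names idx, train_df, control_final and fit_final are introduced here for illustration and are not part of the original code.

```r
library(mlbench)
library(caret)

data(BostonHousing)

# Same 80/20 split as in the question
set.seed(101)
idx <- createDataPartition(BostonHousing$medv, p = 0.80, list = FALSE)
train_df <- BostonHousing[idx, ]

# "final" keeps hold-out predictions only for the best tuning parameters
control_final <- trainControl(method = "repeatedcv", number = 10, repeats = 3,
                              savePredictions = "final")

set.seed(101)
fit_final <- train(medv ~ ., data = train_df, method = "svmRadial",
                   metric = "RMSE", preProc = c("center", "scale"),
                   trControl = control_final)

# A single (sigma, C) pair should remain in the saved predictions
unique(fit_final$pred[, c("sigma", "C")])
```

fit_final$pred can then be passed straight to separate() and group_by() as above, without the filter() call.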
