Cross-validation predictions from caret assigned to different folds

Question

I am wondering why the predictions labelled 'Fold1' are actually predictions for rows in the second of my predefined folds. Here is an example of what I mean:

# load the library
library(caret)
# load the cars dataset
data(cars, package = "caret")  # caret's cars data (has a Price column), not base R's datasets::cars
# define folds
cv_folds <- createFolds(cars$Price, k = 5, list = TRUE, returnTrain = TRUE)
# define training control
train_control <- trainControl(method="cv", index = cv_folds, savePredictions = 'final')
# fix the parameters of the algorithm
# train the model
model <- caret::train(Price ~ ., data = cars, trControl = train_control, method = "gbm", verbose = FALSE)

model$pred$rowIndex[model$pred$Resample == 'Fold1'] %in% cv_folds[[2]]

Answer

The Resample data of 'Fold1' are the records that are not in cv_folds[[1]]. Because you called createFolds() with returnTrain = TRUE, each element of cv_folds holds the row indices used for training in that resample, not the held-out rows. The held-out rows of Fold1 therefore appear in every other training set, cv_folds[[2]] to cv_folds[[5]], which is why your %in% check against cv_folds[[2]] returns TRUE. This is exactly what 5-fold cross-validation should do: the held-out part of fold 1 is predicted by a model trained on the other four folds, fold 2 is predicted by a model trained on folds 1 and 3-5, and so on.
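To make the relationship concrete, here is a minimal base-R sketch that does not need caret. It assumes, as the caret documentation describes for `createFolds(..., returnTrain = TRUE)` and for `trainControl(index = ...)`, that each list element holds the training rows of one resample and that the held-out rows are the complement. The fold assignment (`fold_id`) and the tiny data size are made up for illustration.

```r
set.seed(1)
n <- 20   # made-up number of rows
k <- 5    # number of folds
# Hypothetical fold assignment: every row gets one fold id in 1..k
fold_id <- sample(rep(seq_len(k), length.out = n))

# Like createFolds(..., returnTrain = TRUE): element i holds the TRAINING rows
train_idx <- lapply(seq_len(k), function(i) which(fold_id != i))
names(train_idx) <- paste0("Fold", seq_len(k))

# Held-out rows of each resample = complement of its training rows
holdout_idx <- lapply(train_idx, function(tr) setdiff(seq_len(n), tr))

# Fold1's held-out rows are never in Fold1's training set ...
stopifnot(!any(holdout_idx$Fold1 %in% train_idx$Fold1))
# ... but they are in the training set of every other resample (Fold2..Fold5)
stopifnot(all(vapply(train_idx[-1],
                     function(tr) all(holdout_idx$Fold1 %in% tr),
                     logical(1))))
# Across the k resamples every row is held out exactly once
all_holdout <- unlist(holdout_idx, use.names = FALSE)
stopifnot(length(all_holdout) == n, setequal(all_holdout, seq_len(n)))
```

For the real model, the analogous set is `setdiff(seq_len(nrow(cars)), cv_folds[[1]])`, which should contain the same rows as `model$pred$rowIndex[model$pred$Resample == 'Fold1']`.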

In summary: the predictions in Fold1 are the test predictions of a model trained on the other four folds, i.e. on the rows listed in cv_folds[[1]].

Update based on the comments

All the needed information is in the model$pred table. I added a bit of code for clarification:

library(dplyr)  # for %>%, select(), rename(), mutate(), case_when()

model$pred %>% 
  select(rowIndex, pred, Resample) %>%
  rename(prediction = pred, holdout = Resample) %>% 
  mutate(trained_on = case_when(holdout == "Fold1" ~ "Folds 2, 3, 4, 5",
                                holdout == "Fold2" ~ "Folds 1, 3, 4, 5", 
                                holdout == "Fold3" ~ "Folds 1, 2, 4, 5", 
                                holdout == "Fold4" ~ "Folds 1, 2, 3, 5", 
                                holdout == "Fold5" ~ "Folds 1, 2, 3, 4"))

  rowIndex prediction holdout       trained_on
1      610   13922.60   Fold2 Folds 1, 3, 4, 5
2      623   38418.83   Fold2 Folds 1, 3, 4, 5
3      604   12383.55   Fold2 Folds 1, 3, 4, 5
4      607   15040.07   Fold2 Folds 1, 3, 4, 5
5       95   33549.40   Fold2 Folds 1, 3, 4, 5
6      624   40357.35   Fold2 Folds 1, 3, 4, 5

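The `case_when()` call lists every fold by hand. A small helper can derive the same label from the fold name instead; `trained_on_label()` is a hypothetical name, and the sketch assumes the resample names follow caret's `Fold<number>` pattern and that the number of folds `k` is known.

```r
# Hypothetical helper: "Fold1" -> "Folds 2, 3, 4, 5" for k = 5
trained_on_label <- function(holdout, k = 5) {
  i <- as.integer(sub("^Fold", "", holdout))   # fold number from the name
  vapply(i,
         function(j) paste("Folds", paste(setdiff(seq_len(k), j), collapse = ", ")),
         character(1))
}

trained_on_label(c("Fold1", "Fold2"))
# [1] "Folds 2, 3, 4, 5" "Folds 1, 3, 4, 5"
```

In the pipeline it could replace the `case_when()` as `mutate(trained_on = trained_on_label(holdout))`.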
Basically, what you need for further stacking with the predictions are the pred and rowIndex columns of the model$pred table.

The rowIndex refers to the row in the original data, so rowIndex 610 refers to record 610 in the cars dataset. You can verify this with the obs column, which holds the value of the Price column from the cars dataset for that row.
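For stacking, the out-of-fold predictions usually have to be lined up with the rows of the original data. Below is a sketch assuming a data frame shaped like `model$pred` (columns `rowIndex`, `pred`, `obs`, `Resample`); the values and the five-row `original` data are invented. With `savePredictions = 'final'` only the selected tuning parameters are kept, so in plain k-fold CV each row should appear exactly once.

```r
# Made-up stand-ins for the original data and for model$pred
original <- data.frame(Price = c(10, 20, 31, 40, 50))
pred_df <- data.frame(
  rowIndex = c(3L, 1L, 5L, 2L, 4L),
  pred     = c(30.5, 10.2, 52.0, 19.8, 41.1),
  obs      = c(31, 10, 50, 20, 40),
  Resample = c("Fold1", "Fold2", "Fold2", "Fold3", "Fold3")
)
n <- nrow(original)

# Each original row should be held out exactly once
stopifnot(!anyDuplicated(pred_df$rowIndex),
          setequal(pred_df$rowIndex, seq_len(n)))

# obs is the outcome value of the row that rowIndex points to
stopifnot(isTRUE(all.equal(pred_df$obs, original$Price[pred_df$rowIndex])))

# Place each out-of-fold prediction at its original row position
oof <- numeric(n)
oof[pred_df$rowIndex] <- pred_df$pred
oof
# [1] 10.2 19.8 30.5 41.1 52.0
```

The `oof` vector can then serve as an input feature for a second-level model fitted on the original rows.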
