Applying k-fold Cross Validation model using caret package


Question

Let me start by saying that I have read many posts on cross validation, and there seems to be a lot of confusion out there. My understanding is that it is simply this:


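  1. Perform k-fold cross validation, i.e. 10 folds, to get the average error across the 10 folds.

  2. If that error is acceptable, train the model on the complete data set.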
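The two steps above can be sketched by hand in base R. This is a minimal illustration, not the caret workflow: it assumes the built-in iris data stands in for mydat (with Species as the response) and calls rpart directly; the seed and the random fold assignment are arbitrary choices.

```r
library(rpart)

# Assumption: iris stands in for the question's mydat, Species for resp
set.seed(42)
dat <- iris
k <- 10

# Step 1: randomly assign each row to one of k equally sized folds
folds <- sample(rep(seq_len(k), length.out = nrow(dat)))

fold_error <- numeric(k)
for (i in seq_len(k)) {
  train_set <- dat[folds != i, ]
  test_set  <- dat[folds == i, ]
  fit  <- rpart(Species ~ ., data = train_set, method = "class")
  pred <- predict(fit, newdata = test_set, type = "class")
  # misclassification rate on the held-out fold
  fold_error[i] <- mean(pred != test_set$Species)
}

cv_error <- mean(fold_error)
cat("Mean 10-fold CV error:", round(cv_error, 3), "\n")

# Step 2: if the CV error is acceptable, refit on the complete data set
final_fit <- rpart(Species ~ ., data = dat, method = "class")
```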

I am attempting to build a decision tree using rpart in R and taking advantage of the caret package. Below is the code I am using.

# load libraries
library(caret)
library(rpart)

# define training control
train_control <- trainControl(method = "cv", number = 10)

# train the model
model <- train(resp ~ ., data = mydat, trControl = train_control, method = "rpart")

# make predictions
predictions <- predict(model, mydat)

# append predictions
mydat <- cbind(mydat, predictions)

# summarize results (stored as cm so the name confusionMatrix is not masked)
cm <- confusionMatrix(mydat$predictions, mydat$resp)

I have one question regarding caret's train function. I have read the train section of "A Short Introduction to the caret Package", which states that the "optimal parameter set" is determined during the resampling process.

In my example, have I coded it up correctly? Do I need to define the rpart parameters within my code, or is my code sufficient?

Answer

When you perform k-fold cross validation, you are already making a prediction for each sample, just with 10 different models (assuming k = 10), and each sample is predicted by the model that did not see it during training. There is no need to make predictions on the complete data, as you already have a prediction for every sample from the k different models.

What you can do is the following:

train_control <- trainControl(method = "cv", number = 10, savePredictions = TRUE)

Then

model <- train(resp ~ ., data = mydat, trControl = train_control, method = "rpart")

If you want to see the observed values and the predictions in a nice format, simply type:

model$pred
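A sketch of checking the out-of-fold predictions, again assuming iris in place of mydat. It uses savePredictions = "final" rather than TRUE: TRUE is equivalent to "all" and keeps hold-out predictions for every candidate cp value, whereas "final" keeps only those for the selected cp, so with plain 10-fold CV each row should appear exactly once. The confusion table is built with base table() here; caret's confusionMatrix() could be used on the same two columns.

```r
library(caret)

set.seed(42)
# Assumption: iris stands in for the question's mydat, Species for resp
train_control <- trainControl(method = "cv", number = 10,
                              savePredictions = "final")
model <- train(Species ~ ., data = iris, trControl = train_control,
               method = "rpart")

# Hold-out predictions; columns include pred, obs, rowIndex, cp and Resample
head(model$pred)

# Sort by original row; each row was held out in exactly one fold
oof <- model$pred[order(model$pred$rowIndex), ]
stopifnot(nrow(oof) == nrow(iris), !anyDuplicated(oof$rowIndex))

# Out-of-fold accuracy and confusion table (rows = observed, cols = predicted)
oof_accuracy <- mean(oof$pred == oof$obs)
print(oof_accuracy)
print(table(observed = oof$obs, predicted = oof$pred))
```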

As for the second part of your question, caret handles the parameter tuning for you: for method = "rpart" it tunes the complexity parameter cp by resampling. You can also tune the parameters manually if you want.
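If you do want to control the search, this sketch passes an explicit grid of cp values through tuneGrid. The candidate values are arbitrary and iris again stands in for mydat. train() scores each candidate by 10-fold CV, stores the winner in model$bestTune, and then refits model$finalModel on the full data set with that cp, which corresponds to step 2 of the procedure in the question.

```r
library(caret)

set.seed(42)
# Arbitrary candidate values for rpart's complexity parameter cp
cp_grid <- expand.grid(cp = c(0.001, 0.01, 0.05, 0.1))

# Assumption: iris stands in for the question's mydat, Species for resp
model <- train(Species ~ ., data = iris, method = "rpart",
               trControl = trainControl(method = "cv", number = 10),
               tuneGrid = cp_grid)

print(model$results)    # resampled Accuracy and Kappa for each candidate cp
print(model$bestTune)   # the cp with the highest mean CV accuracy
print(model$finalModel) # tree refit on all rows using bestTune$cp
```

Passing tuneLength instead of tuneGrid lets caret generate the candidate cp values itself.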
