Plot learning curves with caret package and R
Question
I would like to study the optimal tradeoff between bias/variance for model tuning. I'm using caret for R which allows me to plot the performance metric (AUC, accuracy...) against the hyperparameters of the model (mtry, lambda, etc.) and automatically chooses the max. This typically returns a good model, but if I want to dig further and choose a different bias/variance tradeoff I need a learning curve, not a performance curve.
For the sake of simplicity, let's say my model is a random forest, which has just one hyperparameter, 'mtry'.
I would like to plot the learning curves of both training and test sets. Something like this:
(the red curve is the test set)
On the y axis I put an error metric (number of misclassified examples or something like that); on the x axis 'mtry' or alternatively the training set size.
Questions:
Does caret have the functionality to iteratively train models on training-set folds of different sizes? If I have to code it by hand, how can I do that?
If I want to put the hyperparameter on the x axis, I need all the models trained by caret::train, not just the final model (the one with maximum performance obtained after CV). Are these "discarded" models still available after training?
Answer
Here's my code on how I approached this issue of plotting a learning curve in R while using the caret package to train the model. I use the Motor Trend Car Road Tests data (mtcars) in R for illustrative purposes. To begin, I randomize and split the mtcars dataset into training and test sets: 21 records for training and the remaining 11 for the test set. The response feature is mpg in this example.
# load caret for createDataPartition(), train(), and postResample()
library(caret)

# set seed for reproducibility
set.seed(7)
# randomize mtcars
mtcars <- mtcars[sample(nrow(mtcars)), ]
# split the mtcars data into training and test sets
mtcarsIndex <- createDataPartition(mtcars$mpg, p = .625, list = FALSE)
mtcarsTrain <- mtcars[mtcarsIndex, ]
mtcarsTest <- mtcars[-mtcarsIndex, ]
# create an empty data frame to hold the learning-curve points
learnCurve <- data.frame(m = integer(21),
                         trainRMSE = numeric(21),
                         cvRMSE = numeric(21))
# test data response feature
testY <- mtcarsTest$mpg
# run algorithms using 10-fold cross-validation with 3 repeats
trainControl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
metric <- "RMSE"
# loop over increasing training-set sizes
for (i in 3:21) {
    learnCurve$m[i] <- i
    # train the learning algorithm on the first i training records
    fit.lm <- train(mpg ~ ., data = mtcarsTrain[1:i, ], method = "lm",
                    metric = metric, preProc = c("center", "scale"),
                    trControl = trainControl)
    learnCurve$trainRMSE[i] <- fit.lm$results$RMSE
    # use the trained model to predict on the test data
    prediction <- predict(fit.lm, newdata = mtcarsTest[, -1])
    rmse <- postResample(prediction, testY)
    learnCurve$cvRMSE[i] <- rmse[1]
}
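On the second question in the post: train() keeps the resampled metric for every hyperparameter value it tried in the $results data frame of the returned object, so a metric-vs-hyperparameter curve can be drawn without retraining anything. A hedged sketch with a random forest and a small mtry grid (this assumes the randomForest package is installed for method = "rf"):

```r
# Sketch: metric vs. hyperparameter from a single train() call.
# fit.rf$results holds one row per tuning value, including the
# "discarded" settings that lost model selection.
library(caret)
set.seed(7)
fit.rf <- train(mpg ~ ., data = mtcars, method = "rf",
                tuneGrid = expand.grid(mtry = c(2, 5, 10)),
                trControl = trainControl(method = "cv", number = 5))
print(fit.rf$results[, c("mtry", "RMSE")])   # one row per mtry value
plot(fit.rf$results$mtry, fit.rf$results$RMSE, type = "o",
     xlab = "mtry", ylab = "CV RMSE")
```

Only the final refitted model is stored in fit.rf$finalModel, but the per-setting resample summaries in $results are exactly what a hyperparameter-axis plot needs.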
pdf("LinearRegressionLearningCurve.pdf", width = 7, height = 7, pointsize = 12)
# plot learning curves of training-set size vs. error measure
# for the training set and the test set
# (rows 1-2 were never filled by the loop, so skip them to avoid log(0))
filled <- learnCurve$m >= 3
plot(learnCurve$m[filled], log(learnCurve$trainRMSE[filled]), type = "o",
     col = "red", xlab = "Training set size", ylab = "log(RMSE)",
     main = "Linear Model Learning Curve")
lines(learnCurve$m[filled], log(learnCurve$cvRMSE[filled]), type = "o",
      col = "blue")
legend("topright", c("Train error", "Test error"), lty = c(1, 1),
       lwd = c(2.5, 2.5), col = c("red", "blue"))
dev.off()
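As an aside, recent caret releases ship a helper, learning_curve_dat(), that automates the loop above by retraining the model on increasing proportions of the data. The argument names below follow the current caret documentation, but check your installed version before relying on them; treat this as a sketch:

```r
# Sketch: caret's built-in learning-curve helper (recent versions only).
library(caret)
library(ggplot2)
set.seed(7)
lc <- learning_curve_dat(dat = mtcars, outcome = "mpg",
                         proportion = (2:10)/10,  # training fractions
                         test_prop = 1/4,         # held-out test share
                         method = "lm", metric = "RMSE",
                         trControl = trainControl(method = "cv", number = 5))
# lc stacks Training/Testing/Resampling rows, keyed by the Data column,
# with Training_Size and RMSE ready for plotting
ggplot(lc, aes(x = Training_Size, y = RMSE, colour = Data)) +
    geom_line() + geom_point()
```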
The output plot is as shown below:
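If caret is unavailable, the mechanics of the loop can also be reproduced with base R alone, which makes them explicit. A minimal sketch using lm() directly, with the same 21/11 split sizes as above (names like trainSet/testSet are just illustrative):

```r
# Base-R reproduction of the learning-curve loop (no caret):
# fit lm() on the first i training rows, record train and test RMSE.
set.seed(7)
mt <- mtcars[sample(nrow(mtcars)), ]
trainSet <- mt[1:21, ]
testSet  <- mt[22:32, ]
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
curve <- t(sapply(3:21, function(i) {
  fit <- lm(mpg ~ ., data = trainSet[1:i, ])
  c(m = i,
    trainRMSE = rmse(trainSet$mpg[1:i], fitted(fit)),
    testRMSE  = rmse(testSet$mpg, predict(fit, newdata = testSet)))
}))
# plot both curves against training-set size
matplot(curve[, "m"], curve[, c("trainRMSE", "testRMSE")], type = "o",
        pch = 1, col = c("red", "blue"), xlab = "Training set size",
        ylab = "RMSE")
```

Note that for very small i the fit is rank-deficient (more predictors than rows), so the train RMSE is near zero and predict() warns; that is itself the high-variance end of the curve.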