preprocess within cross-validation in caret
Question
I have a question about data preprocessing that needs to be clarified. To my understanding, when we tune hyperparameters and estimate model performance via cross-validation, rather than preprocessing the whole dataset up front, we need to do the preprocessing within cross-validation. In other words, within each cross-validation iteration we preprocess the training folds, then apply those same preprocessing parameters to the test fold before making predictions.
In the example code below, when I specify preProcess within caret::train, does it do that automatically? I would really appreciate it if someone could clarify this for me.
In some online sources, people preprocess the whole dataset (the training set) and then use the preprocessed data to tune hyperparameters via cross-validation. That does not seem right to me.
library(caret)
library(mlbench)
data(PimaIndiansDiabetes)

control <- trainControl(method = "cv",
                        number = 5,
                        preProcOptions = list(pcaComp = 4))
grid <- expand.grid(mtry = c(1, 2, 3))
model <- train(diabetes ~ ., data = PimaIndiansDiabetes, method = "rf",
               preProcess = c("scale", "center", "pca"),
               trControl = control,
               tuneGrid = grid)
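For contrast, the approach from those online sources that seems wrong can be sketched as follows (an illustrative sketch only; column 9 of PimaIndiansDiabetes is the outcome, diabetes):

library(caret)
library(mlbench)
data(PimaIndiansDiabetes)

# Leaky approach: estimate centering/scaling/PCA on ALL rows before any
# resampling, so information from every future CV holdout leaks into the
# preprocessing parameters.
pp_all <- preProcess(PimaIndiansDiabetes[, -9],
                     method = c("center", "scale", "pca"),
                     pcaComp = 4)
leaky <- predict(pp_all, PimaIndiansDiabetes[, -9])
leaky$diabetes <- PimaIndiansDiabetes$diabetes
# Tuning on `leaky` via cross-validation would reuse holdout information,
# giving optimistically biased performance estimates.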
Answer
Your worries are well placed; there are many ways to introduce optimistic bias.
According to Max Kuhn, the creator of caret, there is no data leakage when preProcess is specified in train:
All pre-processing is applied on the resampled version of the data (e.g. 90% in 10-fold CV) and then those calculations are applied to the holdouts (the remaining 10%) with no re-calculation.
Source: https://github.com/topepo/caret/issues/335
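What train does internally can be sketched by hand with caret::preProcess for a single fold (a minimal illustrative sketch, not caret's actual internals; the fold setup and variable names are my own):

library(caret)
library(mlbench)
data(PimaIndiansDiabetes)

# One CV split: hold out fold 1, train on the rest
folds <- createFolds(PimaIndiansDiabetes$diabetes, k = 5)
holdout_idx <- folds[[1]]
train_fold <- PimaIndiansDiabetes[-holdout_idx, ]
test_fold  <- PimaIndiansDiabetes[holdout_idx, ]

# Estimate centering/scaling/PCA from the training fold ONLY
pp <- preProcess(train_fold[, -9],
                 method = c("center", "scale", "pca"),
                 pcaComp = 4)

# Apply the *same* parameters to both folds -- no re-estimation on the holdout
train_proc <- predict(pp, train_fold[, -9])
test_proc  <- predict(pp, test_fold[, -9])

A model fit on train_proc and evaluated on test_proc never sees preprocessing parameters influenced by the holdout, which is exactly the no-leakage behavior Kuhn describes.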