StepLDA without Cross Validation
Question
I would like to select the variables based on the training error. For that reason I set method in trainControl to "none". However, if I run the function below twice, I get two different errors (correctness rates). In this example the difference is not worth mentioning. Even so, I would not have expected any difference at all.
Does somebody know where this difference comes from?
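library(caret)
# method = "none": caret fits a single model on the full data, with no resampling of its own
c_1 <- trainControl(method = "none")
# Single tuning combination: forward selection of up to 4 variables
maxvar <- 4
direction <- "forward"
tune_1 <- data.frame(maxvar, direction)
tr <- train(Species ~ ., data = iris, method = "stepLDA", trControl = c_1, tuneGrid = tune_1)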
First run:
`stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
150 observations of 4 variables in 3 classes; direction: forward
stop criterion: assemble 4 best variables.
correctness rate: 0.96; in: "Petal.Width"; variables (1): Petal.Width
correctness rate: 0.96667; in: "Sepal.Width"; variables (2): Petal.Width, Sepal.Width
correctness rate: 0.97333; in: "Petal.Length"; variables (3): Petal.Width, Sepal.Width, Petal.Length
correctness rate: 0.98; in: "Sepal.Length"; variables (4): Petal.Width, Sepal.Width, Petal.Length, Sepal.Length
hr.elapsed min.elapsed sec.elapsed
0.00 0.00 0.28
Second run:
> train(Species~., data=iris, method = "stepLDA", trControl=c_1, tuneGrid=tune_1)->tr
`stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
150 observations of 4 variables in 3 classes; direction: forward
stop criterion: assemble 4 best variables.
correctness rate: 0.96; in: "Petal.Width"; variables (1): Petal.Width
correctness rate: 0.96; in: "Sepal.Width"; variables (2): Petal.Width, Sepal.Width
correctness rate: 0.96667; in: "Petal.Length"; variables (3): Petal.Width, Sepal.Width, Petal.Length
correctness rate: 0.98; in: "Sepal.Length"; variables (4): Petal.Width, Sepal.Width, Petal.Length, Sepal.Length
hr.elapsed min.elapsed sec.elapsed
0.0 0.0 0.3
Answer
You are still doing 10-fold cross-validation. The observations are assigned to the folds at random, so as long as you do not set the seed, you will get a slightly different answer each time you train the model.
If you run this piece of code, including the set.seed() call, before every training run, you will get the same correctness rates.
set.seed(42)
tr <- train(Species~., data=iris, method = "stepLDA", trControl=c_1, tuneGrid=tune_1)
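To see where the run-to-run variation comes from, here is a minimal base-R sketch of 10-fold cross-validation for LDA on iris. The helpers make_folds() and cv_correctness() are hypothetical: they only assume that folds are assigned at random and are not klaR's internal code. lda() comes from the MASS package.

```r
library(MASS)  # provides lda()

# Hypothetical helper: randomly assign each of n rows to one of k folds
make_folds <- function(n, k = 10) {
  sample(rep(seq_len(k), length.out = n))
}

# Hypothetical helper: k-fold cross-validated correctness rate of LDA
# using the predictors named in `vars`
cv_correctness <- function(data, vars, folds) {
  hits <- 0
  for (f in unique(folds)) {
    test <- folds == f
    fit  <- lda(data[!test, vars, drop = FALSE], grouping = data$Species[!test])
    pred <- predict(fit, data[test, vars, drop = FALSE])$class
    hits <- hits + sum(pred == data$Species[test])
  }
  hits / nrow(data)
}

vars <- c("Petal.Width", "Sepal.Width")

# Same seed before each run: identical folds, identical rate
set.seed(1); rate_a <- cv_correctness(iris, vars, make_folds(nrow(iris)))
set.seed(1); rate_b <- cv_correctness(iris, vars, make_folds(nrow(iris)))
# No reset: a fresh random split, so the rate may differ from rate_a
rate_c <- cv_correctness(iris, vars, make_folds(nrow(iris)))

rate_a == rate_b  # TRUE
```

Only the fold assignment is random here; LDA itself is deterministic, so fixing the seed (and thus the folds) is enough to make the rate reproducible.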
Edit based on comment:
The 10-fold cross-validated correctness rate does not come from caret but from the stepclass function in the klaR package:
stepclass(x, grouping, method, improvement = 0.05, maxvar = Inf, start.vars = NULL, direction = c("both", "forward", "backward"), criterion = "CR", fold = 10, cv.groups = NULL, output = TRUE, min1var = TRUE, ...)
fold: parameter for cross-validation; omitted if ‘cv.groups’ is specified.
You can adjust this if you want by adding the fold parameter to the train function:
tr <- train(Species~., data=iris, method = "stepLDA", trControl=c_1, tuneGrid=tune_1, fold = 1)
But a fold of 1 is meaningless; you will get a bunch of warnings and errors.
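The stepclass signature above also accepts cv.groups, which replaces the random fold assignment. A sketch, assuming the klaR package is installed: draw the folds once and pass them as cv.groups, so repeated stepclass() calls use identical folds and the selection process should not change between runs. The names folds and run_step() are illustrative only.

```r
library(klaR)  # provides stepclass(); assumed to be installed

# Draw the fold assignment once (illustrative variable name)
set.seed(42)
folds <- sample(rep(1:10, length.out = nrow(iris)))

# Illustrative wrapper: forward stepwise LDA on the four iris predictors,
# always cross-validated over the same fixed folds
run_step <- function() {
  stepclass(iris[, 1:4], iris$Species, method = "lda",
            direction = "forward", maxvar = 4,
            cv.groups = folds, output = FALSE)
}

sc1 <- run_step()
sc2 <- run_step()
identical(sc1$process, sc2$process)  # same folds, same selection steps
```

Since stepclass ignores fold when cv.groups is given, no set.seed() call is needed between the two runs.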