caret train method not working (something is wrong for all accuracy results) for outcomes with >2 categories

Problem description

I know similar questions have been asked before, but there is no clear answer yet, or I tried the suggested solutions without success. See "Caret error using GBM, but not without caret" and "Caret train method complains Something is wrong; all the RMSE metric values are missing".

I am trying to use caret's train function to predict a categorical outcome. A reproducible example using a publicly available dataset follows:

library(mlbench)
data(Sonar)
str(Sonar[, 1:10])

library(caret)
set.seed(998)

Sonar$rand<-rnorm(nrow(Sonar))  ##to randomly create the new 3-category outcome
table(Sonar$rand)
Sonar$Class_new<-ifelse(Sonar$Class=="R","R",ifelse(Sonar$rand>0,"M","H"))
table(Sonar$Class_new)

fitControl <- trainControl(## 10-fold CV
                           method = "repeatedcv",
                           number = 10,
                           ## repeated ten times
                           repeats = 10)

inTraining <- createDataPartition(Sonar$Class_new, p = .75, list = FALSE)
training <- Sonar[ inTraining,]
testing  <- Sonar[-inTraining,]

gbmFit1 <- train(Class_new ~ ., data = training,
                 method = "gbm",
                 trControl = fitControl,
                 verbose = FALSE)

Whenever I use the new outcome variable (Class_new), which has 3 categories instead of the 2 categories in the original Class variable, I get the output below. The same code runs fine with a 2-category outcome. The result is the same regardless of the train method (I tried rf, gbm, and svm).

Something is wrong; all the Accuracy metric values are missing:

    Accuracy       Kappa    
 Min.   : NA   Min.   : NA  
 1st Qu.: NA   1st Qu.: NA  
 Median : NA   Median : NA  
 Mean   :NaN   Mean   :NaN  
 3rd Qu.: NA   3rd Qu.: NA  
 Max.   : NA   Max.   : NA  
 NA's   :9     NA's   :9    

Error in train.default(x, y, weights = w, ...) : Stopping
In addition: Warning messages:
1: In train.default(x, y, weights = w, ...) :
The metric "RMSE" was not in the result set. Accuracy will be used instead.
2: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
There were missing values in resampled performance measures.

Any help on this is greatly appreciated!

Solution

Instead of passing a formula to the train function, pass the predictors and the outcome explicitly through the x and y arguments, along with method and the other arguments.

The old way:

modFit = train(data.df$Label ~ .,
               data = data.df,
               method = "rpart",
               trControl = cntrl,
               tuneLength = 7)

The new way:

modFit = train(x = data.df.cols,
               y = data.df$Label,
               method = "rpart",
               trControl = cntrl,
               tuneLength = 7)

Note: data.df.cols holds all columns except the label. In this example the label is the first column, so data.df.cols = data.df[, 2:ncol(data.df)]
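
A minimal sketch of how the x/y interface might look on the Sonar example from the question. It rests on two assumptions that the answer above does not state. First, that the outcome should be stored as a factor: the "RMSE" warning suggests train did not see a factor outcome and fell back to regression defaults, and whether a character outcome is converted automatically may depend on the caret version. Second, that the original Class column should be kept out of the predictors, since Class_new is derived from it. Selecting predictors by name with setdiff and the object name rpartFit are illustrative choices, not part of the original answer.

```r
library(mlbench)
library(caret)

data(Sonar)
set.seed(998)

# Build the same 3-category outcome as in the question, but store it as a
# factor (assumption: a character outcome is what triggers the failure).
rand <- rnorm(nrow(Sonar))
Sonar$Class_new <- factor(ifelse(Sonar$Class == "R", "R",
                                 ifelse(rand > 0, "M", "H")))

inTraining <- createDataPartition(Sonar$Class_new, p = 0.75, list = FALSE)
training <- Sonar[inTraining, ]

# Predictors: every column except the original and the derived outcome.
predictors <- training[, setdiff(names(training), c("Class", "Class_new"))]

fitControl <- trainControl(method = "repeatedcv", number = 10, repeats = 10)

# x/y interface, as suggested in the answer (rpart used for speed).
rpartFit <- train(x = predictors,
                  y = training$Class_new,
                  method = "rpart",
                  trControl = fitControl,
                  tuneLength = 7)
```

Checking class(training$Class_new) before calling train is a quick way to confirm that the outcome is a factor with the expected three levels.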
