How to deal with multiple class ROC analysis in R (pROC package)?


Question

I am using the multiclass.roc function in R (pROC package). For instance, I trained a random forest on a data set; here is my code:

# randomForest & pROC packages should be installed:
# install.packages(c('randomForest', 'pROC'))
data(iris)
library(randomForest)
library(pROC)
set.seed(1000)
# 3-class in response variable
rf = randomForest(Species~., data = iris, ntree = 100)
# predict(.., type = 'prob') returns a probability matrix
multiclass.roc(iris$Species, predict(rf, iris, type = 'prob'))

The result is:

Call:
multiclass.roc.default(response = iris$Species, predictor = predict(rf,     
iris, type = "prob"))
Data: predict(rf, iris, type = "prob") with 3 levels of iris$Species: setosa,   
versicolor, virginica.
Multi-class area under the curve: 0.5142

Is this right? Thanks!

pROC reference: http://www.inside-r.org/packages/cran/pROC/docs/multiclass.roc

Recommended answer

As you saw in the reference, multiclass.roc expects a "numeric vector (...)", and the documentation of roc that is linked from there (for some reason not in the link you provided) further says "of the same length than response". You are passing a numeric matrix with 3 columns, which is clearly wrong, and is no longer supported as of pROC 1.6. I have no idea what it was doing before, but it was probably not what you were expecting.
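A quick way to see the mismatch is to compare the shape of the two kinds of prediction. This is only a sketch that reuses the model from the question; the names probs and labels are illustrative.

```r
library(randomForest)

data(iris)
set.seed(1000)
rf <- randomForest(Species ~ ., data = iris, ntree = 100)

# type = "prob" gives one column per class, so this is a matrix
probs <- predict(rf, iris, type = "prob")
is.matrix(probs)
dim(probs)              # rows = observations, columns = classes
length(iris$Species)    # the response has one entry per observation

# type = "response" gives one predicted class per observation (a factor)
labels <- predict(rf, iris, type = "response")
length(labels) == length(iris$Species)
```

Only the second form has the same length as the response, which is what the documentation asks for.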

This means you must summarize your predictions in a single atomic vector of numeric mode. For your model you could use the following, although converting a factor into a numeric generally doesn't make much sense:

predictions <- as.numeric(predict(rf, iris, type = 'response'))
multiclass.roc(iris$Species, predictions)

What this code really does is compute 3 ROC curves on your predictions (setosa vs. versicolor, versicolor vs. virginica, and setosa vs. virginica) and average their AUCs.
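To make the averaging concrete, here is a base R sketch on a tiny made-up data set (six observations, one misclassified). It is not pROC's implementation: the helpers pair_auc and avg_pairwise_auc are hypothetical, and the direction is fixed so that the class appearing later in the level order is expected to score higher, whereas pROC's handling of direction may differ. It also shows that the averaged value depends on how the factor levels are ordered before the integer conversion.

```r
# Toy data (made up for illustration): true classes and predicted classes
truth <- c("setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica")
pred  <- c("setosa", "setosa", "versicolor", "versicolor", "virginica", "setosa")

# AUC for one pair of classes: P(score_lo < score_hi) + 0.5 * P(tie)
pair_auc <- function(score, truth, lo, hi) {
  x <- score[truth == lo]
  y <- score[truth == hi]
  mean(outer(x, y, "<") + 0.5 * outer(x, y, "=="))
}

# Code the predictions as integers using the given level order,
# then average the AUC over all pairs of classes
avg_pairwise_auc <- function(level_order) {
  score <- as.numeric(factor(pred, levels = level_order))
  pairs <- combn(level_order, 2)
  mean(apply(pairs, 2, function(p) pair_auc(score, truth, p[1], p[2])))
}

auc_original  <- avg_pairwise_auc(c("setosa", "versicolor", "virginica"))
auc_reordered <- avg_pairwise_auc(c("versicolor", "setosa", "virginica"))

auc_original   # 0.75
auc_reordered  # 0.9166667
```

The predictions are identical in both calls; only the integer codes assigned to the classes change, and the averaged AUC changes with them.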

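Three more comments:


  • I say that converting a factor into a numeric doesn't make sense because you will get different results if the classification isn't perfect and you reorder the levels. This is why pROC does not do it automatically: you must think about it in your own setup.

  • In general, this multiclass averaging doesn't really make sense, and you are better off rethinking your question in terms of binary classification. More advanced multiclass methods (with ROC surfaces etc.) aren't implemented in pROC yet.

  • As @cbeleites said, it is not correct to evaluate a model on its own training data (resubstitution), so in a real-life example you must hold out a test set or use cross-validation.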
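The last two comments can be combined in a minimal sketch: hold out a test set, then ask a binary question (here versicolor vs. the rest) using the class probability as the score. The 15-per-class split, the choice of versicolor, and the variable names are arbitrary assumptions made for illustration.

```r
library(randomForest)
library(pROC)

data(iris)
set.seed(1000)

# Stratified hold-out: 15 rows per species go to the test set
test_idx <- unlist(lapply(split(seq_len(nrow(iris)), iris$Species),
                          sample, size = 15))
train <- iris[-test_idx, ]
test  <- iris[test_idx, ]

# Fit on the training rows only
rf <- randomForest(Species ~ ., data = train, ntree = 100)

# Class probabilities for the held-out rows
probs <- predict(rf, test, type = "prob")

# Binary question: versicolor (1) vs. everything else (0)
is_versicolor <- as.integer(test$Species == "versicolor")
roc_versicolor <- roc(response = is_versicolor,
                      predictor = probs[, "versicolor"],
                      levels = c(0, 1), direction = "<")
auc(roc_versicolor)
```

Because the predictor is now a single numeric column and the response has two levels, this is an ordinary ROC curve, and the AUC is estimated on data the model has not seen.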
