Plot ROC curve from Cross-Validation (training) data in R


Question

I would like to know if there is a way to plot the average ROC curve from the cross-validation data of an SVM-RFE model generated with the caret package.

My results are:

Recursive feature selection

Outer resampling method: Cross-Validated (10 fold, repeated 5 times) 

Resampling performance over subset size:

 Variables    ROC   Sens   Spec Accuracy  Kappa  ROCSD SensSD SpecSD AccuracySD KappaSD Selected
         1 0.6911 0.0000 1.0000   0.5900 0.0000 0.2186 0.0000 0.0000     0.0303  0.0000         
         2 0.7600 0.3700 0.8067   0.6280 0.1807 0.1883 0.3182 0.2139     0.1464  0.3295         
         3 0.7267 0.4233 0.8667   0.6873 0.3012 0.2020 0.3216 0.1905     0.1516  0.3447         
         4 0.6989 0.3867 0.8600   0.6680 0.2551 0.2130 0.3184 0.1793     0.1458  0.3336         
         5 0.7000 0.3367 0.8600   0.6473 0.2006 0.2073 0.3359 0.1793     0.1588  0.3672         
         6 0.7167 0.3833 0.8200   0.6427 0.2105 0.1909 0.3338 0.2539     0.1682  0.3639         
         7 0.7122 0.3767 0.8333   0.6487 0.2169 0.1784 0.3226 0.2048     0.1642  0.3702         
         8 0.7144 0.4233 0.7933   0.6440 0.2218 0.2017 0.3454 0.2599     0.1766  0.3770         
         9 0.8356 0.6533 0.7867   0.7300 0.4363 0.1706 0.3415 0.2498     0.1997  0.4209         
        10 0.8811 0.6867 0.8200   0.7647 0.5065 0.1650 0.3134 0.2152     0.1949  0.4053        *
        11 0.8700 0.6933 0.8133   0.7627 0.5046 0.1697 0.3183 0.2147     0.1971  0.4091         
        12 0.8678 0.6967 0.7733   0.7407 0.4682 0.1579 0.3153 0.2559     

...
The top 5 variables (out of 10):
   SumAverage_GLCM_R1SC4NG2, Variance_GLCM_R1SC4NG2, HGZE_GLSZM_R1SC4NG2, LGZE_GLSZM_R1SC4NG2, SZLGE_GLSZM_R1SC4NG2

I have tried the solution mentioned here: ROC curve from training data in caret

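library(pROC)
# Keep only the held-out predictions made at the selected subset size
optSize <- svmRFE_NG2$optsize
selectedIndices <- svmRFE_NG2$pred$Variables == optSize
# Pooled ROC curve: observed class vs. predicted probability of class LUNG
plot.roc(svmRFE_NG2$pred$obs[selectedIndices],
         svmRFE_NG2$pred$LUNG[selectedIndices])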

But this solution does not seem to work (the resulting AUC value is quite different). I have split the results of the training process into the 50 cross-validation sets, as mentioned in the previous answer, but I do not know what to do next.

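# Split the saved predictions by subset size; the list is named by size, so index it by name
resamples <- split(svmRFE_NG2$pred, svmRFE_NG2$pred$Variables)
# Split the optimal-size predictions into the 50 resamples (10 folds x 5 repeats)
resamplesFOLD <- split(resamples[[as.character(optSize)]], resamples[[as.character(optSize)]]$Resample)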

Any ideas?

Recommended answer

As you already did, you can a) enable savePredictions = T in the trainControl parameter of caret::train, then b) from the trained model object, use the pred variable - which contains all predictions over all partitions and resamples - to compute whichever ROC curve you would like to look at. There are several options for which ROC curve this can be, e.g.:

You can look at all predictions over all partitions and resamples at once:

plot(roc(predictor = modelObject$pred$CLASSNAME, response = modelObject$pred$obs))

Or you could do this over individual partitions and/or resamples (which is what you tried above). The following example computes one ROC curve per partition and resample, so 10 partitions and 5 repeats will result in 50 ROC curves:

library(plyr)
library(pROC)
# One ROC plot per resample (e.g. Fold01.Rep1, Fold02.Rep1, ...)
l_ply(split(modelObject$pred, modelObject$pred$Resample), function(d) {
    plot(roc(predictor = d$CLASSNAME, response = d$obs))
})

Depending on your data and model, the latter will show some variance in the resulting ROC curves and AUC values. You can see the same variance in the AUC and SD values caret calculated for your individual partitions and resamples, so this results from your data and model and is correct.
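As a rough illustration of where that variance shows up, the sketch below computes one AUC per resample, then their mean and standard deviation (as far as I know, this is what the ROC and ROCSD columns of the resampling table summarize), and compares that with the AUC of all predictions pooled together. The data frame `pred` is simulated and only stands in for the saved predictions at a single subset size, and `auc_rank` is a hypothetical base R helper (rank-based AUC), not a caret or pROC function.

```r
# AUC via the Mann-Whitney rank statistic: the probability that a randomly
# chosen positive case scores higher than a randomly chosen negative one.
auc_rank <- function(score, is_positive) {
  r <- rank(score)                      # average ranks handle tied scores
  n_pos <- sum(is_positive)
  n_neg <- sum(!is_positive)
  (sum(r[is_positive]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Simulated stand-in for the saved predictions (assumption, not real data):
# 10 resamples of 10 rows each, with 5 LUNG and 5 OTHER cases per resample.
set.seed(1)
pred <- data.frame(
  obs      = factor(rep(c("LUNG", "OTHER"), times = 50)),
  Resample = rep(sprintf("Fold%02d.Rep1", 1:10), each = 10)
)
pred$LUNG <- plogis(rnorm(100, mean = ifelse(pred$obs == "LUNG", 1, -1)))

# One AUC per resample, then the mean and SD across resamples
fold_auc <- sapply(split(pred, pred$Resample),
                   function(d) auc_rank(d$LUNG, d$obs == "LUNG"))
resampled <- c(mean = mean(fold_auc), sd = sd(fold_auc))
print(resampled)

# AUC of all predictions pooled together: generally a different number
pooled <- auc_rank(pred$LUNG, pred$obs == "LUNG")
print(pooled)
```

Because the mean of the per-resample AUCs and the pooled AUC are different statistics, a single pooled curve will generally not reproduce the value shown in the resampling table.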

BTW: I used the pROC::roc function for the examples above, but you could use any suitable function here. And when using caret::train, obtaining the ROC curve works the same way no matter the model type.
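Since the question asks for a single average curve, one common option is vertical averaging: evaluate each resample's ROC curve on a shared grid of false positive rates and average the true positive rates at every grid point. The sketch below does this in base R under stated assumptions: `pred` is simulated and stands in for the saved predictions at one subset size, and `roc_points` is a hypothetical helper, not part of caret or pROC.

```r
# Empirical ROC points (false positive rate, true positive rate) for one resample
roc_points <- function(score, is_positive) {
  thr <- sort(unique(score), decreasing = TRUE)
  tpr <- sapply(thr, function(t) mean(score[is_positive] >= t))
  fpr <- sapply(thr, function(t) mean(score[!is_positive] >= t))
  data.frame(fpr = c(0, fpr), tpr = c(0, tpr))   # curve starts at (0, 0)
}

# Simulated stand-in for the saved predictions (assumption, not real data)
set.seed(1)
pred <- data.frame(
  obs      = factor(rep(c("LUNG", "OTHER"), times = 50)),
  Resample = rep(sprintf("Fold%02d.Rep1", 1:10), each = 10)
)
pred$LUNG <- plogis(rnorm(100, mean = ifelse(pred$obs == "LUNG", 1, -1)))

# Interpolate every resample's curve on a common false positive rate grid;
# ties = max keeps the top of each vertical step of the curve.
grid <- seq(0, 1, length.out = 21)
tpr_by_fold <- sapply(split(pred, pred$Resample), function(d) {
  p <- roc_points(d$LUNG, d$obs == "LUNG")
  approx(p$fpr, p$tpr, xout = grid, ties = max)$y
})

# Vertical average: mean true positive rate at each grid point
mean_tpr <- rowMeans(tpr_by_fold)

plot(grid, mean_tpr, type = "l",
     xlab = "False positive rate", ylab = "Mean true positive rate")
abline(0, 1, lty = 2)   # chance line for reference
```

With very small folds each per-resample curve has only a few steps, so the averaged curve is coarse; the spread across the columns of `tpr_by_fold` gives a sense of how much the individual curves disagree.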
