Plot ROC curve from cross-validation (training) data in R
Question
I would like to know if there is a way to plot the average ROC curve from the cross-validation data of an SVM-RFE model generated with the caret package.
My results are:
Recursive feature selection
Outer resampling method: Cross-Validated (10 fold, repeated 5 times)
Resampling performance over subset size:
 Variables    ROC   Sens   Spec Accuracy  Kappa  ROCSD SensSD SpecSD AccuracySD KappaSD Selected
         1 0.6911 0.0000 1.0000   0.5900 0.0000 0.2186 0.0000 0.0000     0.0303  0.0000
         2 0.7600 0.3700 0.8067   0.6280 0.1807 0.1883 0.3182 0.2139     0.1464  0.3295
         3 0.7267 0.4233 0.8667   0.6873 0.3012 0.2020 0.3216 0.1905     0.1516  0.3447
         4 0.6989 0.3867 0.8600   0.6680 0.2551 0.2130 0.3184 0.1793     0.1458  0.3336
         5 0.7000 0.3367 0.8600   0.6473 0.2006 0.2073 0.3359 0.1793     0.1588  0.3672
         6 0.7167 0.3833 0.8200   0.6427 0.2105 0.1909 0.3338 0.2539     0.1682  0.3639
         7 0.7122 0.3767 0.8333   0.6487 0.2169 0.1784 0.3226 0.2048     0.1642  0.3702
         8 0.7144 0.4233 0.7933   0.6440 0.2218 0.2017 0.3454 0.2599     0.1766  0.3770
         9 0.8356 0.6533 0.7867   0.7300 0.4363 0.1706 0.3415 0.2498     0.1997  0.4209
        10 0.8811 0.6867 0.8200   0.7647 0.5065 0.1650 0.3134 0.2152     0.1949  0.4053        *
        11 0.8700 0.6933 0.8133   0.7627 0.5046 0.1697 0.3183 0.2147     0.1971  0.4091
        12 0.8678 0.6967 0.7733   0.7407 0.4682 0.1579 0.3153 0.2559
...
The top 5 variables (out of 10):
SumAverage_GLCM_R1SC4NG2, Variance_GLCM_R1SC4NG2, HGZE_GLSZM_R1SC4NG2, LGZE_GLSZM_R1SC4NG2, SZLGE_GLSZM_R1SC4NG2
I have tried the solution mentioned here: ROC curve from training data in caret
optSize <- svmRFE_NG2$optsize
selectedIndices <- svmRFE_NG2$pred$Variables == optSize
plot.roc(svmRFE_NG2$pred$obs[selectedIndices],
svmRFE_NG2$pred$LUNG[selectedIndices])
But this solution does not seem to work (the resulting AUC value is quite different from the one reported above). I have split the results of the training process into the 50 cross-validation sets, as mentioned in that answer, but I do not know what to do next.
resamples<-split(svmRFE_NG2$pred,svmRFE_NG2$pred$Variables)
resamplesFOLD<-split(resamples[[optSize]],resamples[[optSize]]$Resample)
Any ideas?
Recommended answer
As you already did, you can a) enable savePredictions = TRUE in the trainControl parameter of caret::train, then b) from the trained model object, use the pred variable, which contains all predictions over all partitions and resamples, to compute whichever ROC curve you would like to look at. There are several options for which ROC curve this can be, e.g.:
You can look at all predictions over all partitions and resamples at once:
plot(roc(predictor = modelObject$pred$CLASSNAME, response = modelObject$pred$obs))
Or you could do this over individual partitions and/or resamples (which is what you tried above). The following example computes one ROC curve per partition and resample, so 10 partitions and 5 repeats result in 50 ROC curves:
library(plyr)
library(pROC)  # provides roc()
l_ply(split(modelObject$pred, modelObject$pred$Resample), function(d) {
  plot(roc(predictor = d$CLASSNAME, response = d$obs))
})
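Since the question asks for an average curve, one possible way to get it is vertical averaging: evaluate every per-resample curve on a shared specificity grid and average the sensitivities at each grid point. The sketch below is only an illustration under assumptions. `pred` is a synthetic stand-in for `modelObject$pred`, the class names `LUNG` and `OTHER` and the 0.01 grid step are made up for the example, and for the `rfe` object in the question you would first subset `pred` to the rows where `Variables == optSize`.

```r
library(pROC)

# Hypothetical stand-in for modelObject$pred: 10 folds x 5 repeats,
# 20 held-out samples per resample, classes "LUNG" and "OTHER" (assumed names).
set.seed(1)
pred <- data.frame(
  Resample = rep(sprintf("Fold%02d.Rep%d", rep(1:10, 5), rep(1:5, each = 10)),
                 each = 20),
  obs = factor(rep(rep(c("LUNG", "OTHER"), each = 10), times = 50),
               levels = c("OTHER", "LUNG"))
)
# Fake predicted probability of class "LUNG".
pred$LUNG <- plogis(rnorm(nrow(pred), mean = ifelse(pred$obs == "LUNG", 1, -1)))

# Shared specificity grid on which every per-resample curve is evaluated.
spec_grid <- seq(0, 1, by = 0.01)

# One column of interpolated sensitivities per resample.
sens_mat <- sapply(split(pred, pred$Resample), function(d) {
  r <- roc(response = d$obs, predictor = d$LUNG,
           levels = c("OTHER", "LUNG"), direction = "<", quiet = TRUE)
  coords(r, x = spec_grid, input = "specificity", ret = "sensitivity",
         transpose = FALSE)$sensitivity
})

# Vertical average: mean sensitivity at each specificity.
mean_sens <- rowMeans(sens_mat)

plot(1 - spec_grid, mean_sens, type = "l",
     xlab = "1 - Specificity", ylab = "Mean sensitivity",
     main = "Average ROC over resamples (sketch)")
abline(0, 1, lty = 2)  # chance line
```

Fixing `levels` and `direction` explicitly keeps pROC from choosing the direction separately for each fold, which would otherwise bias the per-fold curves upwards on small folds.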
Depending on your data and model, the latter will show some variance in the resulting ROC curves and AUC values. You can see the same variance in the AUC and SD values that caret calculated for your individual partitions and resamples, so it is a property of your data and model, and it is correct.
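The difference between the pooled AUC and the value caret reports can be made concrete. As far as I understand caret's resampling summary, the ROC column is the mean of the per-resample AUCs and ROCSD is their standard deviation, whereas pooling all predictions produces one AUC from a single merged curve, and the two need not agree. The sketch below computes both on synthetic data; `pred`, the class names `LUNG` and `OTHER`, and the simulated probabilities are assumptions standing in for `modelObject$pred`.

```r
library(pROC)

# Hypothetical stand-in for modelObject$pred (10 folds x 5 repeats).
set.seed(42)
pred <- data.frame(
  Resample = rep(sprintf("Fold%02d.Rep%d", rep(1:10, 5), rep(1:5, each = 10)),
                 each = 20),
  obs = factor(rep(rep(c("LUNG", "OTHER"), each = 10), times = 50),
               levels = c("OTHER", "LUNG"))
)
pred$LUNG <- plogis(rnorm(nrow(pred), mean = ifelse(pred$obs == "LUNG", 1, -1)))

# Helper (hypothetical name): AUC with a fixed class order and direction.
auc_of <- function(d) {
  as.numeric(auc(roc(response = d$obs, predictor = d$LUNG,
                     levels = c("OTHER", "LUNG"), direction = "<",
                     quiet = TRUE)))
}

# One AUC per resample, then the summary that is comparable to ROC / ROCSD.
fold_auc <- sapply(split(pred, pred$Resample), auc_of)
c(mean = mean(fold_auc), sd = sd(fold_auc))

# Single AUC from all predictions pooled together.
pooled_auc <- auc_of(pred)
pooled_auc
```

If the mean of `fold_auc` matches the reported ROC value while `pooled_auc` does not, the mismatch comes from pooling rather than from an error in the model object.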
BTW: I used the pROC::roc function for the examples above, but you could use any suitable function here. Also, when using caret::train, obtaining the ROC curve works the same way regardless of the model type.
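As an illustration of "any suitable function", the AUC can also be computed without extra packages through its rank (Mann-Whitney) formulation. This is a minimal base R sketch, not part of caret or pROC; the function name `auc_rank`, its arguments, and the class labels are hypothetical.

```r
# AUC via the rank-sum identity:
# AUC = (sum of ranks of positives - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
auc_rank <- function(obs, prob, positive) {
  is_pos <- obs == positive
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  r <- rank(prob)  # ties receive average ranks
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Tiny worked example: 3 of the 4 (positive, negative) pairs are ordered correctly.
obs  <- c("OTHER", "OTHER", "LUNG", "LUNG")
prob <- c(0.1, 0.4, 0.35, 0.8)
auc_rank(obs, prob, positive = "LUNG")  # 0.75
```

Applied per resample with `sapply(split(...), ...)`, it can take the place of `pROC::roc` when only the AUC value is needed and no curve has to be drawn.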