ROC curve plot: 0.50 significance and cross-validation

Question

I have two problems using the pROC package to plot ROC curves.

A. The significance level, or P-value, is the probability that the observed sample area under the ROC curve is found when, in fact, the true (population) area under the ROC curve is 0.5 (null hypothesis: Area = 0.5). If P is small (P < 0.05), it can be concluded that the area under the ROC curve is significantly different from 0.5, and therefore that there is evidence that the laboratory test is able to distinguish between the two groups.

Therefore, I would like to test whether the area under a given ROC curve differs significantly from 0.50. I found the following code, which uses the pROC package to compare TWO ROC curves, but I am not sure how to test a single AUC against 0.5.

library(pROC)  
data(aSAH)    

rocobj1 <- plot.roc(aSAH$outcome, aSAH$s100b,
                    main="Statistical comparison", 
                    percent=TRUE, col="#1c61b6")  

rocobj2 <- lines.roc(aSAH$outcome, aSAH$ndka, 
                     percent=TRUE, col="#008600")  

testobj <- roc.test(rocobj1, rocobj2)  
text(50, 50, 
     labels=paste("p-value =", format.pval(testobj$p.value)), 
     adj=c(0, .5))  

legend("bottomright", legend=c("S100B", "NDKA"), 
       col=c("#1c61b6", "#008600"), lwd=2)

B. I have done a k-fold cross-validation for my classification problem. For example, 5-fold cross-validation produces 5 ROC curves. How can I plot the average of these 5 ROC curves using the pROC package? (What I want to do is explained in the scikit-learn example listed in the references below, but done in Python.) Also, can we get the confidence interval and the best threshold for this average ROC curve, something like the code below?

    rocobj <- plot.roc(aSAH$outcome, aSAH$s100b,  
                       main="Confidence intervals", 
                       percent=TRUE,  ci=TRUE, # compute CI (of AUC by default)  
                       print.auc=TRUE) # print the AUC (will contain the CI)  

    ciobj <- ci.se(rocobj, # CI of sensitivity  
                   specificities=seq(0, 100, 5)) # over a select set of specificities  
    plot(ciobj, type="shape", col="#1c61b6AA") # plot as a blue shape  
    plot(ci(rocobj, of="thresholds", thresholds="best")) # add one threshold

References:

http://web.expasy.org/pROC/screenshots.html

http://scikit-learn.org/0.13/auto_examples/plot_roc_crossval.html

http://www.talkstats.com/showthread.php/14487-ROC-significance

http://www.medcalc.org/manual/roc-curves.php

Recommended answer

A. Use wilcox.test, which does exactly that.
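The reason this works is that the Wilcoxon rank-sum (Mann-Whitney) statistic, divided by the product of the two group sizes, equals the AUC or its complement, so the test's null hypothesis corresponds to AUC = 0.5. Below is a minimal sketch on the aSAH data from the question. The choice of the s100b marker, `exact = FALSE`, and the variable names are illustrative assumptions, not part of the original answer.

```r
library(pROC)
data(aSAH)

# Wilcoxon rank-sum test of the marker between the two outcome groups.
# exact = FALSE uses the normal approximation and avoids the ties warning.
wt <- wilcox.test(s100b ~ outcome, data = aSAH, exact = FALSE)
wt$p.value                      # p-value for H0: AUC = 0.5

# W counts (Good, Poor) pairs in which the "Good" value is larger,
# with ties counted as 1/2, so W / (n_good * n_poor) is 1 - AUC here.
n <- table(aSAH$outcome)
u <- unname(wt$statistic) / prod(n)
auc_from_w <- 1 - u

# Cross-check against pROC (levels and direction given explicitly)
rocobj <- roc(aSAH$outcome, aSAH$s100b,
              levels = c("Good", "Poor"), direction = "<")
auc(rocobj)
ci.auc(rocobj)                  # a CI that excludes 0.5 tells the same story
```

If only a confidence interval is needed, checking whether the `ci.auc` interval contains 0.5 is a closely related, though not identical, way to look at the same question.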

B. See my answer to this question: "Feature selection + cross-validation, but how to make ROC-curves in R". Simply concatenate the data from each fold of the cross-validation and build one ROC curve from the pooled set. (But don't do that with bootstrap or leave-one-out (LOO), when you repeat the whole cross-validation multiple times, or when the predictions can't be compared between runs.)
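The concatenation approach could look roughly like the sketch below. It assumes a single 5-fold run on the aSAH data with a logistic regression on two markers. The model, the fold assignment, and all variable names are hypothetical and only serve to produce out-of-fold predictions that can be pooled.

```r
library(pROC)
data(aSAH)

set.seed(42)                    # fold assignment is random
k <- 5
# Hypothetical fold assignment: shuffle the labels 1..k across the rows
fold <- sample(rep(seq_len(k), length.out = nrow(aSAH)))

# Out-of-fold predicted probabilities, filled in one fold at a time
cv_pred <- rep(NA_real_, nrow(aSAH))
for (i in seq_len(k)) {
  test_rows <- fold == i
  # Illustrative model only: logistic regression fitted on the other folds
  fit <- glm(outcome ~ s100b + ndka, data = aSAH[!test_rows, ],
             family = binomial)
  cv_pred[test_rows] <- predict(fit, newdata = aSAH[test_rows, ],
                                type = "response")
}

# One ROC curve built from the pooled out-of-fold predictions
cv_roc <- roc(aSAH$outcome, cv_pred,
              levels = c("Good", "Poor"), direction = "<")
auc(cv_roc)
ci.auc(cv_roc)                  # CI of the AUC
coords(cv_roc, "best", ret = "threshold")  # "best" threshold (Youden by default)
```

Because `cv_roc` is an ordinary pROC object, the `plot.roc`, `ci.se`, and `ci(..., of="thresholds")` calls from the question should apply to it unchanged. Note that a CI computed this way treats the pooled predictions as one sample and does not reflect variation between the fold models.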
