How to plot a ROC curve using the ROCR package in R, with only a classification contingency table


Question

How can I plot a ROC curve using the ROCR package in R, with only a classification contingency table?

I have a contingency table from which the true positive rate, the false positive rate, and all the other rates can be computed. I have 500 replications, and therefore 500 tables. However, I cannot generate prediction data that gives the estimated probability and the true label for each individual case. How can I get a curve without the individual data? Below is the example from the package documentation that I used.

## computing a simple ROC curve (x-axis: fpr, y-axis: tpr)
library(ROCR)
data(ROCR.simple)
pred <- prediction( ROCR.simple$predictions, ROCR.simple$labels)
perf <- performance(pred,"tpr","fpr")
plot(perf)    


Recommended answer

You cannot generate the full ROC curve from a single contingency table, because a contingency table provides only a single sensitivity/specificity pair (for whatever predictive cutoff was used to generate the table).
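To make the "single pair" point concrete, here is a minimal sketch in base R that reduces one 2x2 table to one point in ROC space. The counts are copied from the cutoff-0.5 table in the iris example further down; the names `tab`, `sens`, `spec`, and `fpr` are illustrative only.

```r
# Counts from the cutoff-0.5 table in the iris example
# (rows = true class 0/1, columns = predicted FALSE/TRUE).
# matrix() fills column by column: column FALSE = (86, 29), column TRUE = (14, 21).
tab <- matrix(c(86, 29, 14, 21), nrow = 2,
              dimnames = list(truth = c("0", "1"),
                              predicted = c("FALSE", "TRUE")))

sens <- tab["1", "TRUE"]  / sum(tab["1", ])   # true positive rate
spec <- tab["0", "FALSE"] / sum(tab["0", ])   # true negative rate
fpr  <- 1 - spec                              # x-coordinate in ROC space

c(fpr = fpr, tpr = sens)
```

However the table is sliced, it yields exactly one (fpr, tpr) point, which is why a curve needs tables at several cutoffs.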

If you had many contingency tables that were generated with different cutoffs, you would be able to approximate the ROC curve (basically, it would be a linear interpolation between the sensitivity/specificity values in your contingency tables). As an example, let's consider predicting whether a flower is versicolor in the iris dataset using logistic regression:

iris$isv <- as.numeric(iris$Species == "versicolor")
mod <- glm(isv~Sepal.Length+Sepal.Width, data=iris, family="binomial")

We could use the standard ROCR code to compute the ROC curve for this model:

library(ROCR)
pred1 <- prediction(predict(mod), iris$isv)
perf1 <- performance(pred1,"tpr","fpr")
plot(perf1)

Now let's assume that instead of mod, all we have is a set of contingency tables, one for each of a number of prediction cutoff values:

tables <- lapply(seq(0, 1, .1), function(x) {
  table(iris$isv, factor(predict(mod, type="response") >= x, levels=c(FALSE, TRUE)))
})

# Predict TRUE if predicted probability at least 0
tables[[1]]
#     FALSE TRUE
#   0     0  100
#   1     0   50

# Predict TRUE if predicted probability at least 0.5
tables[[6]]
#     FALSE TRUE
#   0    86   14
#   1    29   21

# Predict TRUE if predicted probability at least 1
tables[[11]]
#     FALSE TRUE
#   0   100    0
#   1    50    0

From one table to the next, some predictions change from TRUE to FALSE because of the increased cutoff. By comparing column 1 of successive tables, we can determine how many of these changes are true negative predictions and how many are false negative predictions. Iterating through our ordered list of contingency tables, we can create fake predicted value/outcome pairs to pass to ROCR, ensuring that we match the sensitivity/specificity for each contingency table.

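fake.info <- do.call(rbind, lapply(1:(length(tables)-1), function(idx) {
  # Cases that flip from TRUE to FALSE between cutoff idx and cutoff idx+1
  true.neg <- tables[[idx+1]][1,1] - tables[[idx]][1,1]
  false.neg <- tables[[idx+1]][2,1] - tables[[idx]][2,1]
  if (true.neg <= 0 && false.neg <= 0) {
    return(NULL)  # nothing changed between these two cutoffs
  } else {
    # Give every flipped case the same fake score (idx) and its true outcome
    return(data.frame(fake.pred=idx,
                      outcome=rep(c(0, 1), times=c(true.neg, false.neg))))
  }
}))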
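As a sanity check on this construction, the sketch below rebuilds the iris example in base R and confirms that thresholding the fake scores reproduces every original table. It assumes the tables are ordered by increasing cutoff and that the last table predicts FALSE for every case, as in this example; `prob`, `matches`, and `rebuilt` are names introduced here for illustration.

```r
# Rebuild the iris example from this answer so the block runs on its own.
iris$isv <- as.numeric(iris$Species == "versicolor")
mod <- glm(isv ~ Sepal.Length + Sepal.Width, data = iris, family = "binomial")
prob <- predict(mod, type = "response")
tables <- lapply(seq(0, 1, .1), function(x) {
  table(iris$isv, factor(prob >= x, levels = c(FALSE, TRUE)))
})

# Same construction as above: one fake score per case that flips to FALSE.
fake.info <- do.call(rbind, lapply(1:(length(tables) - 1), function(idx) {
  true.neg <- tables[[idx + 1]][1, 1] - tables[[idx]][1, 1]
  false.neg <- tables[[idx + 1]][2, 1] - tables[[idx]][2, 1]
  if (true.neg <= 0 && false.neg <= 0) return(NULL)
  data.frame(fake.pred = idx,
             outcome = rep(c(0, 1), times = c(true.neg, false.neg)))
}))

# A case with fake.pred = idx is predicted TRUE in tables 1..idx, so
# thresholding fake.pred at k should give back table k.
matches <- sapply(seq_along(tables), function(k) {
  rebuilt <- table(fake.info$outcome,
                   factor(fake.info$fake.pred >= k, levels = c(FALSE, TRUE)))
  all(rebuilt == tables[[k]])
})
all(matches)
```

If the final table still contained TRUE predictions, those cases would never flip and would be missing from `fake.info`, so this check would fail for that table.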

Now we can pass the faked predictions to ROCR as usual:

pred2 <- prediction(fake.info$fake.pred, fake.info$outcome)
perf2 <- performance(pred2,"tpr","fpr")
plot(perf2)

Basically, what we have done is a linear interpolation of the points that we do have on the ROC curve. If you had contingency tables for many cutoffs, you could more closely approximate the true ROC curve. If you don't have a wide range of cutoffs, you can't hope to accurately reproduce the full ROC curve.
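The same interpolation can also be done without ROCR at all. The sketch below is one possible base-R alternative, not part of the original answer: it turns each table into a (fpr, tpr) point, joins the points with straight lines, and approximates the area under that piecewise-linear curve with the trapezoid rule. The names `roc.pts` and `auc.approx` are hypothetical.

```r
# Rebuild the iris example from this answer so the block runs on its own.
iris$isv <- as.numeric(iris$Species == "versicolor")
mod <- glm(isv ~ Sepal.Length + Sepal.Width, data = iris, family = "binomial")
prob <- predict(mod, type = "response")
tables <- lapply(seq(0, 1, .1), function(x) {
  table(iris$isv, factor(prob >= x, levels = c(FALSE, TRUE)))
})

# One (fpr, tpr) point per table: row 1 = true class 0, row 2 = true class 1,
# column 2 = predicted TRUE.
roc.pts <- t(sapply(tables, function(tab) {
  c(fpr = tab[1, 2] / sum(tab[1, ]),
    tpr = tab[2, 2] / sum(tab[2, ]))
}))
roc.pts <- roc.pts[order(roc.pts[, "fpr"], roc.pts[, "tpr"]), ]

# Straight lines between the known points are the linear interpolation.
plot(roc.pts[, "fpr"], roc.pts[, "tpr"], type = "b",
     xlab = "False positive rate", ylab = "True positive rate")
abline(0, 1, lty = 2)  # chance line for reference

# Trapezoid rule over the interpolated curve.
fpr <- roc.pts[, "fpr"]
tpr <- roc.pts[, "tpr"]
auc.approx <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
auc.approx
```

With only a few cutoffs, `auc.approx` describes the interpolated curve, not necessarily the true ROC curve of the underlying model.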
