In Classification, what is the difference between the test accuracy and the AUC score?


Question


I am working on a classification-based project, and I am evaluating different ML models based on their training accuracy, testing accuracy, confusion matrix, and AUC score. I am now stuck on understanding the difference between the score I get by calculating the accuracy of an ML model on the test set (X_test) and the AUC score.

If I am correct, both metrics measure how well an ML model is able to predict the correct class of previously unseen data. I also understand that for both, the higher the number the better, as long as the model is not over-fit or under-fit.

Assuming an ML model is neither over-fit nor under-fit, what is the difference between the test accuracy score and the AUC score?

I don't have a background in math and stats, and pivoted towards data science from a business background. Therefore, I would appreciate an explanation that a business person can understand.
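To make the comparison concrete, here is a minimal sketch of how the two scores are typically obtained with scikit-learn. It assumes an already fitted binary classifier called model and held-out data X_test, y_test; apart from X_test, which the question mentions, these names are illustrative:

    from sklearn.metrics import accuracy_score, roc_auc_score

    # Test accuracy: hard class labels, i.e. one fixed decision threshold
    y_pred = model.predict(X_test)
    test_accuracy = accuracy_score(y_test, y_pred)

    # AUC: quality of the predicted class-1 probabilities across all thresholds
    y_scores = model.predict_proba(X_test)[:, 1]
    test_auc = roc_auc_score(y_test, y_scores)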

Solution

Both terms quantify the quality of a classification model; however, the accuracy describes a single operating point of the classifier, which means it summarizes a single confusion matrix. The AUC (area under the curve) represents the trade-off between the true-positive rate (tpr) and the false-positive rate (fpr) across multiple confusion matrices, generated for different fpr values of the same classifier. A confusion matrix has the form:
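                        predicted positive    predicted negative
    actual positive             tp                    fn
    actual negative             fp                    tn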

1) The accuracy is a measure for a single confusion matrix and is defined as

    accuracy = (tp + tn) / (tp + tn + fp + fn),

where tp = true positives, tn = true negatives, fp = false positives and fn = false negatives (the count of each).
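For example, with illustrative counts of tp = 40, tn = 45, fp = 5 and fn = 10 (100 test samples in total), the accuracy is (40 + 45) / 100 = 0.85, i.e. 85% of the test samples are classified correctly.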

2) The AUC measures the area under the ROC (receiver operating characteristic), which is the trade-off curve between the true-positive rate and the false-positive rate. For each choice of false-positive-rate (fpr) threshold, the corresponding true-positive rate (tpr) is determined. That is, for a given classifier an fpr of 0, 0.1, 0.2 and so forth is accepted, and for each fpr the tpr it yields is evaluated. You therefore get a function tpr(fpr) that maps the interval [0,1] onto the same interval, because both rates are defined on that interval. The area under this curve is called the AUC; it lies between 0 and 1, whereby a random classifier is expected to yield an AUC of 0.5.
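As a minimal sketch of how these (fpr, tpr) pairs are obtained in practice, assuming scikit-learn and arrays y_test (true labels) and y_scores (predicted positive-class scores), as in the earlier snippet:

    from sklearn.metrics import roc_curve

    # One (fpr, tpr) point per decision threshold applied to the predicted scores
    fpr, tpr, thresholds = roc_curve(y_test, y_scores)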

The AUC, as it is the area under this curve, is defined as

    AUC = ∫ tpr(fpr) d(fpr), with the integral taken over fpr from 0 to 1.

However, in real (and finite) applications the ROC is a step function, and the AUC is determined by a weighted sum of these levels (in effect, the trapezoidal rule over the finitely many ROC points).
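As a sketch of that finite case, again assuming the fpr and tpr arrays from the roc_curve call above, sklearn.metrics.auc accumulates exactly such a weighted sum:

    from sklearn.metrics import auc

    # Area under the piecewise ROC: each segment contributes its width (change in fpr)
    # times the average tpr level on that segment (trapezoidal rule)
    test_auc_from_curve = auc(fpr, tpr)  # agrees with roc_auc_score(y_test, y_scores)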

Graphics are from Borgelt's Intelligent Data Mining Lecture.
