F1 Score vs ROC AUC

Question

I have the below F1 and AUC scores for 2 different cases

Model 1: Precision: 85.11 Recall: 99.04 F1: 91.55 AUC: 69.94

Model 2: Precision: 85.1 Recall: 98.73 F1: 91.41 AUC: 71.69

The main motive of my problem is to predict the positive cases correctly, i.e., to reduce the False Negative (FN) cases. Should I use the F1 score and choose Model 1, or use AUC and choose Model 2? Thanks

Answer

Introduction

As a rule of thumb, every time you want to compare ROC AUC vs F1 Score, think about it as if you are comparing your model performance based on:

[Sensitivity vs (1-Specificity)] VS [Precision vs Recall]

Now we need to understand what Sensitivity, Specificity, Precision and Recall are, intuitively!

Sensitivity: is given by the following formula:

Sensitivity = TP / (TP + FN)

Intuitively speaking, if we have a 100% sensitive model, that means it did NOT miss any True Positive, in other words, there were NO False Negatives (i.e. a positive result that is labeled as negative). But there is a risk of having a lot of False Positives!

Specificity: is given by the following formula:

Specificity = TN / (TN + FP)

Intuitively speaking, if we have a 100% specific model, that means it did NOT miss any True Negative, in other words, there were NO False Positives (i.e. a negative result that is labeled as positive). But there is a risk of having a lot of False Negatives!

Precision: is given by the following formula:

Precision = TP / (TP + FP)

Intuitively speaking, if we have a 100% precise model, that means everything it labels as positive really is a True Positive, in other words, there were NO False Positives. But there is a risk of missing positives, i.e. of having a lot of False Negatives!

Recall: is given by the following formula:

Recall = TP / (TP + FN)

Intuitively speaking, if we have a 100% recall model, that means it did NOT miss any True Positive, in other words, there were NO False Negatives (i.e. a positive result that is labeled as negative).

As you can see, the four concepts are very close to each other!
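
To make the four formulas above concrete, here is a minimal Python sketch that computes each measure directly from confusion-matrix counts; the counts themselves are invented purely for illustration:

```python
# Invented confusion-matrix counts, purely for illustration.
TP, FP, TN, FN = 80, 10, 95, 15

sensitivity = TP / (TP + FN)  # true positive rate: positives we caught
specificity = TN / (TN + FP)  # true negative rate: negatives we caught
precision   = TP / (TP + FP)  # share of predicted positives that are real
recall      = TP / (TP + FN)  # same formula as sensitivity, by definition

print(f"Sensitivity: {sensitivity:.4f}")  # 0.8421
print(f"Specificity: {specificity:.4f}")  # 0.9048
print(f"Precision:   {precision:.4f}")    # 0.8889
print(f"Recall:      {recall:.4f}")       # 0.8421
```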

As a rule of thumb, if the cost of a False Negative is high, we want to increase the model's sensitivity and recall (which are exactly the same with regard to their formula)!

For instance, in fraud detection or sick patient detection, we don't want to label/predict a fraudulent transaction (True Positive) as non-fraudulent (False Negative). Also, we don't want to label/predict a contagious sick patient (True Positive) as not sick (False Negative).

This is because the consequences would be worse than those of a False Positive (incorrectly labelling a harmless transaction as fraudulent, or a non-contagious patient as contagious).

On the other hand, if the cost of a False Positive is high, then we want to increase the model's specificity and precision!

For instance, in email spam detection, we don't want to label/predict a non-spam email (True Negative) as spam (False Positive). On the other hand, failing to label a spam email as spam (False Negative) is less costly.

F1 Score: is given by the following formula:

F1 = 2 * (Precision * Recall) / (Precision + Recall)

F1 Score keeps a balance between Precision and Recall. We use it if there is uneven class distribution, as precision and recall may give misleading results!

So we use the F1 Score as a comparison indicator between the Precision and Recall numbers!
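
As a quick sanity check, the F1 formula above reproduces the questioner's Model 1 score from its Precision and Recall numbers:

```python
# Model 1's numbers from the question (Precision 85.11, Recall 99.04).
precision, recall = 85.11, 99.04

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"F1: {f1:.2f}")  # -> 91.55, matching the reported score
```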

AUROC (Area Under the ROC Curve): compares Sensitivity vs (1-Specificity), in other words, the True Positive Rate vs the False Positive Rate.

So, the bigger the AUROC, the greater the distinction between True Positives and True Negatives!

In general, the ROC is computed over many different threshold levels and thus has many F-score values; the F1 score is applicable at any particular point on the ROC curve.

You may think of it as a measure of precision and recall at a particular threshold value whereas AUC is the area under the ROC curve. For F score to be high, both precision and recall should be high.
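
To illustrate, here is a small sketch on synthetic data (assuming scikit-learn is available; the dataset and model are invented for demonstration) showing that a classifier has a single ROC AUC but a different F1 score at every decision threshold:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, mildly imbalanced binary data, purely for illustration.
X, y = make_classification(n_samples=2000, weights=[0.85], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

proba = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

# One AUC summarizes the whole ranking, across all thresholds...
print(f"ROC AUC: {roc_auc_score(y_te, proba):.4f}")

# ...but each decision threshold (each point on the ROC curve) has its own F1.
for t in (0.3, 0.5, 0.7):
    print(f"F1 at threshold {t}: {f1_score(y_te, (proba >= t).astype(int)):.4f}")
```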

Consequently, when you have a data imbalance between positive and negative samples, you should always use F1-score because ROC averages over all possible thresholds!
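
A hedged illustration of how far the two metrics can diverge on skewed data (the random "no-skill" scorer below is invented to roughly mirror the questioner's numbers, where the positive class appears to dominate):

```python
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score

rng = np.random.default_rng(0)

# ~85% positive labels, echoing the question's precision of ~85.
y_true = (rng.random(100_000) < 0.85).astype(int)

# A scorer with no skill at all: completely random scores.
scores = rng.random(100_000)

# Labelling nearly everything positive gives recall ~0.99 ...
y_pred = (scores >= 0.01).astype(int)

print(f"F1:      {f1_score(y_true, y_pred):.4f}")       # high, roughly 0.91
print(f"ROC AUC: {roc_auc_score(y_true, scores):.4f}")  # roughly 0.50, no skill
```

Under this invented setup the F1 lands near the question's Model 1 value while the AUC reveals no ranking skill at all, showing how strongly the base rate and the decision threshold shape the F1 number.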

Further reading:

Credit Card Fraud: handling highly imbalanced classes, and why the Receiver Operating Characteristic curve (ROC curve) should not be used, and the Precision/Recall curve should be preferred, in highly imbalanced situations

I intentionally used both terms, Sensitivity and Recall, although they are exactly the same, just to emphasize the fact that by convention, as ML engineers, we are more likely to use the term Recall, whereas statisticians are more likely to use the term Sensitivity to refer to the same exact measure.
