F1 Score vs ROC AUC

Problem Description

I have the below F1 and AUC scores for 2 different cases

Model 1: Precision: 85.11 Recall: 99.04 F1: 91.55 AUC: 69.94

Model 2: Precision: 85.1 Recall: 98.73 F1: 91.41 AUC: 71.69

The main motive of my problem is to predict the positive cases correctly, i.e., reduce the False Negative (FN) cases. Should I use the F1 score and choose Model 1, or use AUC and choose Model 2? Thanks

Recommended Answer

Introduction

As a rule of thumb, every time you want to compare ROC AUC vs F1 Score, think about it as if you are comparing your model performance based on:

[Sensitivity vs (1-Specificity)] VS [Precision vs Recall]

Note that Sensitivity is Recall (they are the exact same metric).

Now we need to understand, intuitively, what Specificity, Precision and Recall (Sensitivity) are!

Specificity: it is given by the following formula:

Specificity = TN / (TN + FP)

Intuitively speaking, if we have a 100% specific model, that means it did NOT miss any True Negative; in other words, there were NO False Positives (i.e. negative results falsely labeled as positive). Yet, there is a risk of having a lot of False Negatives!

Precision: it is given by the following formula:

Precision = TP / (TP + FP)

Intuitively speaking, if we have a 100% precise model, that means every case it labels as positive really is positive; in other words, there were NO False Positives. Note that this says nothing about how many True Positives were missed!

Recall: it is given by the following formula:

Recall = TP / (TP + FN)

Intuitively speaking, if we have a 100% recall model, that means it did NOT miss any True Positive, in other words, there were NO False Negatives (i.e. a positive result that is falsely labeled as negative). Yet, there is a risk of having a lot of False Positives!

As you can see, the three concepts are very close to each other!
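
To make the three formulas concrete, here is a minimal sketch (not part of the original answer; the toy labels are hypothetical) that derives all three metrics from a scikit-learn confusion matrix:

```python
# Minimal sketch: derive Specificity, Precision and Recall from the
# confusion matrix. The labels below are hypothetical toy data.
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]

# For binary labels, ravel() yields TN, FP, FN, TP in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

specificity = tn / (tn + fp)  # TN / (TN + FP)
precision   = tp / (tp + fp)  # TP / (TP + FP)
recall      = tp / (tp + fn)  # TP / (TP + FN), a.k.a. Sensitivity

print(f"Specificity={specificity:.2f} Precision={precision:.2f} Recall={recall:.2f}")
```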

As a rule of thumb, if the cost of having a False Negative is high, we want to increase the model's sensitivity and recall (which are the exact same with regard to their formula)!

For instance, in fraud detection or sick patient detection, we don't want to label/predict a fraudulent transaction (True Positive) as non-fraudulent (False Negative). Also, we don't want to label/predict a contagious sick patient (True Positive) as not sick (False Negative).

This is because the consequences would be worse than a False Positive (incorrectly labeling a harmless transaction as fraudulent, or a non-contagious patient as contagious).

On the other hand, if the cost of having a False Positive is high, then we want to increase the model's specificity and precision!

For instance, in email spam detection, we don't want to label/predict a non-spam email (True Negative) as spam (False Positive). On the other hand, failing to label a spam email as spam (False Negative) is less costly.
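
These two rules of thumb are really two ends of the same dial: the decision threshold. Below is a minimal sketch (labels and scores are hypothetical, generated for illustration) showing that lowering the threshold raises recall (fewer False Negatives) while raising it boosts precision (fewer False Positives):

```python
# Minimal sketch: moving the decision threshold trades recall against
# precision. The labels and scores are hypothetical toy data.
import numpy as np
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(42)
y_true  = rng.integers(0, 2, size=2000)            # toy binary labels
y_score = y_true * 0.35 + rng.random(2000) * 0.65  # toy probability scores

for threshold in (0.2, 0.5, 0.8):
    y_pred = (y_score >= threshold).astype(int)
    print(f"threshold={threshold}: "
          f"precision={precision_score(y_true, y_pred):.2f} "
          f"recall={recall_score(y_true, y_pred):.2f}")
```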

F1 Score: it is given by the following formula:

F1 = 2 · (Precision · Recall) / (Precision + Recall)

F1 Score keeps a balance between Precision and Recall. We use it when there is an uneven class distribution, as precision and recall alone may give misleading results!

So we use the F1 Score as a comparison indicator between the Precision and Recall numbers!
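
As a quick sanity check against the numbers in the question, plugging Model 1's precision and recall into this formula reproduces its reported F1:

F1 = 2 · (85.11 · 99.04) / (85.11 + 99.04) = 16858.59 / 184.15 ≈ 91.55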

AUROC (Area Under the Receiver Operating Characteristic curve) compares Sensitivity vs (1-Specificity); in other words, it compares the True Positive Rate vs the False Positive Rate.

So, the bigger the AUROC, the better the model is at separating the positive class from the negative class!

In general, the ROC curve spans many different threshold levels and thus has many F1 score values, whereas the F1 score is computed at one particular point on the ROC curve.

You may think of the F1 score as a measure of precision and recall at a particular threshold value, whereas AUC is the area under the whole ROC curve. For the F1 score to be high, both precision and recall should be high.
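
A minimal sketch of this distinction (labels and scores are hypothetical): one model yields a single ROC AUC but a different F1 score at every threshold:

```python
# Minimal sketch: one model has ONE ROC AUC, but a different F1 score
# at every decision threshold. Labels and scores are hypothetical.
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score

rng = np.random.default_rng(0)
y_true  = rng.integers(0, 2, size=1000)          # toy binary labels
y_score = y_true * 0.3 + rng.random(1000) * 0.7  # toy probability scores

print("ROC AUC (threshold-free):", round(roc_auc_score(y_true, y_score), 3))

for threshold in (0.3, 0.5, 0.7):
    y_pred = (y_score >= threshold).astype(int)
    print(f"F1 at threshold {threshold}:", round(f1_score(y_true, y_pred), 3))
```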

Consequently, when you have a data imbalance between positive and negative samples, you should always use F1-score because ROC averages over all possible thresholds!
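
To illustrate why imbalance matters (and the further-reading point below), here is a minimal sketch on hypothetical, heavily imbalanced data, using average_precision_score as a summary of the Precision/Recall curve: ROC AUC can look respectable while the PR summary exposes how weak the model is on the rare positive class:

```python
# Minimal sketch: on imbalanced data, ROC AUC can look respectable while
# the Precision/Recall summary is far lower. Data are hypothetical.
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(1)
y_true  = (rng.random(10_000) < 0.02).astype(int)  # ~2% positives
y_score = y_true * 0.2 + rng.random(10_000) * 0.8  # toy probability scores

print("ROC AUC:            ", round(roc_auc_score(y_true, y_score), 3))
print("PR AUC (avg. prec.):", round(average_precision_score(y_true, y_score), 3))
```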

Further reading:

Credit Card Fraud: Handling highly imbalanced classes, why the Receiver Operating Characteristic curve (ROC curve) should not be used, and why the Precision/Recall curve should be preferred in highly imbalanced situations.
