Getting a low ROC AUC score but a high accuracy

Question

I am using scikit-learn's LogisticRegression class on a version of the flight delay dataset.

I use pandas to select some columns:

df = df[["MONTH", "DAY_OF_MONTH", "DAY_OF_WEEK", "ORIGIN", "DEST", "CRS_DEP_TIME", "ARR_DEL15"]]

I fill in NaN values with 0:

df = df.fillna({'ARR_DEL15': 0})

Make sure the categorical columns are marked with the 'category' data type:

df["ORIGIN"] = df["ORIGIN"].astype('category')
df["DEST"] = df["DEST"].astype('category')

Then call get_dummies() from pandas:

df = pd.get_dummies(df)

Now I train and test my data set:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

lr = LogisticRegression()

# train_test_split returns the splits in (train, test) order
train_set, test_set = train_test_split(df, test_size=0.2, random_state=42)

train_set_x = train_set.drop('ARR_DEL15', axis=1)
train_set_y = train_set["ARR_DEL15"]

test_set_x = test_set.drop('ARR_DEL15', axis=1)
test_set_y = test_set["ARR_DEL15"]

lr.fit(train_set_x, train_set_y)

Once I call the score method I get around 0.867. However, when I call the roc_auc_score method I get a much lower number, around 0.583:

from sklearn.metrics import roc_auc_score

probabilities = lr.predict_proba(test_set_x)
roc_auc_score(test_set_y, probabilities[:, 1])

Is there any reason why the ROC AUC is much lower than what the score method provides?

Answer

To start with, saying that an AUC of 0.583 is "lower" than a score* of 0.867 is exactly like comparing apples with oranges.

[* I assume your score is mean accuracy, but this is not critical for this discussion - it could be anything else in principle]

According to my experience at least, most ML practitioners think that the AUC score measures something different from what it actually does: the common (and unfortunate) use is just like any other the-higher-the-better metric, like accuracy, which may naturally lead to puzzles like the one you express yourself.

The truth is that, roughly speaking, the AUC measures the performance of a binary classifier averaged across all possible decision thresholds.
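
For intuition, here is a minimal sketch (reusing test_set_y and probabilities from the question) showing where that single number comes from: roc_curve evaluates the classifier at every distinct threshold, and the AUC is just the area under the resulting curve:

from sklearn.metrics import roc_curve, auc

# false positive and true positive rates at every distinct decision threshold
fpr, tpr, thresholds = roc_curve(test_set_y, probabilities[:, 1])

# the area under that curve is exactly the value roc_auc_score reports
print(auc(fpr, tpr))  # ~0.583 here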

The (decision) threshold in binary classification is the value above which we decide to label a sample as 1 (recall that probabilistic classifiers actually return a value p in [0, 1], usually interpreted as a probability - in scikit-learn it is what predict_proba returns).

Now, this threshold, in methods like scikit-learn predict which return labels (1/0), is set to 0.5 by default, but this is not the only possibility, and it may not even be desirable in some cases (imbalanced data, for example).
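
As a minimal sketch of this point (variable names here are illustrative, reusing the fitted lr and test_set_x from the question): predict is just predict_proba thresholded at 0.5, and nothing stops you from picking a different cut-off:

p = lr.predict_proba(test_set_x)[:, 1]

# predict() labels a sample as 1 exactly when its probability exceeds 0.5
labels_default = (p > 0.5).astype(int)
print((labels_default == lr.predict(test_set_x)).mean())  # 1.0

# any other cut-off defines an equally valid classifier, e.g. a stricter one
labels_strict = (p > 0.7).astype(int)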

The take-home message is:

  • when you ask for score (which under the hood uses predict, i.e. labels and not probabilities), you have also implicitly set this threshold to 0.5
  • when you ask for AUC (which, in contrast, uses the probabilities returned by predict_proba), no threshold is involved, and you get (something like) the accuracy averaged across all possible thresholds; the sketch right after this list makes the contrast concrete
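
A quick demonstration of the contrast (reusing p from the sketch above): accuracy moves as the threshold moves, while the AUC, which never commits to a threshold, stays put:

from sklearn.metrics import accuracy_score, roc_auc_score

# threshold-dependent: a different accuracy for every choice of cut-off
for t in (0.3, 0.5, 0.7):
    print(t, accuracy_score(test_set_y, (p > t).astype(int)))

# threshold-free: computed from the probabilities themselves
print(roc_auc_score(test_set_y, p))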

Given these clarifications, your particular example provides a very interesting case in point:

I get a good-enough accuracy ~ 87% with my model; should I care that, according to an AUC of 0.58, my classifier does only slightly better than mere random guessing?

Provided that the class representation in your data is reasonably balanced, the answer by now should hopefully be obvious: no, you should not care; for all practical cases, what you care for is a classifier deployed with a specific threshold, and what this classifier does in a purely theoretical and abstract situation when averaged across all possible thresholds should pose very little interest for a practitioner (it does pose interest for a researcher coming up with a new algorithm, but I assume that this is not your case).

(For imbalanced data, the argument changes; accuracy here is practically useless, and you should consider precision, recall, and the confusion matrix instead).
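
For completeness, a quick sketch of those diagnostics in scikit-learn (using the fitted model from the question):

from sklearn.metrics import confusion_matrix, classification_report

preds = lr.predict(test_set_x)
print(confusion_matrix(test_set_y, preds))

# classification_report includes per-class precision and recall
print(classification_report(test_set_y, preds))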

For this reason, AUC has started receiving serious criticism in the literature (don't misread this - the analysis of the ROC curve itself is highly informative and useful); the Wikipedia entry and the references provided therein are highly recommended reading:

Thus, the practical value of the AUC measure has been called into question, raising the possibility that the AUC may actually introduce more uncertainty into machine learning classification accuracy comparisons than resolution.

[...]

One recent explanation of the problem with ROC AUC is that reducing the ROC Curve to a single number ignores the fact that it is about the tradeoffs between the different systems or performance points plotted and not the performance of an individual system

Emphasis mine - see also On the dangers of AUC...
