sklearn's precision_recall_curve incorrect on small example


Problem description

Here is a very small example using precision_recall_curve():

from sklearn.metrics import precision_recall_curve, precision_score, recall_score
y_true = [0, 1]
y_predict_proba = [0.25,0.75]
precision, recall, thresholds = precision_recall_curve(y_true, y_predict_proba)
precision, recall

This results in:

(array([1., 1.]), array([1., 0.]))

The above does not match the "manual" calculation which follows.

There are three possible class vectors depending on the threshold: [0,0] (when the threshold is > 0.75), [0,1] (when the threshold is between 0.25 and 0.75), and [1,1] (when the threshold is < 0.25). We have to discard [0,0] because it gives an undefined precision (division by zero). So, applying precision_score() and recall_score() to the other two:

y_predict_class=[0,1]
precision_score(y_true, y_predict_class), recall_score(y_true, y_predict_class)

gives:

(1.0, 1.0)

y_predict_class=[1,1]
precision_score(y_true, y_predict_class), recall_score(y_true, y_predict_class)

gives:

(0.5, 1.0)
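
The manual calculation above can also be reproduced without scikit-learn, which makes the counts behind each number visible. This is only a sketch: the helper `precision_recall_at` is a hypothetical name, and it assumes a sample is predicted positive when its score is greater than or equal to the threshold.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when every score >= threshold is predicted as class 1.

    Hypothetical helper for illustration; the >= convention is an assumption.
    """
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Precision is undefined (None) when nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) > 0 else None
    # Recall is undefined (None) when there are no positive labels.
    recall = tp / (tp + fn) if (tp + fn) > 0 else None
    return precision, recall


y_true = [0, 1]
y_predict_proba = [0.25, 0.75]

# One representative threshold for each of the three class vectors.
point_01 = precision_recall_at(y_true, y_predict_proba, 0.75)  # predicts [0, 1]
point_11 = precision_recall_at(y_true, y_predict_proba, 0.25)  # predicts [1, 1]
point_00 = precision_recall_at(y_true, y_predict_proba, 0.90)  # predicts [0, 0]

print(point_01, point_11, point_00)
```

The [0,0] case comes back with a precision of `None`, which is the divide-by-zero case that was discarded above.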

This does not seem to match the output of precision_recall_curve() (which, for example, did not produce a precision value of 0.5).

Am I missing something?

Recommended answer

I know I am late, but I had the same doubt and eventually resolved it. The main point is that precision_recall_curve() stops outputting precision and recall values once full recall has been reached for the first time; in addition, it appends a 1 to the precision array and a 0 to the recall array so that the curve starts on the y-axis.

In your specific example, you effectively get two arrays like this (in sklearn they are ordered the other way around because of its specific implementation):

precision, recall
(array([1., 0.5]), array([1., 1.]))

Then the values of the two arrays that correspond to the second occurrence of full recall are omitted, and the values 1 and 0 (for precision and recall, respectively) are appended as described above:

precision, recall
(array([1., 1.]), array([1., 0.]))
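
The two steps described in this answer (stop at the first full recall, then append the precision 1 / recall 0 point) can be sketched in plain Python. This is an illustration of the behaviour as described here, not scikit-learn's actual code: `curve_as_described` and its `stop_at_full_recall` flag are hypothetical names, it assumes at least one positive label, and it assumes a score greater than or equal to the threshold counts as a positive prediction. The arrays returned by a given scikit-learn release may differ, so it is worth comparing against your installed version.

```python
def curve_as_described(y_true, scores, stop_at_full_recall=True):
    """Sketch of the behaviour described in the answer (hypothetical helper).

    Assumes y_true contains at least one positive label.
    """
    n_pos = sum(y_true)
    # Each distinct score acts as a threshold, visited from highest to lowest.
    thresholds = sorted(set(scores), reverse=True)
    precision, recall = [], []
    for thr in thresholds:
        y_pred = [1 if s >= thr else 0 for s in scores]
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
        # sum(y_pred) >= 1 because thr is itself one of the scores.
        precision.append(tp / sum(y_pred))
        recall.append(tp / n_pos)
        if stop_at_full_recall and tp == n_pos:
            break  # later thresholds would only repeat recall == 1
    kept = thresholds[:len(precision)]
    # Reverse so that recall is decreasing, then append the (1, 0) end point.
    return precision[::-1] + [1.0], recall[::-1] + [0.0], kept[::-1]


y_true = [0, 1]
y_predict_proba = [0.25, 0.75]

truncated = curve_as_described(y_true, y_predict_proba)
full = curve_as_described(y_true, y_predict_proba, stop_at_full_recall=False)
print(truncated)
print(full)
```

With truncation, the 0.5 precision value from the threshold 0.25 never appears, which is why it is missing from the output in the question; with `stop_at_full_recall=False` it is kept as the first entry.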

I have tried to explain it here in full detail; another useful link is certainly this one.
