sklearn's precision_recall_curve incorrect on small example
Question
Here is a very small example using precision_recall_curve():
from sklearn.metrics import precision_recall_curve, precision_score, recall_score
y_true = [0, 1]
y_predict_proba = [0.25, 0.75]
precision, recall, thresholds = precision_recall_curve(y_true, y_predict_proba)
precision, recall
This results in:
(array([1., 1.]), array([1., 0.]))
The above does not match the "manual" calculation that follows.
There are three possible class vectors depending on the threshold: [0,0] (when the threshold is > 0.75), [0,1] (when the threshold is between 0.25 and 0.75), and [1,1] (when the threshold is < 0.25). We must discard [0,0] because it gives an undefined precision (division by zero). So, applying precision_score() and recall_score() to the other two:
y_predict_class=[0,1]
precision_score(y_true, y_predict_class), recall_score(y_true, y_predict_class)
gives:
(1.0, 1.0)
and
y_predict_class=[1,1]
precision_score(y_true, y_predict_class), recall_score(y_true, y_predict_class)
gives:
(0.5, 1.0)
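The two manual calculations above can be reproduced in a single loop. This is a sketch; the thresholds 0.5 and 0.2 are arbitrary representatives of the two achievable class vectors, not values taken from the question:

```python
from sklearn.metrics import precision_score, recall_score

y_true = [0, 1]
y_predict_proba = [0.25, 0.75]

# One representative threshold per non-degenerate class vector:
# 0.5 yields [0, 1]; 0.2 yields [1, 1]. ([0, 0] is skipped: precision undefined.)
for threshold in [0.5, 0.2]:
    y_predict_class = [int(p >= threshold) for p in y_predict_proba]
    print(threshold, y_predict_class,
          precision_score(y_true, y_predict_class),
          recall_score(y_true, y_predict_class))
# prints: 0.5 [0, 1] 1.0 1.0
#         0.2 [1, 1] 0.5 1.0
```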
This does not seem to match the output of precision_recall_curve() (which, for example, did not produce a 0.5 precision value).
Am I missing something?
Answer
I know I am late, but I had the same doubt, which I eventually resolved. The main point here is that precision_recall_curve() stops outputting precision and recall values once full recall is reached for the first time; moreover, it concatenates a 0 to the recall array and a 1 to the precision array so that the curve starts at the y-axis.
In your specific example, you effectively have two arrays built like this (they are ordered the other way around because of scikit-learn's particular implementation):
precision, recall
(array([1., 0.5]), array([1., 1.]))
Then, the values of the two arrays that correspond to the second occurrence of full recall are omitted, and 1 and 0 (for precision and recall, respectively) are concatenated as described above:
precision, recall
(array([1., 1.]), array([1., 0.]))
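This behavior can be checked directly by also inspecting the returned thresholds array; the expected values below assume scikit-learn's current implementation, where a score >= threshold is predicted positive:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = [0, 1]
y_predict_proba = [0.25, 0.75]

precision, recall, thresholds = precision_recall_curve(y_true, y_predict_proba)

# Full recall is already reached at threshold 0.75, so the point for
# threshold 0.25 (precision 0.5) is dropped, and the (precision=1, recall=0)
# endpoint is appended to anchor the curve at the y-axis.
print(precision)   # [1. 1.]
print(recall)      # [1. 0.]
print(thresholds)  # [0.75]
```

Note that thresholds has one fewer entry than precision and recall: the appended (1, 0) endpoint has no corresponding threshold.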
I have tried to explain it here in full detail; another useful link is certainly this one.