Multiclass classification: probabilities and calibration


Question

I'm working on a multiclass classification problem with several different classifiers, using Python and scikit-learn. I want to use the predicted probabilities, mainly to compare what the different classifiers predict for a specific case.

I started reading about "calibration", for example in the scikit-learn documentation and in a publication, and I became confused.

From what I understood: a well-calibrated probability means that the probability also reflects the fraction of a certain class.

  1. Does this imply that if I have 10 equally distributed classes, the calibrated probabilities would ideally be around 0.1 for every class?

  2. Can I interpret the probabilities of predict_proba (without calibration) as "how certain is the classifier about this being the correct class"?

Hopefully, someone can clarify this for me! :)

Recommended answer

I understand that you are using scikit-learn: "All classifiers in scikit-learn do multiclass classification out-of-the-box."

In that case, as the documentation mentions:

CalibratedClassifierCV can calibrate probabilities in a multiclass setting if the base estimator supports multiclass predictions. [Which is always the case.] The classifier is calibrated first for each class separately in a one-vs-rest fashion. When predicting probabilities, the calibrated probabilities for each class are predicted separately. As those probabilities do not necessarily sum to one, a postprocessing is performed to normalize them.
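As a rough illustration of the quoted behaviour, here is a minimal sketch. The synthetic dataset, the choice of GaussianNB as base estimator, and the `sigmoid` method with 3-fold cross-validation are all assumptions made for the example and are not taken from the question.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic 3-class data, used only for illustration (an assumption).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Wrap the base estimator; calibration is fitted with 3-fold cross-validation.
calibrated = CalibratedClassifierCV(GaussianNB(), method="sigmoid", cv=3)
calibrated.fit(X_train, y_train)

# One row per sample, one column per class.
proba = calibrated.predict_proba(X_test)
print(proba.shape)
# After the normalization step each row should sum to 1.
print(proba.sum(axis=1)[:5])
```

Any other scikit-learn classifier could take the place of GaussianNB here; the wrapper is used the same way.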

I hope this answers your first question.

To answer your second question: yes, that is the idea for predict_proba, both before and after calibration. However, after calibration the results of predict_proba are actually right, whereas before they are only roughly correct.
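One way to look at the difference is to compare, for a single class, the predicted probability with the fraction of samples that really belong to that class. The sketch below does this with `calibration_curve`. The data, the GaussianNB estimator, the isotonic method, and the number of bins are assumptions made for the example, and no particular result is implied.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic 3-class data, used only for illustration (an assumption).
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

raw = GaussianNB().fit(X_train, y_train)
calibrated = CalibratedClassifierCV(GaussianNB(), method="isotonic", cv=3)
calibrated.fit(X_train, y_train)

# Inspect one class in a one-vs-rest fashion.
target_class = 0
is_target = (y_test == target_class).astype(int)

curves = {}
for name, model in [("raw", raw), ("calibrated", calibrated)]:
    # Labels are 0, 1, 2, so the column index equals the class label.
    p = model.predict_proba(X_test)[:, target_class]
    frac_observed, mean_predicted = calibration_curve(is_target, p, n_bins=5)
    curves[name] = (mean_predicted, frac_observed)
    print(name)
    for pred, obs in zip(mean_predicted, frac_observed):
        print(f"  predicted ~{pred:.2f} -> observed fraction {obs:.2f}")
```

For a well-calibrated model the two numbers on each printed line should be close to each other; how close they are in practice depends on the data and the estimator.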

Later addition:

To be precise, I did not try to answer your first question at face value. There you asked about the probability for each class. However, since we are talking about calibration, you have to consider that predict_proba gives an output per sample, not per class. I think you mean per sample; otherwise you should specify: do you mean the average probability over all samples?
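The distinction between per-sample and averaged probabilities can be sketched as follows. The 10-class synthetic dataset and the LogisticRegression model are assumptions chosen to mirror the example in the question, not something the question specifies.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data with 10 roughly balanced classes (an assumption).
X, y = make_classification(
    n_samples=3000, n_features=20, n_informative=8,
    n_classes=10, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# predict_proba returns one probability vector per sample.
proba = clf.predict_proba(X_test)
print(proba.shape)            # (n_samples, n_classes)
print(np.round(proba[0], 2))  # a single sample; usually far from uniform

# Averaging over all samples gives one number per class, which can be
# compared with the observed class frequencies in the same data.
avg_per_class = proba.mean(axis=0)
class_freq = np.bincount(y_test, minlength=10) / len(y_test)
print(np.round(avg_per_class, 2))
print(np.round(class_freq, 2))
```

So a value near 0.1 for every class is something to expect, if at all, from the average over samples and not from the probability vector of an individual sample.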
