How to assess the confidence score of a prediction with scikit-learn


Question

I have written some simple code that takes one argument, "query_seq". Further methods calculate the descriptors, and in the end a prediction is made using "LogisticRegression" (or any other algorithm passed to the function): "0" (negative for the given case) or "1" (positive for the given case).

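from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

def main_process(query_seq):
    # Candidate classifiers; any one of them can be passed to Prediction()
    LR = LogisticRegression()
    GNB = GaussianNB()
    KNB = KNeighborsClassifier()
    DT = DecisionTreeClassifier()
    SV = SVC(probability=True)  # probability=True enables predict_proba for SVC

    # data_gen, p, DC_CLASS and Prediction are defined elsewhere (not shown here)
    train_x, train_y, train_l = data_gen(p)
    a = DC_CLASS()
    test_x = a.main_p(query_seq)  # compute descriptors for the query sequence
    return Prediction(train_x, train_y, test_x, LR)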

During cross-validation we computed several statistics (specificity, sensitivity, MCC, etc.) to estimate the accuracy of each algorithm. My question is: is there any method in scikit-learn for estimating the confidence score of a prediction on test data?

Recommended answer

Many classifiers can give you a hint of their own confidence in a given prediction if you call predict_proba instead of predict. Read the docstring of this method to understand the contents of the NumPy array it returns.
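As a minimal sketch of that call: the snippet below uses synthetic data from make_classification as a stand-in for the descriptors in the question (an assumption, since the real data_gen output is not shown). Taking the largest probability in each row as the "confidence" of the predicted label is one common convention, not the only one.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary data standing in for the real descriptors (assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
train_x, test_x, train_y, test_y = train_test_split(
    X, y, test_size=0.2, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(train_x, train_y)

# One row per sample, one column per class, ordered as in clf.classes_.
proba = clf.predict_proba(test_x)
labels = clf.predict(test_x)

# Confidence of the predicted label: the largest class probability per row.
confidence = proba.max(axis=1)

for label, conf in zip(labels[:5], confidence[:5]):
    print(f"predicted={label} confidence={conf:.3f}")
```

For a binary problem, proba[:, 1] is the estimated probability of class 1 (the "positive" case in the question), provided clf.classes_ is [0, 1].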

Note, however, that classifiers can also be wrong when estimating their own confidence. To address this, you can calibrate the classifier with an external calibration procedure on held-out data (using a cross-validation loop). The documentation gives more details on calibration:

http://scikit-learn.org/stable/modules/calibration.html
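One way to apply such a calibration is scikit-learn's CalibratedClassifierCV, which wraps a classifier and fits the calibrator on held-out folds. The sketch below is illustrative only: the synthetic data, the choice of GaussianNB as the classifier, the "sigmoid" method, and cv=5 are all assumptions rather than recommendations from the answer.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic binary data standing in for the real descriptors (assumption).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
train_x, test_x, train_y, test_y = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Wrap the classifier; calibration is fitted on held-out folds (5-fold CV).
calibrated = CalibratedClassifierCV(GaussianNB(), method="sigmoid", cv=5)
calibrated.fit(train_x, train_y)

# Calibrated class probabilities, same layout as a plain predict_proba call.
calibrated_proba = calibrated.predict_proba(test_x)
print(calibrated_proba[:5])
```

The wrapped object is used like any other classifier, so it could be passed around in place of the raw estimator.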

Finally, note that LogisticRegression gives reasonably well-calibrated confidence levels by default. Most other model classes will benefit from external calibration.
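If you want to check how well calibrated a particular model is on your own data, a rough sketch is to compute the Brier score and a reliability curve on held-out samples. The data here is synthetic and the bin count is arbitrary (both assumptions); lower Brier scores and a curve close to the diagonal suggest better calibration.

```python
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

# Synthetic binary data standing in for the real descriptors (assumption).
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
train_x, test_x, train_y, test_y = train_test_split(
    X, y, test_size=0.25, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(train_x, train_y)
pos_proba = clf.predict_proba(test_x)[:, 1]  # estimated P(class 1)

# Mean squared difference between predicted probability and actual outcome.
brier = brier_score_loss(test_y, pos_proba)
print(f"Brier score: {brier:.3f}")

# Per bin: observed fraction of positives vs. mean predicted probability.
# Empty bins are dropped, so fewer than n_bins points may be returned.
frac_pos, mean_pred = calibration_curve(test_y, pos_proba, n_bins=5)
for observed, predicted in zip(frac_pos, mean_pred):
    print(f"mean predicted={predicted:.2f} observed fraction={observed:.2f}")
```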
