Creating a scorer for Brier Score Loss in scikit-learn

Question

I'm trying to use GridSearchCV and RandomizedSearchCV in scikit-learn (0.16.1) with logistic regression and random forest classifiers (and possibly others down the road) for binary classification problems. I managed to get GridSearchCV to work with the standard LogisticRegression classifier, but I cannot get LogisticRegressionCV (or RandomizedSearchCV with RandomForestClassifier) to work with a customized scoring function, specifically brier_score_loss. I have tried this code:

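from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import brier_score_loss, make_scorer

lrcv = LogisticRegressionCV(scoring=make_scorer(brier_score_loss, greater_is_better=False, needs_proba=True, needs_threshold=False, pos_label=1))
lrcv_clf = lrcv.fit(X=X_train, y=y_train)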

But I keep getting errors that essentially say brier_score_loss is receiving an input (y_prob) with 2 columns, which causes a "bad input shape" error. Is there a way to specify that only the second column of y_prob (the output of lrcv.predict_proba) should be used, so that the Brier score can be calculated? I thought pos_label might help, but apparently not. Do I need to avoid make_scorer and just create my own scoring function?

Thanks for any suggestions!

Recommended answer

predict_proba returns two probabilities for every sample: the first column is the probability of class 0 and the second is the probability of class 1. You should choose the one you need and pass only that column on to the scoring function.
I do this with a simple proxy function:

def ProbaScoreProxy(y_true, y_probs, class_idx, proxied_func, **kwargs):
    return proxied_func(y_true, y_probs[:, class_idx], **kwargs)

It can be used like this:

from sklearn import metrics

scorer = metrics.make_scorer(ProbaScoreProxy, greater_is_better=False, needs_proba=True, class_idx=1, proxied_func=metrics.brier_score_loss)
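To make concrete what such a scorer ends up computing, here is a small dependency-free sketch. The helper brier_score and the sample numbers are made up for illustration and are not part of scikit-learn; the point is only that the Brier score needs one probability per sample, so a single column has to be picked from the two-column predict_proba output.

```python
def brier_score(y_true, p_positive):
    """Mean squared difference between predicted probability and outcome.

    Hypothetical helper for illustration; scikit-learn's own
    brier_score_loss is what the real scorer would call.
    """
    return sum((p - y) ** 2 for y, p in zip(y_true, p_positive)) / len(y_true)


# Made-up predict_proba-style output: one row per sample,
# column 0 = P(class 0), column 1 = P(class 1).
y_true = [0, 1, 1, 0]
y_probs = [[0.9, 0.1], [0.2, 0.8], [0.4, 0.6], [0.7, 0.3]]

# Select the column of the class treated as positive, as the proxy does.
class_idx = 1
p_positive = [row[class_idx] for row in y_probs]

score = brier_score(y_true, p_positive)
print(score)
```

Passing the full two-column structure instead of p_positive is what triggers the shape error described in the question.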

For binary classification, class_idx can be 0 or 1; the columns follow the order of the estimator's classes_ attribute.
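As for the last part of the question (writing your own scoring function instead of going through make_scorer), the following is a minimal sketch rather than a definitive recipe. It assumes a recent scikit-learn release where GridSearchCV lives in sklearn.model_selection (in 0.16.1 it was under sklearn.grid_search), and it relies on the scoring parameter accepting any callable with the signature (estimator, X, y). The name neg_brier_scorer, its positive_label argument, and the synthetic data are illustrative choices, not library API.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import GridSearchCV


def neg_brier_scorer(estimator, X, y, positive_label=1):
    """Scorer callable: negated Brier score for the positive-class column.

    Hypothetical helper; positive_label is an illustrative argument.
    """
    # Find the predict_proba column that corresponds to positive_label.
    class_idx = list(estimator.classes_).index(positive_label)
    p_positive = estimator.predict_proba(X)[:, class_idx]
    # Negate the loss so that a greater value means a better model,
    # which is the convention the search classes expect.
    return -brier_score_loss(y, p_positive, pos_label=positive_label)


# Synthetic binary data, purely for illustration.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

search = GridSearchCV(
    RandomForestClassifier(n_estimators=20, random_state=0),
    param_grid={"max_depth": [2, 4]},
    scoring=neg_brier_scorer,
    cv=3,
)
search.fit(X, y)

best_score = search.best_score_  # negated Brier score, so never above 0
print(search.best_params_, best_score)
```

Looking up the column through classes_ avoids hard-coding class_idx, at the cost of requiring the positive label to be known up front. Newer releases also appear to ship a built-in "neg_brier_score" scoring string, which may make a custom scorer unnecessary there; it is worth checking the documentation for the installed version.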
