how to get the log likelihood for a logistic regression model in sklearn?


Problem description


I'm using a logistic regression model in sklearn and I am interested in retrieving the log likelihood for such a model, so to perform an ordinary likelihood ratio test as suggested here.


The model is using the log loss as scoring rule. In the documentation, the log loss is defined "as the negative log-likelihood of the true labels given a probabilistic classifier’s predictions". However, the value is always positive, whereas the log likelihood should be negative. As an example:

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

lr = LogisticRegression()
lr.fit(X_train, y_train)
y_prob = lr.predict_proba(X_test)
log_loss(y_test, y_prob)    # 0.66738
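(Note: `X_train`, `y_train`, `X_test`, and `y_test` are not defined in the snippet above. A self-contained sketch, assuming synthetic data from `make_classification` in place of the original dataset, might look like:)

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

# Synthetic binary-classification data, illustrative only
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

lr = LogisticRegression()
lr.fit(X_train, y_train)
y_prob = lr.predict_proba(X_test)

print(log_loss(y_test, y_prob))  # a positive number, as described above
```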


I do not see any such method in the documentation for the model; is there any other possibility that I am currently not aware of?

Answer


Read closely; the log loss is the negative log-likelihood. Since the log-likelihood is, as you say, indeed negative, its negative will be a positive number.


Let's see an example with dummy data:

from sklearn.metrics import log_loss
import numpy as np

y_true = np.array([0, 1, 1])
y_pred = np.array([0.1, 0.2, 0.9])

log_loss(y_true, y_pred)
# 0.60671964791658428


Now, let's manually compute the log-likelihood elements (i.e. one value per label-prediction pair), using the formula given in the scikit-learn docs you have linked to, without the minus sign:

log_likelihood_elements = y_true*np.log(y_pred) + (1-y_true)*np.log(1-y_pred)
log_likelihood_elements
# array([-0.10536052, -1.60943791, -0.10536052])


Now, given the log-likelihood elements (which are indeed negative), the log loss is the negative of their sum, divided by the number of samples:

-np.sum(log_likelihood_elements)/len(y_true)
# 0.60671964791658428

log_loss(y_true, y_pred) == -np.sum(log_likelihood_elements)/len(y_true)
# True
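Accordingly, the total log-likelihood the question asks for (e.g. as input to a likelihood ratio test) can be recovered directly from `log_loss` by passing `normalize=False`, which returns the sum of the per-sample losses rather than their mean, and negating the result:

```python
from sklearn.metrics import log_loss
import numpy as np

y_true = np.array([0, 1, 1])
y_pred = np.array([0.1, 0.2, 0.9])

# normalize=False returns the summed (not averaged) loss,
# so its negative is the total log-likelihood of the data.
log_likelihood = -log_loss(y_true, y_pred, normalize=False)
print(log_likelihood)  # ≈ -1.82016
```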

