how to get the log likelihood for a logistic regression model in sklearn?
Question
I'm using a logistic regression model in sklearn and I am interested in retrieving the log likelihood for such a model, so that I can perform an ordinary likelihood ratio test as suggested here.
The model uses the log loss as its scoring rule. In the documentation, the log loss is defined "as the negative log-likelihood of the true labels given a probabilistic classifier's predictions". However, this value is always positive, whereas the log likelihood should be negative. As an example:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
lr = LogisticRegression()
lr.fit(X_train, y_train)
y_prob = lr.predict_proba(X_test)
log_loss(y_test, y_prob) # 0.66738
I do not see any such method in the documentation for the model; is there some other possibility that I am currently not aware of?
Answer
Read closely: the log loss is the negative log-likelihood. Since the log-likelihood is indeed negative, as you say, its negative is a positive number.
Let's see an example with dummy data:
from sklearn.metrics import log_loss
import numpy as np
y_true = np.array([0, 1, 1])
y_pred = np.array([0.1, 0.2, 0.9])
log_loss(y_true, y_pred)
# 0.60671964791658428
Now, let's compute the log-likelihood elements manually (i.e. one value per label-prediction pair), using the formula given in the scikit-learn docs you linked to, without the minus sign:
log_likelihood_elements = y_true*np.log(y_pred) + (1-y_true)*np.log(1-y_pred)
log_likelihood_elements
# array([-0.10536052, -1.60943791, -0.10536052])
Now, given the log-likelihood elements (which are indeed negative), the log loss is the negative of their sum divided by the number of samples:
-np.sum(log_likelihood_elements)/len(y_true)
# 0.60671964791658428
log_loss(y_true, y_pred) == -np.sum(log_likelihood_elements)/len(y_true)
# True
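For the likelihood ratio test mentioned in the question, you typically need the total (summed) log-likelihood rather than the per-sample average. Since log_loss averages over samples by default (normalize=True), a minimal sketch of recovering the total is:

```python
from sklearn.metrics import log_loss
import numpy as np

y_true = np.array([0, 1, 1])
y_pred = np.array([0.1, 0.2, 0.9])

# log_loss returns the average negative log-likelihood per sample
# (normalize=True is the default), so negate it and multiply by the
# number of samples to recover the total log-likelihood:
total_log_likelihood = -log_loss(y_true, y_pred) * len(y_true)

# This matches summing the per-sample log-likelihood elements directly:
manual_total = np.sum(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
```

This total is the quantity you would plug into a likelihood ratio statistic for two nested models, e.g. 2 * (ll_full - ll_restricted).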