Different coefficients: scikit-learn vs statsmodels (logistic regression)


Problem description


When running a logistic regression, the coefficients I get using statsmodels are correct (verified them with some course material). However, I am unable to get the same coefficients with sklearn. I've tried preprocessing the data to no avail. This is my code:

Statsmodels:

import statsmodels.api as sm

X_const = sm.add_constant(X)  # statsmodels does not add an intercept column automatically
model = sm.Logit(y, X_const)
results = model.fit()
print(results.summary())

The relevant output is:

                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const      -0.2382      3.983     -0.060      0.952      -8.045       7.569
a           2.0349      0.837      2.430      0.015       0.393       3.676
b           0.8077      0.823      0.981      0.327      -0.806       2.421
c           1.4572      0.768      1.897      0.058      -0.049       2.963
d          -0.0522      0.063     -0.828      0.407      -0.176       0.071
e_2         0.9157      1.082      0.846      0.397      -1.205       3.037
e_3         2.0080      1.052      1.909      0.056      -0.054       4.070

Scikit-learn (no preprocessing):

from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
results = model.fit(X, y)
print(results.coef_)
print(results.intercept_)

The coefficients given are:

array([[ 1.29779008,  0.56524976,  0.97268593, -0.03762884,  0.33646097,
         0.98020901]])

The intercept/constant given is:

array([ 0.0949539])


As you can see, regardless of which coefficient corresponds to which variable, the numbers given by sklearn don't match the correct ones from statsmodels. What am I missing? Thanks in advance!

Answer


Thanks to a kind soul on reddit, this was solved. To get the same coefficients, one has to negate the regularisation that sklearn applies to logistic regression by default:

model = LogisticRegression(C=1e8)
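
For reference, a minimal sketch of the corrected fit, reusing the X and y from the question (the exact C value is not special; any sufficiently large value effectively removes the penalty):

from sklearn.linear_model import LogisticRegression

# A very large C makes the default L2 penalty negligible, so the fit
# approaches the unpenalized maximum-likelihood estimate that
# statsmodels' Logit produces.
model = LogisticRegression(C=1e8)
results = model.fit(X, y)
print(results.coef_)       # should be close to the coefficients in the statsmodels summary
print(results.intercept_)  # should be close to const in the statsmodels summary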

According to the documentation:


C : float, default: 1.0


Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization.
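
To illustrate that last point, here is a small sketch (not part of the original answer) that fits the same data with progressively smaller C values; the coefficient magnitudes shrink toward zero as the regularization gets stronger:

from sklearn.linear_model import LogisticRegression

# Smaller C = stronger L2 regularization = coefficients pulled toward zero.
for C in (1e8, 1.0, 0.01):
    m = LogisticRegression(C=C).fit(X, y)
    print(C, m.coef_)

With that in mind, the C=1e8 fit above effectively removes the penalty, which is why its coefficients line up with the statsmodels results.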
