Different coefficients: scikit-learn vs statsmodels (logistic regression)
Question
When running a logistic regression, the coefficients I get using statsmodels are correct (I verified them against some course material). However, I am unable to get the same coefficients with sklearn. I've tried preprocessing the data, to no avail. This is my code:
Statsmodels:
import statsmodels.api as sm
X_const = sm.add_constant(X)
model = sm.Logit(y, X_const)
results = model.fit()
print(results.summary())
The relevant output is:
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const -0.2382 3.983 -0.060 0.952 -8.045 7.569
a 2.0349 0.837 2.430 0.015 0.393 3.676
b 0.8077 0.823 0.981 0.327 -0.806 2.421
c 1.4572 0.768 1.897 0.058 -0.049 2.963
d -0.0522 0.063 -0.828 0.407 -0.176 0.071
e_2 0.9157 1.082 0.846 0.397 -1.205 3.037
e_3 2.0080 1.052 1.909 0.056 -0.054 4.070
Scikit-learn (without preprocessing):
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
results = model.fit(X, y)
print(results.coef_)
print(results.intercept_)
The coefficients given are:
array([[ 1.29779008, 0.56524976, 0.97268593, -0.03762884, 0.33646097,
0.98020901]])
and the intercept/constant given is:
array([ 0.0949539])
As you can see, regardless of which coefficient corresponds to which variable, the numbers given by sklearn don't match the correct ones from statsmodels. What am I missing? Thanks in advance!
Answer
Thanks to a kind soul on reddit, this was solved. To get the same coefficients, one has to effectively switch off the L2 regularisation that sklearn applies to logistic regression by default:
model = LogisticRegression(C=1e8)
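With the penalty effectively disabled, sklearn's fit approaches the same maximum-likelihood solution that statsmodels computes (newer sklearn releases also accept passing `penalty=None` directly). A minimal sketch on made-up data (the original X and y from the question are not shown) illustrating how the default penalty shrinks the coefficients:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up data; the question's actual X and y are not shown
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_coef = np.array([2.0, -1.0, 0.5])
p = 1.0 / (1.0 + np.exp(-(X @ true_coef + 0.3)))
y = (rng.random(200) < p).astype(int)

# Default: L2 penalty with C=1.0, which shrinks the coefficients toward zero
regularized = LogisticRegression(max_iter=1000).fit(X, y)

# Effectively unregularized: very large C, approximating the plain MLE
unregularized = LogisticRegression(C=1e8, max_iter=1000).fit(X, y)

print(np.linalg.norm(regularized.coef_))    # smaller norm
print(np.linalg.norm(unregularized.coef_))  # larger norm, closer to the MLE
```

The unregularized coefficient vector has the larger norm, which is the direction of the discrepancy the question observed.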
where, according to the documentation:
C : float, default: 1.0
Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization.
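In other words, the penalty grows as C shrinks. A small sketch, again on made-up data, showing the coefficient norm increasing monotonically with C:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up data just to illustrate the effect of C
rng = np.random.default_rng(1)
X = rng.normal(size=(150, 2))
y = (X[:, 0] - X[:, 1] + rng.normal(size=150) > 0).astype(int)

# Smaller C means a stronger L2 penalty, pulling the coefficients toward zero
norms = [np.linalg.norm(LogisticRegression(C=C, max_iter=1000).fit(X, y).coef_)
         for C in (1e-2, 1.0, 1e2)]
print(norms)
```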