Python SKLearn: Logistic Regression Probabilities
Question
I am using the Python SKLearn module to perform logistic regression. I have a dependent variable vector Y
(taking values from 1 of M classes) and independent variable matrix X
(with N features). My code is
from sklearn.linear_model import LogisticRegression
import numpy as np

LR = LogisticRegression()
LR.fit(X, np.resize(Y, (len(Y))))  # flatten Y to a 1-D array of length len(Y)
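As an aside, np.resize(Y, (len(Y))) in the fit call simply flattens Y to one dimension, which is the shape scikit-learn expects for the target; np.ravel does the same thing more idiomatically. A small sketch with a made-up column vector Y:

```python
import numpy as np

# Hypothetical column vector of labels, shape (3, 1)
Y = np.array([[0], [1], [2]])

flat = np.resize(Y, (len(Y),))  # flattened to shape (3,)
print(flat)                               # [0 1 2]
print(np.array_equal(flat, np.ravel(Y)))  # True
```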
My question is: what do LR.coef_ and LR.intercept_ represent? I initially thought they held the values intercept(i) and coef(i,j) such that
log(p(1)/(1-p(1))) = intercept(1) + coef(1,1)*X1 + ... + coef(1,N)*XN
...
log(p(M)/(1-p(M))) = intercept(M) + coef(M,1)*X1 + ... + coef(M,N)*XN
where p(i) is the probability that an observation with features [X1, ... ,XN] is in class i. However, when I try to compute this via
V = X @ LR.coef_.T    # matrix product; elementwise * would be wrong here
U = V + LR.intercept_
A = np.exp(U)
A = A / (1 + A)       # sigmoid of each per-class logit
so that A is the matrix of p(1) ... p(M) for the observations in X, the results should match
LR.predict_proba(X)
However, the two are close but not identical. Why is this?
Answer
The coef_ and intercept_ attributes represent what you think they do; your probability calculations are off because you forgot to normalize. After
P = A / (1 + A)
you should do
P /= P.sum(axis=1).reshape((-1, 1))
to reproduce the scikit-learn algorithm.
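To see the whole recipe end to end, here is a runnable sketch. Note that recent scikit-learn versions fit a multinomial (softmax) model by default for multiclass problems, so to reproduce the one-vs-rest scheme the question assumes, this example wraps LogisticRegression in OneVsRestClassifier (whose predict_proba normalizes per-class sigmoids exactly as described above); the iris dataset is used as stand-in data:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Stand-in data: 3 classes, 4 features
X, y = load_iris(return_X_y=True)

# One binary logistic model per class (one-vs-rest)
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

# Stack the per-class coefficients into an (M, N) matrix and (M,) intercepts
coef = np.vstack([est.coef_ for est in ovr.estimators_])
intercept = np.hstack([est.intercept_ for est in ovr.estimators_])

U = X @ coef.T + intercept       # per-class logits, shape (n_samples, M)
A = np.exp(U)
P = A / (1 + A)                  # sigmoid of each logit
P /= P.sum(axis=1).reshape((-1, 1))  # normalize each row to sum to 1

print(np.allclose(P, ovr.predict_proba(X)))  # True
```

Without the final normalization step, each row of P holds M independent sigmoid outputs that generally do not sum to 1, which is why the raw values are close to, but not equal to, predict_proba.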