Which coefficients go to which class in multiclass logistic regression in scikit-learn?
Question
I'm using scikit-learn's LogisticRegression for a multiclass problem.
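# penalty='l1' needs a solver that supports it (e.g. 'saga' or 'liblinear');
# recent scikit-learn versions default to 'lbfgs', which does not.
logit = LogisticRegression(penalty='l1', solver='saga')
logit = logit.fit(X, y)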
I'm interested in which features are driving the model's decision.
logit.coef_
The above gives me a nice array of shape (n_classes, n_features), but all the class and feature names are gone. With features, that's okay, because it seems safe to assume that they're indexed in the same order as I passed them in...
But with classes, it's a problem, since I never explicitly passed in the classes in any order. So which class do coefficient sets (rows in the array) 0, 1, 2, and 3 belong to?
Answer
The order will be the same as the order returned by logit.classes_ (classes_ is an attribute of the fitted model that holds the unique classes present in y). The classes are stored in sorted order, so string labels will come out alphabetically.
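As a quick check of that ordering, the sketch below fits a model on labels passed in a deliberately unsorted order and compares classes_ with np.unique(y). The data, seed, and label order here are made up for illustration; they are not from the question.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: 12 samples, 3 features, labels given in unsorted order.
rng = np.random.RandomState(0)
X = rng.rand(12, 3)
y = np.array(['WNT', 'SHH', 'GR4', 'GR3'] * 3)

lr = LogisticRegression().fit(X, y)

# classes_ holds the unique labels in sorted order, regardless of
# the order in which they first appear in y.
print(lr.classes_)

# np.unique also returns sorted unique values, so the two should match.
same_order = np.array_equal(lr.classes_, np.unique(y))
print(same_order)
```

Note that string sorting is by character code, so uppercase labels sort before lowercase ones.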
To illustrate, we fit LogisticRegression on a random dataset with string labels y:
import numpy as np
from sklearn.linear_model import LogisticRegression
X = np.random.rand(45,5)
y = np.array(['GR3', 'GR4', 'SHH', 'GR3', 'GR4', 'SHH', 'GR4', 'SHH',
'GR4', 'WNT', 'GR3', 'GR4', 'GR3', 'SHH', 'SHH', 'GR3',
'GR4', 'SHH', 'GR4', 'GR3', 'SHH', 'GR3', 'SHH', 'GR4',
'SHH', 'GR3', 'GR4', 'GR4', 'SHH', 'GR4', 'SHH', 'GR4',
'GR3', 'GR3', 'WNT', 'SHH', 'GR4', 'SHH', 'SHH', 'GR3',
'WNT', 'GR3', 'GR4', 'GR3', 'SHH'], dtype=object)
lr = LogisticRegression()
lr.fit(X,y)
# This is what you want
lr.classes_
#Out:
# array(['GR3', 'GR4', 'SHH', 'WNT'], dtype=object)
lr.coef_
#Out:
# array of shape [n_classes, n_features]
So in the coef_ matrix, row index 0 corresponds to 'GR3' (the first class in the classes_ array), row index 1 to 'GR4', and so on.
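If you want the names back on the coefficients, one option is to wrap coef_ in a pandas DataFrame indexed by classes_. This is only a sketch: the feature names below are placeholders (assumed for illustration), and the labels are a made-up stand-in for the ones above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical data: 45 samples, 5 features, 4 string classes.
rng = np.random.RandomState(0)
X = rng.rand(45, 5)
y = np.array(['GR3', 'GR4', 'SHH', 'WNT', 'SHH'] * 9, dtype=object)

# Placeholder feature names, in the same order as the columns of X.
feature_names = ['f0', 'f1', 'f2', 'f3', 'f4']

lr = LogisticRegression().fit(X, y)

# Row i of coef_ belongs to classes_[i]; column j belongs to feature j.
coef_table = pd.DataFrame(lr.coef_, index=lr.classes_, columns=feature_names)
print(coef_table)

# Coefficients for a single class, looked up by label instead of by position.
gr3_coefs = coef_table.loc['GR3']
```

This applies to the multiclass case. For a binary problem, coef_ has shape (1, n_features) and its single row refers to classes_[1], so using classes_ as the index would not work there as written.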
Hope that helps.