Which coefficients go to which class in multiclass logistic regression in scikit-learn?


Question

I'm using scikit-learn's LogisticRegression for a multiclass problem.

logit = LogisticRegression(penalty='l1')
logit = logit.fit(X, y)

I'm interested in which features are driving this decision.

logit.coef_

The above gives me a nice array of shape (n_classes, n_features), but all the class and feature names are gone. With features, that's okay, because it seems safe to assume they're indexed in the same order as I passed them in...

But with classes, it's a problem, since I never explicitly passed in the classes in any order. So which class do coefficient sets (rows of the array) 0, 1, 2, and 3 belong to?

Recommended answer

The row order is the same as the order of logit.classes_ (classes_ is an attribute of the fitted model that holds the unique class labels present in y). The labels are stored sorted, so string labels come out in alphabetical order.

To illustrate, here is LogisticRegression fitted on a random dataset with string labels y:

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.random.rand(45,5)
y = np.array(['GR3', 'GR4', 'SHH', 'GR3', 'GR4', 'SHH', 'GR4', 'SHH',
              'GR4', 'WNT', 'GR3', 'GR4', 'GR3', 'SHH', 'SHH', 'GR3', 
              'GR4', 'SHH', 'GR4', 'GR3', 'SHH', 'GR3', 'SHH', 'GR4', 
              'SHH', 'GR3', 'GR4', 'GR4', 'SHH', 'GR4', 'SHH', 'GR4', 
              'GR3', 'GR3', 'WNT', 'SHH', 'GR4', 'SHH', 'SHH', 'GR3',
              'WNT', 'GR3', 'GR4', 'GR3', 'SHH'], dtype=object)

lr = LogisticRegression()
lr.fit(X,y)

# classes_ gives the class label for each row of coef_
lr.classes_

#Out:
#    array(['GR3', 'GR4', 'SHH', 'WNT'], dtype=object)

lr.coef_
#Out:
#    array of shape (n_classes, n_features), here (4, 5)

So in the coef_ matrix, row index 0 corresponds to 'GR3' (the first class in the classes_ array), row 1 to 'GR4', and so on.
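If it helps, one way to make the mapping explicit is to wrap coef_ in a pandas DataFrame indexed by classes_. This is only a minimal sketch: the feature names f0..f4, the seeded random data, and the variable names coef_df and shh_coefs are made up for illustration, and it assumes pandas is installed.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical feature names and random data, for illustration only
feature_names = ["f0", "f1", "f2", "f3", "f4"]
rng = np.random.default_rng(0)
X = rng.random((60, 5))

# Labels are deliberately passed in a non-alphabetical order
y = np.array(["WNT", "SHH", "GR4", "GR3"] * 15, dtype=object)

lr = LogisticRegression().fit(X, y)

# Row i of coef_ belongs to classes_[i], so classes_ can serve as the row index
coef_df = pd.DataFrame(lr.coef_, index=lr.classes_, columns=feature_names)
print(coef_df)

# Coefficients for a single class, looked up by label instead of by position
shh_coefs = coef_df.loc["SHH"]
```

Looking rows up by label this way avoids relying on remembering the positional order of the classes.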

Hope that helps.
