eli5: show_weights() with two labels
Problem description
I'm trying out eli5 to understand how individual terms contribute to the prediction of certain classes.
You can run the following script:
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.datasets import fetch_20newsgroups
#categories = ['alt.atheism', 'soc.religion.christian']
categories = ['alt.atheism', 'soc.religion.christian', 'comp.graphics']
np.random.seed(1)
train = fetch_20newsgroups(subset='train', categories=categories, shuffle=True, random_state=7)
test = fetch_20newsgroups(subset='test', categories=categories, shuffle=True, random_state=7)
bow = CountVectorizer(stop_words='english')
clf = LogisticRegression()
pipel = Pipeline([('bow', bow),
                  ('classifier', clf)])
pipel.fit(train.data, train.target)
import eli5
eli5.show_weights(clf, vec=bow, top=20)
Problem:
When working with two labels, the output is unfortunately limited to only one table:
categories = ['alt.atheism', 'soc.religion.christian']
However, when using three labels, it outputs three tables:
categories = ['alt.atheism', 'soc.religion.christian', 'comp.graphics']
Is it a bug in the software that it misses y=0 in the first output, or am I missing a statistical point? I would expect to see two tables in the first case.
Recommended answer
This has nothing to do with eli5; it comes from how scikit-learn (in this case LogisticRegression()) treats two categories. With only two categories the problem becomes a binary one, so the fitted classifier exposes only a single row of coefficients (and a single intercept) throughout.
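As a minimal sketch of why one row is enough in the binary case, the snippet below fits LogisticRegression with default settings on random data (the data and variable names are assumptions made up for illustration, not part of the question) and checks that the single score determines both class probabilities.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data, invented purely for illustration.
rng = np.random.RandomState(0)
X = rng.rand(40, 3)
y = np.array([0, 1] * 20)

clf = LogisticRegression().fit(X, y)

# The one row of coef_ plus the one intercept give a single score per sample.
scores = X @ clf.coef_[0] + clf.intercept_[0]
proba = clf.predict_proba(X)

# Both class probabilities follow from that one score.
p_class_1 = 1.0 / (1.0 + np.exp(-scores))  # sigmoid(score)
p_class_0 = 1.0 / (1.0 + np.exp(scores))   # sigmoid(-score) = 1 - p_class_1

print(np.allclose(proba[:, 1], p_class_1))  # True
print(np.allclose(proba[:, 0], p_class_0))  # True
```

A second row for y=0 would just be the first row with every sign flipped.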
Look at the attributes of LogisticRegression:
coef_ : array, shape (1, n_features) or (n_classes, n_features)
Coefficient of the features in the decision function.
coef_ is of shape (1, n_features) when the given problem is binary.
intercept_ : array, shape (1,) or (n_classes,)
Intercept (a.k.a. bias) added to the decision function.
If fit_intercept is set to False, the intercept is set to zero.
intercept_ is of shape (1,) when the problem is binary.
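The documented shapes can be checked directly. This is a small sketch on random data (the arrays and names below are assumptions for illustration only), comparing a two-label fit with a three-label fit:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Random features; only the number of distinct labels matters here.
rng = np.random.RandomState(0)
X = rng.rand(60, 4)

y_binary = np.array([0, 1] * 30)    # two labels
y_three = np.array([0, 1, 2] * 20)  # three labels

binary_clf = LogisticRegression().fit(X, y_binary)
three_clf = LogisticRegression().fit(X, y_three)

print(binary_clf.coef_.shape)       # (1, 4)
print(binary_clf.intercept_.shape)  # (1,)
print(three_clf.coef_.shape)        # (3, 4)
print(three_clf.intercept_.shape)   # (3,)
```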
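coef_ has shape (1, n_features) in the binary case, and this coef_ is what eli5.show_weights() uses. That single row describes y=1: positive weights push a prediction toward y=1 and negative weights toward y=0, so a separate y=0 table would carry no extra information.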
Hope that's clear.