eli5: show_weights() with two labels

Problem description

I'm trying eli5 in order to understand how individual terms contribute to the prediction of certain classes.

You can run the following script:

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.datasets import fetch_20newsgroups

#categories = ['alt.atheism', 'soc.religion.christian']
categories = ['alt.atheism', 'soc.religion.christian', 'comp.graphics']

np.random.seed(1)
train = fetch_20newsgroups(subset='train', categories=categories, shuffle=True, random_state=7)
test = fetch_20newsgroups(subset='test', categories=categories, shuffle=True, random_state=7)

# Named `bow` so that it matches the references in the Pipeline and in show_weights() below
bow = CountVectorizer(stop_words='english')
clf = LogisticRegression()
pipel = Pipeline([('bow', bow),
                 ('classifier', clf)])

pipel.fit(train.data, train.target)

import eli5
eli5.show_weights(clf, vec=bow, top=20)

Problem:

When working with two labels, the output is unfortunately limited to a single table:

categories = ['alt.atheism', 'soc.religion.christian']

However, when using three labels, it outputs three tables, one per label.

categories = ['alt.atheism', 'soc.religion.christian', 'comp.graphics']

Is it a bug in the software that y=0 is missing from the first output, or am I missing a statistical point? I would expect to see two tables in the first case.

Recommended answer

This has nothing to do with eli5; it comes from how scikit-learn (in this case LogisticRegression()) treats two categories. With only two categories the problem becomes a binary one, so the fitted classifier stores a single set of coefficients rather than one set per class.

Look at the attributes of LogisticRegression:

coef_ : array, shape (1, n_features) or (n_classes, n_features)

Coefficient of the features in the decision function.
coef_ is of shape (1, n_features) when the given problem is binary.

intercept_ : array, shape (1,) or (n_classes,)

Intercept (a.k.a. bias) added to the decision function.

If fit_intercept is set to False, the intercept is set to zero.
intercept_ is of shape (1,) when the problem is binary.
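The shape difference described in these attribute notes can be checked directly. The sketch below is only an illustration: it uses a tiny made-up corpus instead of the 20 newsgroups data so that it runs without a download, and the names docs, labels3, clf2 and clf3 are hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy corpus, invented for illustration (not the 20 newsgroups data).
docs = [
    "god faith church", "church prayer faith",        # label 0
    "atheism reason argument", "argument reason evidence",  # label 1
    "image pixel render", "render graphics pixel",    # label 2
]
labels3 = np.array([0, 0, 1, 1, 2, 2])

vec = CountVectorizer()
X = vec.fit_transform(docs)

# Three classes: one row of coefficients per class.
clf3 = LogisticRegression().fit(X, labels3)

# Two classes: keep only the documents labelled 0 or 1.
rows = np.flatnonzero(labels3 < 2)
clf2 = LogisticRegression().fit(X[rows], labels3[rows])

shape3 = clf3.coef_.shape   # (n_classes, n_features)
shape2 = clf2.coef_.shape   # (1, n_features)
print("three labels:", shape3)
print("two labels:  ", shape2)
```

With three labels there is one row of coefficients per class, while the binary fit keeps a single row, which matches the number of tables seen in the question.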

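coef_ has shape (1, n_features) in the binary case, and this coef_ is what eli5.show_weights() uses. In scikit-learn that single row holds the weights toward the second class (clf.classes_[1]), so in the one table that is shown, a negative weight means the feature pushes the prediction toward y=0.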
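If two tables are still wanted in the binary case, one possible workaround is to build them by hand from the single row of coef_. This is a sketch under assumptions, not something eli5 does for you: it relies on the fact that in a binary logistic regression the log-odds for classes_[0] are the negation of those for classes_[1], and the toy corpus and the helper top_features are hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy two-class corpus, invented for illustration.
docs = [
    "god faith church", "church prayer faith",              # label 0
    "atheism reason argument", "argument reason evidence",  # label 1
]
labels = np.array([0, 0, 1, 1])

vec = CountVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(docs), labels)

# Feature names ordered by column index (works across scikit-learn versions).
names = sorted(vec.vocabulary_, key=vec.vocabulary_.get)

w = clf.coef_[0]  # weights toward clf.classes_[1]
per_class = {clf.classes_[1]: w, clf.classes_[0]: -w}

def top_features(weights, k=3):
    """Hypothetical helper: the k features with the largest weights."""
    order = np.argsort(weights)[::-1][:k]
    return [(names[i], round(float(weights[i]), 3)) for i in order]

tables = {int(c): top_features(ws) for c, ws in per_class.items()}
for label in sorted(tables):
    print(f"y={label}:", tables[label])
```

The second table carries no extra information: it is the first one with the signs flipped, which is why a single table is enough for a binary model.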

Hope this is clear.
