How to get feature names from ELI5 when transformer includes an embedded pipeline

Question

The ELI5 library provides the function transform_feature_names to retrieve the feature names for the output of an sklearn transformer. The documentation says that the function works out of the box when the transformer includes nested Pipelines.

I'm trying to get the function to work on a simplified version of the example in the answer to SO 57528350. My simplified example doesn't need Pipeline, but in real life I will need it in order to add steps to categorical_transformer, and I will also want to add transformers to the ColumnTransformer.

import eli5
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

X_train = pd.DataFrame({'age': [23, 12, 12, 18],
                        'gender': ['M', 'F', 'F', 'F'],
                        'income': ['high', 'low', 'low', 'medium'],
                        'y': [0, 1, 1, 1]})

categorical_features = ['gender', 'income']
categorical_transformer = Pipeline(
    steps=[('onehot', OneHotEncoder(handle_unknown='ignore'))])

transformers=[('categorical', categorical_transformer, categorical_features)]
preprocessor = ColumnTransformer(transformers)
X_train_transformed = preprocessor.fit_transform(X_train)

eli5.transform_feature_names(preprocessor, list(X_train.columns))

This fails with the message:

AttributeError: Transformer categorical (type Pipeline) does not provide get_feature_names.

Since the Pipeline is nested in the ColumnTransformer, I understood from the ELI5 documentation that it would be handled.

Do I need to create a modified version of Pipeline with a get_feature_names method or make other custom modifications in order to take advantage of the ELI5 function?

I'm using Python 3.7.6, eli5 0.10.1, pandas 0.25.3, and sklearn 0.22.1.

Recommended answer

I think the problem is that eli5 relies on the ColumnTransformer method get_feature_names, which in turn calls get_feature_names on the nested Pipeline, and Pipeline does not yet implement that method in sklearn.
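The failure chain can be reproduced without sklearn. The sketch below uses hypothetical stand-in classes (ToyEncoder, ToyPipeline, and ToyColumnTransformer are illustrative names, not sklearn classes) and assumes only what is described above: the column transformer asks each child for get_feature_names and raises when the method is missing.

```python
class ToyEncoder:
    """Stand-in for a leaf transformer that can name its outputs."""

    def get_feature_names(self):
        return ["gender_F", "gender_M"]


class ToyPipeline:
    """Stand-in for a pipeline: wraps steps but has no get_feature_names."""

    def __init__(self, steps):
        self.steps = steps  # list of (step_name, transformer)


class ToyColumnTransformer:
    """Stand-in that delegates naming to each child transformer."""

    def __init__(self, transformers):
        self.transformers = transformers  # list of (name, transformer)

    def get_feature_names(self):
        names = []
        for name, trans in self.transformers:
            # A child without the method stops the whole lookup.
            if not hasattr(trans, "get_feature_names"):
                raise AttributeError(
                    "Transformer %s (type %s) does not provide "
                    "get_feature_names." % (name, type(trans).__name__))
            names.extend(name + "__" + f for f in trans.get_feature_names())
        return names


# A bare encoder works; wrapping it in a pipeline hides get_feature_names.
direct = ToyColumnTransformer([("categorical", ToyEncoder())])
nested = ToyColumnTransformer(
    [("categorical", ToyPipeline([("onehot", ToyEncoder())]))])

direct_names = direct.get_feature_names()
try:
    nested.get_feature_names()
    nested_error = None
except AttributeError as exc:
    nested_error = str(exc)
```

The encoder inside the pipeline is perfectly able to name its outputs; the lookup fails only because the wrapper in between does not forward the call.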

I've opened an eli5 issue with your example.

One possible fix: adding a transform_feature_names dispatch for ColumnTransformer; this can be just a modification of its existing get_feature_names to call for eli5 transform_feature_names for each of its component transformers (instead of sklearn's own get_feature_names). The below seems to work, although I'm not sure how to handle when input_names differs from the training dataframe columns, available in the ColumnTransformer as _df_columns.

import numpy as np
from eli5 import transform_feature_names
from sklearn.compose import ColumnTransformer

@transform_feature_names.register(ColumnTransformer)
def col_tfm_names(transformer, in_names=None):
    if in_names is None:
        from eli5.sklearn.utils import get_feature_names
        # generate default feature names
        in_names = get_feature_names(transformer, num_features=transformer._n_features)
    # return a list of strings derived from in_names
    feature_names = []
    for name, trans, column, _ in transformer._iter(fitted=True):
        if hasattr(transformer, '_df_columns'):
            if ((not isinstance(column, slice))
                    and all(isinstance(col, str) for col in column)):
                names = column
            else:
                names = transformer._df_columns[column]
        else:
            indices = np.arange(transformer._n_features)
            names = ['x%d' % i for i in indices[column]]
        # TODO: in_names is not yet used to override these column names

        if trans == 'drop' or (
                hasattr(column, '__len__') and not len(column)):
            continue
        if trans == 'passthrough':
            feature_names.extend(names)
            continue
        feature_names.extend([name + "__" + f for f in
                              transform_feature_names(trans, in_names=names)])
    return feature_names

I also needed to create a dispatch for OneHotEncoder, because its get_feature_names needs the parameter input_features:

@transform_feature_names.register(OneHotEncoder)
def _ohe_names(est, in_names=None):
    return est.get_feature_names(input_features=in_names)
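
The two registrations above follow the single-dispatch pattern: one handler per transformer type, with container types calling back into the dispatcher for each child. A minimal sketch of that pattern, using the standard library's functools.singledispatch and hypothetical toy classes rather than eli5 or sklearn (the name feature_names and all Toy* classes are assumptions made for illustration):

```python
from functools import singledispatch


class ToyEncoder:
    """Leaf transformer: one output column per (input column, category)."""

    def __init__(self, categories):
        self.categories = categories  # one list of categories per input column


class ToyPipeline:
    def __init__(self, steps):
        self.steps = steps  # list of (step_name, transformer)


class ToyColumnTransformer:
    def __init__(self, transformers):
        self.transformers = transformers  # list of (name, transformer, columns)


@singledispatch
def feature_names(transformer, in_names=None):
    """Fallback for types with no registered handler."""
    raise TypeError("no handler for %s" % type(transformer).__name__)


@feature_names.register(ToyEncoder)
def _encoder_names(est, in_names=None):
    # The leaf needs the input names to build its output names.
    return ["%s_%s" % (col, cat)
            for col, cats in zip(in_names, est.categories)
            for cat in cats]


@feature_names.register(ToyPipeline)
def _pipeline_names(est, in_names=None):
    # Thread the names through each step in order.
    names = in_names
    for _, step in est.steps:
        names = feature_names(step, in_names=names)
    return names


@feature_names.register(ToyColumnTransformer)
def _column_transformer_names(est, in_names=None):
    # Call the dispatcher (not a method on the child) for each transformer.
    out = []
    for name, trans, columns in est.transformers:
        out.extend(name + "__" + f
                   for f in feature_names(trans, in_names=columns))
    return out


encoder = ToyEncoder([["F", "M"], ["high", "low", "medium"]])
preprocessor = ToyColumnTransformer(
    [("categorical", ToyPipeline([("onehot", encoder)]), ["gender", "income"])])
result = feature_names(preprocessor)
```

Because the container handlers call the dispatcher instead of a method on the child, a nested pipeline is resolved by its own handler and never needs a get_feature_names method of its own.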

Relevant links:
https://eli5.readthedocs.io/en/latest/autodocs/eli5.html#eli5.transform_feature_names
https://github.com/TeamHG-Memex/eli5/blob/4839d1927c4a68aeff051935d1d4d8a4fb69b46d/eli5/sklearn/transform.py
