Sklearn components in a pipeline are not fitted even if the whole pipeline is?

Problem description

I'm trying to single out a component/transformer from a fitted pipeline to inspect its behavior. However, when I retrieve the component, it is reported as unfitted, even though using the pipeline as a whole works without problems. That suggests the pipeline is fitted, so its components should be fitted as well.

Can someone explain why, and also suggest how to inspect a component in a fitted pipeline?

Here's a reproducible example:

import pandas as pd
import numpy as np

from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, GridSearchCV

np.random.seed(0)

# Read data from Titanic dataset.
titanic_url = ('https://raw.githubusercontent.com/amueller/'
               'scipy-2017-sklearn/091d371/notebooks/datasets/titanic3.csv')
data = pd.read_csv(titanic_url)

# We create the preprocessing pipelines for both numeric and categorical data.
numeric_features = ['age', 'fare']
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())])

categorical_features = ['embarked', 'sex', 'pclass']
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)])

# Append classifier to preprocessing pipeline.
# Now we have a full prediction pipeline.
clf = Pipeline(steps=[('preprocessor', preprocessor),
                      ('classifier', LogisticRegression(solver='lbfgs'))])

X = data.drop('survived', axis=1)
y = data['survived']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

clf.fit(X_train, y_train)
print("model score: %.3f" % clf.score(X_test, y_test))

Calling either of the following:

clf.get_params()['preprocessor__cat__imputer'].transform(X)

clf.named_steps['preprocessor'].transformers[0][1].named_steps['imputer'].transform(X)

results in an error like this:

NotFittedError: This SimpleImputer instance is not fitted yet. Call 'fit' with appropriate arguments before using this method.

Recommended answer

The ColumnTransformer attribute transformers holds the unfitted transformers exactly as they were passed in; fit() works on clones of them and leaves the originals untouched. To access the fitted transformers, use the attribute transformers_ or named_transformers_. I suppose get_params()['preprocessor__cat__imputer'] is also returning the unfitted input transformer.

(You'll still get an error if you pass the full X, because the imputer will then try to work on the string columns as well, and strategy='median' will fail on them.)
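
As a minimal sketch of the difference between transformers and named_transformers_, the snippet below rebuilds a smaller version of the pipeline on a tiny made-up DataFrame (the values and the reduced column set are assumptions for illustration, not the Titanic data) and calls the imputer only on the numeric columns it was fitted on:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.exceptions import NotFittedError
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Small made-up stand-in for the Titanic data (illustrative values only).
X = pd.DataFrame({
    'age': [22.0, 38.0, np.nan, 35.0, 54.0, 2.0],
    'fare': [7.25, 71.28, 7.92, np.nan, 51.86, 21.07],
    'sex': ['male', 'female', 'female', 'female', 'male', 'male'],
})
y = np.array([0, 1, 1, 1, 0, 0])

numeric_features = ['age', 'fare']
categorical_features = ['sex']

preprocessor = ColumnTransformer(transformers=[
    ('num', Pipeline(steps=[('imputer', SimpleImputer(strategy='median')),
                            ('scaler', StandardScaler())]), numeric_features),
    ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)])

clf = Pipeline(steps=[('preprocessor', preprocessor),
                      ('classifier', LogisticRegression(solver='lbfgs'))])
clf.fit(X, y)

fitted_ct = clf.named_steps['preprocessor']

# transformers: the objects passed to the constructor, still unfitted.
unfitted_imputer = fitted_ct.transformers[0][1].named_steps['imputer']
try:
    unfitted_imputer.transform(X[numeric_features])
    raised = False
except NotFittedError:
    raised = True

# named_transformers_: the fitted clones created during fit().
fitted_imputer = fitted_ct.named_transformers_['num'].named_steps['imputer']
imputed = fitted_imputer.transform(X[numeric_features])

print("unfitted input raised NotFittedError:", raised)
print("learned medians:", fitted_imputer.statistics_)
```

The same fitted object should also be reachable through fitted_ct.transformers_[0][1]; named_transformers_ is simply the lookup by name.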
