sklearn "Pipeline instance is not fitted yet." error, even though it is

Problem description

A similar question has already been asked, but its answer did not help me solve my problem: "Sklearn components in pipeline is not fitted even if the whole pipeline is?"

I'm trying to use multiple pipelines to preprocess my data: a one-hot encoder for the categorical features, and an imputer plus a scaler for the numerical features (as suggested in this blog).

My classifier reaches 78% accuracy, but I can't figure out why I cannot plot the decision tree I'm training, or what would fix the problem. Here is the code snippet:

import pandas as pd
import sklearn.tree as tree
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder  
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer


X = pd.DataFrame(data=data)  
Y = pd.DataFrame(data=prediction)

categoricalFeatures = ["race", "gender"]
numericalFeatures = ["age", "number_of_actions"]

categoricalTransformer = Pipeline(steps=[
    ('onehot', OneHotEncoder(handle_unknown='ignore')),
])

numericTransformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler()),
])

preprocessor = ColumnTransformer(transformers=[
    ('num', numericTransformer, numericalFeatures),
    ('cat', categoricalTransformer, categoricalFeatures)
])

classifier = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', tree.DecisionTreeClassifier(max_depth=3))
])

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=12, stratify=Y)

classifier.fit(X_train, y_train)
print("model score: %.3f" % classifier.score(X_test, y_test))  # Prints accuracy of 0.78

text_representation = tree.export_text(classifier)

The last command produces this error even though the model has been fitted (I assume it's a synchronization issue, but I can't figure out how to solve it):

sklearn.exceptions.NotFittedError: This Pipeline instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.

Recommended answer

You cannot call export_text on the whole pipeline, because it only accepts decision tree objects, i.e. DecisionTreeClassifier or DecisionTreeRegressor. Pass only the fitted tree step of your pipeline and it will work:

text_representation = tree.export_text(classifier['classifier'])
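For reference, here is a minimal self-contained sketch of the same fix. The toy data below is invented purely for illustration (the original `data` and `prediction` objects are not shown in the question), and the train/test split is omitted to keep it short. It looks up the fitted tree through `named_steps`, which should be equivalent to indexing the pipeline with `classifier['classifier']`:

```python
import pandas as pd
from sklearn import tree
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data standing in for the question's `data` and `prediction`.
X = pd.DataFrame({
    "race": ["a", "b", "a", "b", "a", "b"],
    "gender": ["f", "m", "m", "f", "f", "m"],
    "age": [22, 35, 47, 51, 29, 63],
    "number_of_actions": [1, 4, 2, 8, 3, 9],
})
y = pd.Series([0, 0, 0, 1, 0, 1])

preprocessor = ColumnTransformer(transformers=[
    ("num", Pipeline(steps=[
        ("imputer", SimpleImputer(strategy="median")),
        ("scaler", StandardScaler()),
    ]), ["age", "number_of_actions"]),
    ("cat", Pipeline(steps=[
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), ["race", "gender"]),
])

classifier = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("classifier", tree.DecisionTreeClassifier(max_depth=3, random_state=0)),
])
classifier.fit(X, y)

# Pull the fitted tree out of the pipeline and export only that step.
fitted_tree = classifier.named_steps["classifier"]
text_representation = tree.export_text(fitted_tree)
print(text_representation)
```

Because the tree only ever sees the preprocessed matrix, the exported rules refer to transformed columns (by default named feature_0, feature_1, and so on), not to the original DataFrame columns.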

The error message claiming that the Pipeline object is not fitted comes from scikit-learn's check_is_fitted function. It works by checking for fitted attributes (names ending with a trailing underscore) on the estimator. Since Pipeline objects do not expose such attributes, the check fails and raises the error even though the pipeline is in fact fitted. That is not a real problem, though, because Pipeline objects are not meant to be passed to export_text in the first place.
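The trailing-underscore convention can be observed directly on a plain decision tree. The sketch below is only an approximation of the idea: `fitted_attributes` is a hypothetical helper written for this example, not a scikit-learn function, and the exact logic inside check_is_fitted (including how it treats Pipeline objects) may differ between scikit-learn versions.

```python
from sklearn.exceptions import NotFittedError
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.validation import check_is_fitted


def fitted_attributes(estimator):
    """Hypothetical helper: list instance attributes ending in an underscore."""
    return [
        name for name in vars(estimator)
        if name.endswith("_") and not name.startswith("__")
    ]


tree_model = DecisionTreeClassifier(max_depth=3)

# Before fit there are no trailing-underscore attributes, so the check raises.
before = fitted_attributes(tree_model)
try:
    check_is_fitted(tree_model)
    raised_before_fit = False
except NotFittedError:
    raised_before_fit = True

# After fit, attributes such as tree_ and classes_ exist and the check passes.
tree_model.fit([[0], [1], [2], [3]], [0, 0, 1, 1])
after = fitted_attributes(tree_model)
check_is_fitted(tree_model)

print(before)
print(after)
```

This is why passing the extracted tree step works: the fitted attributes live on the tree itself, not on the Pipeline wrapper around it.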
