Using multiple custom classes with Pipeline sklearn (Python)

Views: 70 · Published: 2021/5/31 18:35:58 · Tags: python, pandas, machine-learning, scikit-learn, pipeline


Question

I am trying to write a tutorial on Pipeline for students, but I am stuck. I'm not an expert, but I'm trying to improve, so thank you for your indulgence. I am trying to run several steps in a pipeline to prepare a dataframe for a classifier:

Step 1: Description of the dataframe
Step 2: Fill NaN values
Step 3: Transform categorical values into numbers

Here is my code:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder

class Descr_df(object):

    def transform (self, X):
        print ("Structure of the data: \n {}".format(X.head(5)))
        print ("Features names: \n {}".format(X.columns))
        print ("Target: \n {}".format(X.columns[0]))
        print ("Shape of the data: \n {}".format(X.shape))

    def fit(self, X, y=None):
        return self

class Fillna(object):

    def transform(self, X):
        non_numerics_columns = X.columns.difference(X._get_numeric_data().columns)
        for column in X.columns:
            if column in non_numerics_columns:
                X[column] = X[column].fillna(X[column].value_counts().idxmax())
            else:
                X[column] = X[column].fillna(X[column].mean())
        return X

    def fit(self, X,y=None):
        return self

class Categorical_to_numerical(object):

    def transform(self, X):
        non_numerics_columns = X.columns.difference(X._get_numeric_data().columns)
        le = LabelEncoder()
        for column in non_numerics_columns:
            X[column] = X[column].fillna(X[column].value_counts().idxmax())
            le.fit(X[column])
            X[column] = le.transform(X[column]).astype(int)
        return X

    def fit(self, X, y=None):
        return self

If I execute steps 1 and 2, or steps 1 and 3, it works. But if I execute steps 1, 2 and 3 together, I get this error:

pipeline = Pipeline([('df_intropesction', Descr_df()), ('fillna',Fillna()), ('Categorical_to_numerical', Categorical_to_numerical())])
pipeline.fit(X, y)
AttributeError: 'NoneType' object has no attribute 'columns'

Recommended answer

This error arises because in a Pipeline the output of the first estimator goes to the second, the output of the second estimator goes to the third, and so on.

From the Pipeline documentation:

Fit all the transforms one after the other and transform the data, then fit the transformed data using the final estimator.

So for your pipeline, the steps of execution are as follows:

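1. Descr_df.fit(X) -> does nothing and returns self.
2. newX = Descr_df.transform(X) -> should return a value to assign to newX, which is then passed to the next estimator, but your definition returns nothing (it only prints). So None is returned implicitly.
3. Fillna.fit(newX) -> does nothing and returns self.
4. Fillna.transform(newX) -> calls newX.columns. But newX is None from step 2, hence the error.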
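
The sequence above can be reproduced without scikit-learn. The sketch below is a simplified imitation of the chaining, not the library's actual implementation: `Table`, `PrintOnly`, `NeedsColumns` and `run_chain` are hypothetical names introduced only for illustration, and `Table` is a stand-in for a dataframe that has nothing but a `columns` attribute.

```python
class Table:
    """Hypothetical stand-in for a dataframe: it only has a `columns` attribute."""

    def __init__(self, columns):
        self.columns = columns


class PrintOnly:
    """Behaves like the original Descr_df: transform prints but returns nothing."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        print("columns:", X.columns)  # no return statement, so the result is None


class NeedsColumns:
    """Behaves like Fillna: transform reads X.columns."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        _ = X.columns  # raises AttributeError when X is None
        return X


def run_chain(steps, X):
    """Simplified chaining: fit each step, then pass its output to the next one."""
    for step in steps:
        X = step.fit(X).transform(X)
    return X


try:
    run_chain([PrintOnly(), NeedsColumns()], Table(["a", "b"]))
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

The first step hands `None` to the second one, so the failure shows up in the second step even though the missing `return` is in the first.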

Solution: Change the transform method of Descr_df to return the dataframe as it is:

def transform (self, X):
    print ("Structure of the data: \n {}".format(X.head(5)))
    print ("Features names: \n {}".format(X.columns))
    print ("Target: \n {}".format(X.columns[0]))
    print ("Shape of the data: \n {}".format(X.shape))
    return X
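
A quick way to check the corrected method is to chain it by hand, the way the steps above describe. This is only a sketch: the toy dataframe is made up for illustration, and the class is shortened to a single print.

```python
import numpy as np
import pandas as pd


class Descr_df(object):
    """Corrected version: prints a summary, then hands the dataframe on."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        print("Shape of the data: \n {}".format(X.shape))
        return X  # the next step receives this object


# Hypothetical toy data, for illustration only.
X = pd.DataFrame({"age": [20.0, np.nan], "city": ["Paris", None]})

newX = Descr_df().fit(X).transform(X)

# The next step can now read newX.columns without an AttributeError.
next_step_can_read_columns = list(newX.columns)
```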

Suggestion: Make your classes inherit from the BaseEstimator and TransformerMixin classes in scikit-learn to conform to good practice.

That is, change class Descr_df(object) to class Descr_df(BaseEstimator, TransformerMixin), class Fillna(object) to class Fillna(BaseEstimator, TransformerMixin), and so on.
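
As a rough sketch of what that inheritance looks like, the example below rewrites two of the steps on top of BaseEstimator and TransformerMixin. The class names `DescribeFrame` and `FillMissing` and the toy dataframe are hypothetical, and copying the dataframe inside transform is a design choice of this sketch rather than something the original code does.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class DescribeFrame(BaseEstimator, TransformerMixin):
    """Hypothetical pass-through step that prints the shape."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        print("Shape of the data: {}".format(X.shape))
        return X


class FillMissing(BaseEstimator, TransformerMixin):
    """Hypothetical fill step: mean for numeric columns, most frequent value otherwise."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        X = X.copy()  # avoid modifying the caller's dataframe
        numeric_columns = X.select_dtypes(include="number").columns
        for column in X.columns:
            if column in numeric_columns:
                X[column] = X[column].fillna(X[column].mean())
            else:
                X[column] = X[column].fillna(X[column].value_counts().idxmax())
        return X


# Hypothetical toy data, for illustration only.
X = pd.DataFrame({"age": [20.0, np.nan, 40.0], "city": ["Paris", None, "Paris"]})

pipeline = Pipeline([("describe", DescribeFrame()), ("fillna", FillMissing())])
result = pipeline.fit_transform(X)

# TransformerMixin supplies fit_transform; BaseEstimator supplies get_params.
direct = FillMissing().fit_transform(X)
params = FillMissing().get_params()
```

With the mixins in place, each class gets `fit_transform` and `get_params`/`set_params` without extra code, which is what tools such as grid search rely on.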

See this example for more details on custom classes in a Pipeline:

scikit-learn.org/stable/auto_examples/hetero_feature_union.html#sphx-glr-auto-examples-hetero-feature-union-py
