Custom transformer for sklearn Pipeline that alters both X and y

Views: 20 Published: 2021/12/25 14:23:41 Tags: python pandas numpy machine-learning scikit-learn

Problem description

I want to create my own transformer for use with the sklearn Pipeline.

I am creating a class that implements both the fit and transform methods. The purpose of the transformer is to remove rows from the matrix that have more than a specified number of NaNs.

The problem I am facing is how to alter both the X and y matrices that are passed to the transformer.

I believe this has to be done in the fit method, since it has access to both X and y. Because Python passes arguments by assignment, once I reassign X to a new matrix with fewer rows, the reference to the original X is lost (and of course the same is true for y). Is it possible to maintain this reference?
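To illustrate the point about passing by assignment, here is a minimal sketch. The function names `rebind` and `mutate` and the sample array are hypothetical and not part of the original question. It shows that rebinding a parameter inside a function leaves the caller's array untouched, while an in-place write is visible to the caller but cannot change the number of rows.

```python
import numpy as np


def rebind(X):
    # Rebinding the local name X creates no change in the caller's object.
    X = X[:1]


def mutate(X):
    # An in-place write is visible to the caller, but the shape stays the same.
    X[0, 0] = -1.0


original = np.arange(6, dtype=float).reshape(3, 2)

rebind(original)
shape_after_rebind = original.shape  # still three rows

mutate(original)
first_after_mutate = original[0, 0]  # the caller sees the in-place write
```

This is why reassigning X and y inside fit has no effect on what the rest of the pipeline sees.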

I'm using a pandas DataFrame to easily drop the rows that have too many NaNs; this may not be the right way to do it for my use case. The current code looks like this:

import pandas as pd


class Dropna:

    # thresh is the maximum number of NaNs allowed in a row
    def __init__(self, thresh=0):
        self.thresh = thresh

    def fit(self, X, y):
        total = X.shape[1]
        # +1 to account for 'y' being added to the dframe
        new_thresh = total + 1 - self.thresh
        df = pd.DataFrame(X)
        df['y'] = y
        # keep only rows with at least new_thresh non-NaN values
        df.dropna(thresh=new_thresh, inplace=True)
        # these assignments only rebind the local names X and y;
        # the arrays held by the caller are left unchanged
        X = df.drop('y', axis=1).values
        y = df['y'].values
        return self

    def transform(self, X):
        return X
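As far as I know, a Pipeline step's transform only returns a new X, and y is passed on to later steps unchanged, so a row-dropping step has no way to shorten y from inside the pipeline. One possible workaround, offered only as a sketch, is to filter the rows before the data reaches the pipeline. The helper name `drop_nan_rows` and the sample data are hypothetical, and the sketch assumes X and y are numeric so they can be cast to float. Like the class above, it counts a NaN in y toward the row's total.

```python
import numpy as np


def drop_nan_rows(X, y, thresh=0):
    """Return X and y without the rows that hold more than `thresh` NaNs.

    Hypothetical helper; assumes X is 2-D and both X and y are numeric.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # NaNs per row across the features, plus one if the target is NaN
    nan_count = np.isnan(X).sum(axis=1) + np.isnan(y)
    keep = nan_count <= thresh
    return X[keep], y[keep]


X = np.array([[1.0, np.nan],
              [np.nan, np.nan],
              [3.0, 4.0]])
y = np.array([0.0, 1.0, 1.0])

# The middle row has two NaNs, which exceeds thresh=1, so it is dropped.
X_clean, y_clean = drop_nan_rows(X, y, thresh=1)
```

The filtered X_clean and y_clean can then be passed to the pipeline's fit as usual. The trade-off is that the row filtering lives outside the pipeline, so it has to be applied separately wherever the pipeline is used.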