Custom transformer for sklearn Pipeline that alters both X and y

Views: 132 Posted: 2020/5/4 9:01:22 Tags: python numpy machine-learning scikit-learn data-analysis


Problem description

I want to create my own transformer for use with the sklearn Pipeline, so I am writing a class that implements both the fit and transform methods. The transformer should remove rows from the matrix that contain more than a specified number of NaNs. The issue I am facing is how to change both the X and y matrices that are passed to the transformer. I believe this has to be done in the fit method, since it has access to both X and y. Because Python passes arguments by assignment, once I reassign X to a new matrix with fewer rows, the reference to the original X is lost (and the same is true for y). Is it possible to maintain this reference?
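The pass-by-assignment behaviour described above can be seen in a small sketch. The function names `rebind` and `mutate` are hypothetical and exist only for this illustration.

```python
import numpy as np


def rebind(X):
    # Assigning to the parameter name creates a new local binding;
    # the caller's array is untouched.
    X = X[:1]


def mutate(X):
    # Writing an element through the existing object is visible to the caller.
    X[0, 0] = -1.0


a = np.array([[1.0, 2.0], [3.0, 4.0]])

rebind(a)
shape_after_rebind = a.shape    # the caller still sees both rows

mutate(a)
first_after_mutate = a[0, 0]    # the caller sees the in-place write
```

Reassigning X inside fit behaves like `rebind` here: the shorter matrix exists only inside the method.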

I'm using a pandas DataFrame to easily drop the rows that have too many NaNs; this may not be the right way to do it for my use case. The current code looks like this:

import pandas as pd


class Dropna():

    # thresh is max number of NaNs allowed in a row
    def __init__(self, thresh=0):
        self.thresh = thresh

    def fit(self, X, y):
        total = X.shape[1]
        # +1 to account for 'y' being added to the dframe
        # (DataFrame.dropna's thresh is the minimum number of non-NaN values a row needs to be kept)
        new_thresh = total + 1 - self.thresh
        df = pd.DataFrame(X)
        df['y'] = y
        df.dropna(thresh=new_thresh, inplace=True)
        # These assignments rebind the local names only; the caller's X and y are unchanged
        X = df.drop('y', axis=1).values
        y = df['y'].values
        return self

    def transform(self, X):
        return X
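For comparison, here is a minimal sketch of one possible workaround, not the answer to the question above. It filters the rows of X and y together before they reach the Pipeline, because a Pipeline passes only the transformed X from one step to the next and hands y to each step unchanged. The helper name `drop_rows_with_nans` and its `max_nans` parameter are hypothetical. Unlike the class above, it counts NaNs in X only, not in y.

```python
import numpy as np


def drop_rows_with_nans(X, y, max_nans=0):
    """Hypothetical helper: keep rows of X with at most max_nans NaNs, plus the matching y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Boolean mask over rows: True where the NaN count is within the limit
    keep = np.isnan(X).sum(axis=1) <= max_nans
    return X[keep], y[keep]


X = np.array([[1.0, np.nan],
              [2.0, 3.0],
              [np.nan, np.nan]])
y = np.array([0, 1, 0])

# The third row has two NaNs and is dropped from both X and y
X_kept, y_kept = drop_rows_with_nans(X, y, max_nans=1)
```

The filtered `X_kept` and `y_kept` could then be passed to the Pipeline's fit method as usual.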
