scikit-learn: applying an arbitrary function as part of a pipeline

Question

I've just discovered the Pipeline feature of scikit-learn, and I find it very useful for testing different combinations of preprocessing steps before training my model.

A pipeline is a chain of objects that implement the fit and transform methods. Until now, whenever I wanted to add a new preprocessing step, I wrote a class that inherits from sklearn.base.BaseEstimator. However, I think there must be a simpler way. Do I really need to wrap every function I want to apply in an estimator class?

Example:

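import sklearn.base


class Categorizer(sklearn.base.BaseEstimator, sklearn.base.TransformerMixin):
    """
    Converts given columns into pandas dtype 'category'.
    """

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        # Stateless: nothing is learned from the data.
        return self

    def transform(self, X):
        # Work on a copy so the caller's DataFrame is not modified.
        X = X.copy()
        for column in self.columns:
            X[column] = X[column].astype("category")
        return X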

Recommended answer

For a general solution (one that works for many other use cases: not just transformers, but also simple models etc.), you can write your own decorator if you have stateless functions (which do not need to implement fit), for example:

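import sklearn.base


class TransformerWrapper(sklearn.base.BaseEstimator, sklearn.base.TransformerMixin):

    def __init__(self, func):
        # Store under the constructor argument's name so get_params() and clone() work.
        self.func = func

    def fit(self, *args, **kwargs):
        # Stateless: there is nothing to fit.
        return self

    def transform(self, X, *args, **kwargs):
        return self.func(X, *args, **kwargs)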

Now you can do

@TransformerWrapper
def foo(x):
  return x*2

which is equivalent to

def foo(x):
  return x*2

foo = TransformerWrapper(foo)
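
To illustrate how such a decorated function might be plugged into a pipeline, here is a minimal sketch. The function name `double`, the toy data, and the choice of LinearRegression are assumptions made for illustration only. The wrapper stores the function as `self.func`, matching the constructor argument name, because BaseEstimator.get_params() and clone() look attributes up by that name.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline


class TransformerWrapper(BaseEstimator, TransformerMixin):
    """Wraps a stateless function so it can be used as a pipeline step."""

    def __init__(self, func):
        # Attribute name matches the __init__ argument (scikit-learn convention).
        self.func = func

    def fit(self, X, y=None):
        # Nothing to learn for a stateless function.
        return self

    def transform(self, X):
        return self.func(X)


@TransformerWrapper
def double(x):
    # Hypothetical preprocessing step: scale every value by two.
    return x * 2


# Toy data, chosen so that y equals the doubled feature.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])

# After decoration, `double` is a TransformerWrapper instance, not a function.
pipe = Pipeline([("double", double), ("model", LinearRegression())])
pipe.fit(X, y)
predictions = pipe.predict(X)
```

Because the wrapper follows the estimator conventions, it should also survive clone(), which tools such as GridSearchCV rely on.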

This is essentially what sklearn.preprocessing.FunctionTransformer does under the hood.
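
As a rough illustration of the built-in class, the sketch below wraps NumPy's log1p and also passes its inverse, expm1, through the optional inverse_func argument. The variable names and the sample array are assumptions for the example, not part of the original answer.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# func is applied in transform(); inverse_func is applied in inverse_transform().
log_transformer = FunctionTransformer(func=np.log1p, inverse_func=np.expm1)

X = np.array([[0.0, 1.0], [2.0, 3.0]])

X_log = log_transformer.fit_transform(X)            # element-wise log(1 + x)
X_back = log_transformer.inverse_transform(X_log)   # round trip back to X
```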

Personally, I find decorating simpler, since it keeps your preprocessors nicely separated from the rest of the code, but it is up to you which path to follow.

In fact, you should be able to decorate with the sklearn class by writing

from sklearn.preprocessing import FunctionTransformer

@FunctionTransformer
def foo(x):
  return x*2

as well.
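
Applied to the Categorizer example from the question, one possible sketch is shown below. It assumes pandas input and uses FunctionTransformer's kw_args parameter to pass the column list; the helper name `to_category` and the sample DataFrame are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import FunctionTransformer


def to_category(df, columns):
    """Return a copy of df with the given columns cast to dtype 'category'."""
    df = df.copy()  # avoid mutating the caller's DataFrame
    for column in columns:
        df[column] = df[column].astype("category")
    return df


# kw_args forwards extra keyword arguments to the wrapped function.
categorizer = FunctionTransformer(to_category, kw_args={"columns": ["color"]})

# Hypothetical sample data for illustration.
frame = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1, 2, 3]})
result = categorizer.fit_transform(frame)
```

The resulting `categorizer` can be used as a pipeline step in place of the hand-written class, at the cost of keeping the column list in kw_args rather than in a named constructor argument.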
