scikit-learn: applying an arbitrary function as part of a pipeline
Question
I've just discovered the Pipeline feature of scikit-learn, and I find it very useful for testing different combinations of preprocessing steps before training my model.
A pipeline is a chain of objects that implement the fit and transform methods. Now, if I wanted to add a new preprocessing step, I used to write a class that inherits from sklearn.base.BaseEstimator. However, I'm thinking that there must be a simpler method. Do I really need to wrap every function I want to apply in an estimator class?
Example:
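import sklearn.base

class Categorizer(sklearn.base.BaseEstimator):
    """
    Converts given columns into pandas dtype 'category'.
    """
    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        # Nothing to learn; y is optional so the step also works without targets.
        return self

    def transform(self, X):
        X = X.copy()  # avoid mutating the caller's DataFrame
        for column in self.columns:
            X[column] = X[column].astype("category")
        return X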
Recommended answer
For a general solution (working for many other use cases, not just transformers, but also simple models etc.), you can write your own decorator if you have state-free functions (which do not implement fit), for example by doing:
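import sklearn.base

class TransformerWrapper(sklearn.base.BaseEstimator):
    def __init__(self, func):
        # Keep the attribute name equal to the argument name so that
        # BaseEstimator.get_params() and clone() can find it.
        self.func = func

    def fit(self, *args, **kwargs):
        return self

    def transform(self, X, *args, **kwargs):
        return self.func(X, *args, **kwargs)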
Now you can write
@TransformerWrapper
def foo(x):
    return x*2
which is equivalent to
def foo(x):
    return x*2
foo = TransformerWrapper(foo)
which is what sklearn.preprocessing.FunctionTransformer is doing under the hood.
Personally I find decorating simpler, since you have a nice separation of your preprocessors from the rest of the code, but it is up to you which path to follow.
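As a rough sketch of how such a decorated function could sit inside a Pipeline: the version below is an assumption-laden variant of the wrapper, not the answer's exact code. It stores the function as `self.func` (matching the constructor argument, which `get_params` and `clone` rely on) and adds `TransformerMixin` for `fit_transform`. The names `double` and the `StandardScaler` step are purely illustrative.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class TransformerWrapper(TransformerMixin, BaseEstimator):
    """Wrap a stateless function so it can be used as a pipeline step."""

    def __init__(self, func):
        # Attribute name matches the argument name (assumption made here
        # so that get_params() and clone() work).
        self.func = func

    def fit(self, X, y=None):
        # Stateless: nothing to learn.
        return self

    def transform(self, X):
        return self.func(X)


@TransformerWrapper
def double(X):  # hypothetical example function
    return X * 2


pipe = Pipeline([("double", double), ("scale", StandardScaler())])

X = np.array([[1.0], [2.0], [3.0]])
Xt = pipe.fit_transform(X)

# Cloning matters because tools such as GridSearchCV copy the pipeline.
pipe_copy = clone(pipe)
```

After decoration, `double` is a transformer instance rather than a plain function, so it can be passed to `Pipeline` directly.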
In fact, you should be able to decorate with sklearn's own FunctionTransformer as well:
from sklearn.preprocessing import FunctionTransformer
@FunctionTransformer
def foo(x):
    return x*2
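To tie this back to the question, the Categorizer step could probably be written without a custom class by passing a plain function to FunctionTransformer and supplying the column list through its `kw_args` parameter. This is only a sketch: the helper name `to_category`, the sample DataFrame, and the column names are made up for illustration.

```python
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def to_category(X, columns):
    """Return a copy of X with the given columns cast to 'category'."""
    X = X.copy()  # leave the caller's DataFrame untouched
    for column in columns:
        X[column] = X[column].astype("category")
    return X


# kw_args forwards extra keyword arguments to the wrapped function.
categorizer = FunctionTransformer(to_category, kw_args={"columns": ["color"]})

pipe = Pipeline([("categorize", categorizer)])

# Hypothetical sample data.
df = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1, 2, 3]})
out = pipe.fit_transform(df)
```

Because the function is stateless, nothing is learned in `fit`; the step simply applies `to_category` whenever the pipeline transforms data.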