How to compose sklearn estimators using another estimator?

Question

I want to train a LogisticRegression and a RandomForestClassifier and combine their scores using a GaussianNB:

import numpy as np  # needed for np.transpose below
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

logit = LogisticRegression(random_state=0)
logit.fit(X, y)

randf = RandomForestClassifier(n_estimators=100, max_depth=2, random_state=0)
randf.fit(X, y)

X1 = np.transpose([logit.predict_proba(X)[:,0], randf.predict_proba(X)[:,0]])

nb = GaussianNB()
nb.fit(X1, y)

How do I do this with Pipeline, so that I can pass it to cross_validate and GridSearchCV?

PS. I suppose I could define my own class implementing the fit and predict_proba methods, but I would expect there to be a standard way to do this...

Recommended answer

No, sklearn has nothing built in that does what you want without some custom code. You can run the two classifiers side by side with FeatureUnion and sequence the whole task with Pipeline, but you need to write a custom transformer that forwards the output of predict_proba to the transform method.

Something like this:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline, FeatureUnion

X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

# This is the custom transformer that will convert 
# predict_proba() to pipeline friendly transform()
class PredictProbaTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, clf=None):
        self.clf = clf

    def fit(self, X, y):
        if self.clf is not None:
            self.clf.fit(X, y)

        return self

    def transform(self, X):

        if self.clf is not None:
            # Drop the 2nd column but keep 2d shape
            # because FeatureUnion wants that 
            return self.clf.predict_proba(X)[:,[0]]

        return X

    # Pipeline and FeatureUnion call fit_transform() while fitting,
    # so make sure y is passed through to fit()
    def fit_transform(self, X, y):
        return self.fit(X, y).transform(X)

logit = LogisticRegression(random_state=0)
randf = RandomForestClassifier(n_estimators=100, max_depth=2, random_state=0)

pipe = Pipeline([
    ('stack', FeatureUnion([
        ('logit', PredictProbaTransformer(logit)),
        ('randf', PredictProbaTransformer(randf)),
        # You can add more classifiers with the custom wrapper like above
    ])),
    ('nb', GaussianNB()),
])

pipe.fit(X, y)

Now you can simply call pipe.predict(X) and every step will run in the right order: both wrapped classifiers produce their probabilities, and GaussianNB predicts from them.
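Since the question asks about cross_validate and GridSearchCV, here is a minimal sketch of how the pipeline could be passed to both. It restates the wrapper compactly so the block runs on its own. The fold counts and the parameter grid values are assumptions chosen only for illustration. The nested parameter names follow sklearn's `step__substep__param` convention, so the `clf` argument of the wrapper appears in the path.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import FeatureUnion, Pipeline


class PredictProbaTransformer(TransformerMixin, BaseEstimator):
    """Compact restatement of the wrapper: exposes predict_proba() via transform()."""

    def __init__(self, clf=None):
        self.clf = clf

    def fit(self, X, y=None):
        if self.clf is not None:
            self.clf.fit(X, y)
        return self

    def transform(self, X):
        if self.clf is not None:
            # Keep only the first probability column, as a 2D array
            return self.clf.predict_proba(X)[:, [0]]
        return X


X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

pipe = Pipeline([
    ('stack', FeatureUnion([
        ('logit', PredictProbaTransformer(LogisticRegression(random_state=0))),
        ('randf', PredictProbaTransformer(
            RandomForestClassifier(n_estimators=100, max_depth=2,
                                   random_state=0))),
    ])),
    ('nb', GaussianNB()),
])

# Cross-validate the whole stack; every fold refits all three models
cv_results = cross_validate(pipe, X, y, cv=5)
print(cv_results['test_score'].mean())

# Illustrative grid (values are assumptions); names are
# <pipeline step>__<union entry>__<wrapper argument>__<classifier parameter>
param_grid = {
    'stack__logit__clf__C': [0.1, 1.0, 10.0],
    'stack__randf__clf__max_depth': [2, 4],
}
search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```

Because the wrapped classifiers are ordinary constructor arguments of the transformer, cloning the pipeline also clones them, which is what lets each fold and each grid point start from unfitted models.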

For more information about FeatureUnion, see my other answer to a similar question.
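A note for readers on newer releases: scikit-learn 0.22 and later include sklearn.ensemble.StackingClassifier, which covers a similar use case without a custom wrapper. The sketch below is an alternative to the answer's approach, not a drop-in equivalent. StackingClassifier trains the final estimator on cross-validated predictions of the base models rather than on in-sample probabilities, so its results will generally differ from the pipeline above. The `cv=5` settings are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

stack = StackingClassifier(
    estimators=[
        ('logit', LogisticRegression(random_state=0)),
        ('randf', RandomForestClassifier(n_estimators=100, max_depth=2,
                                         random_state=0)),
    ],
    final_estimator=GaussianNB(),
    stack_method='predict_proba',  # feed class probabilities to GaussianNB
    cv=5,  # internal folds used to build the final estimator's training data
)
stack.fit(X, y)

# The stacked model is a regular estimator, so it works with cross_validate
scores = cross_validate(stack, X, y, cv=5)
print(scores['test_score'].mean())
```

The base estimators' parameters can be tuned with GridSearchCV using names such as `logit__C` or `randf__max_depth`.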
