How to compose sklearn estimators using another estimator?
Question
I want to train a LogisticRegression and a RandomForestClassifier and combine their scores using a GaussianNB:
import numpy as np  # needed for np.transpose below
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)
logit = LogisticRegression(random_state=0)
logit.fit(X, y)
randf = RandomForestClassifier(n_estimators=100, max_depth=2, random_state=0)
randf.fit(X, y)
# Stack the class-0 probabilities of both models as two feature columns
X1 = np.transpose([logit.predict_proba(X)[:,0], randf.predict_proba(X)[:,0]])
nb = GaussianNB()
nb.fit(X1, y)
How do I do this with Pipeline so that I can pass it to cross_validate and GridSearchCV?
PS. I suppose I can define my own class implementing the fit and predict_proba methods, but I thought there should be a standard way to do it...
Recommended answer
No, there is nothing built into sklearn that does what you want without some custom code. You can parallelize parts of your code with FeatureUnion and sequence the whole task with Pipeline, but you need to write a custom transformer that forwards the output of predict_proba to the transform method.
Something like this:
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline, FeatureUnion

X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

# This is the custom transformer that will convert
# predict_proba() to pipeline friendly transform()
# (mixin listed before BaseEstimator, as sklearn recommends)
class PredictProbaTransformer(TransformerMixin, BaseEstimator):
    def __init__(self, clf=None):
        self.clf = clf

    def fit(self, X, y=None):
        if self.clf is not None:
            self.clf.fit(X, y)
        return self

    def transform(self, X):
        if self.clf is not None:
            # Drop the 2nd column but keep 2d shape
            # because FeatureUnion wants that
            return self.clf.predict_proba(X)[:, [0]]
        return X

    # This method is important for correct working of pipeline
    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)

logit = LogisticRegression(random_state=0)
randf = RandomForestClassifier(n_estimators=100, max_depth=2, random_state=0)

pipe = Pipeline([
    ('stack', FeatureUnion([
        ('logit', PredictProbaTransformer(logit)),
        ('randf', PredictProbaTransformer(randf)),
        # You can add more classifiers with custom wrapper like above
    ])),
    ('nb', GaussianNB())])

pipe.fit(X, y)
Now you can simply call pipe.predict(X), and every step (both wrapped classifiers, then GaussianNB) will be run in the right order.
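As a rough sketch of how such a pipeline might be handed to cross_validate and GridSearchCV, which is what the question asked for: the wrapper class is repeated here only so the snippet runs on its own, and the grid values for C and max_depth are illustrative assumptions, not tuned recommendations. Nested parameters are addressed by joining the step names with double underscores, for example stack__logit__clf__C.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import FeatureUnion, Pipeline


class PredictProbaTransformer(TransformerMixin, BaseEstimator):
    """Wrapper from the answer: exposes predict_proba() as transform()."""

    def __init__(self, clf=None):
        self.clf = clf

    def fit(self, X, y=None):
        if self.clf is not None:
            self.clf.fit(X, y)
        return self

    def transform(self, X):
        if self.clf is not None:
            # Keep only the first probability column, as a 2D array
            return self.clf.predict_proba(X)[:, [0]]
        return X


X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

pipe = Pipeline([
    ('stack', FeatureUnion([
        ('logit', PredictProbaTransformer(LogisticRegression(random_state=0))),
        ('randf', PredictProbaTransformer(
            RandomForestClassifier(n_estimators=100, max_depth=2,
                                   random_state=0))),
    ])),
    ('nb', GaussianNB()),
])

# Cross-validate the whole pipeline; each fold refits every step
cv_results = cross_validate(pipe, X, y, cv=5)
print(cv_results['test_score'].mean())

# Illustrative grid (assumed values): <step>__<transformer>__clf__<param>
param_grid = {
    'stack__logit__clf__C': [0.1, 1.0, 10.0],
    'stack__randf__clf__max_depth': [2, 4],
}
search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```

Because the wrapped classifier is an ordinary constructor parameter (clf), cloning and set_params reach into it, which is what lets the grid search tune the inner models.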
For more information about FeatureUnion, you can look at my other answer to a similar question.
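Note that scikit-learn 0.22 and later also ship sklearn.ensemble.StackingClassifier, which may cover this use case without a custom transformer. The sketch below assumes such a version is installed and is an alternative to the pipeline above, not the approach this answer describes. One difference worth knowing: StackingClassifier fits the final estimator on cross-validated predictions of the base estimators, whereas the pipeline above fits GaussianNB on in-sample probabilities.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

stack = StackingClassifier(
    estimators=[
        ('logit', LogisticRegression(random_state=0)),
        ('randf', RandomForestClassifier(n_estimators=100, max_depth=2,
                                         random_state=0)),
    ],
    final_estimator=GaussianNB(),   # combines the base classifiers' scores
    stack_method='predict_proba',   # feed probabilities to the final estimator
    cv=5,                           # folds used to build the stacked features
)

stack.fit(X, y)
preds = stack.predict(X[:5])
print(preds)

# The stacked model is a regular estimator, so it can be cross-validated
scores = cross_validate(stack, X, y, cv=3)
print(scores['test_score'].mean())
```

Base-estimator parameters are addressed by name here as well (for example logit__C), so the same object can be passed to GridSearchCV.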