VotingClassifier with pipelines as estimators

Question

I want to build an sklearn VotingClassifier ensemble out of multiple different models (Decision Tree, SVC, and a Keras Network). All of them need a different kind of data preprocessing, which is why I made a pipeline for each of them.

# Define pipelines

# DTC pipeline
featuriser = Featuriser()
dtc = DecisionTreeClassifier()
dtc_pipe = Pipeline([('featuriser',featuriser),('dtc',dtc)])

# SVC pipeline
scaler = TimeSeriesScalerMeanVariance(kind='constant')
flattener = Flattener()
svc = SVC(C = 100, gamma = 0.001, kernel='rbf')
svc_pipe = Pipeline([('scaler', scaler),('flattener', flattener), ('svc', svc)])

# Keras pipeline
cnn = KerasClassifier(build_fn=get_model())
cnn_pipe = Pipeline([('scaler',scaler),('cnn',cnn)])

# Make an ensemble
ensemble = VotingClassifier(estimators=[('dtc', dtc_pipe), 
                                        ('svc', svc_pipe),
                                        ('cnn', cnn_pipe)], 
                            voting='hard')

The Featuriser, TimeSeriesScalerMeanVariance and Flattener classes are custom transformers that all implement the fit, transform and fit_transform methods.

When I try to fit the whole ensemble with ensemble.fit(X, y), I get the error message:

ValueError: The estimator list should be a classifier.

I can understand this, as the individual estimators are not classifiers as such but pipelines. Is there a way to make it work anyway?

Recommended answer

The problem is with the KerasClassifier: it does not provide the _estimator_type attribute, which is checked in _validate_estimator.

Using pipelines is not the problem. Pipeline provides this information as a property, taken from its final step.
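The delegation can be illustrated without scikit-learn or Keras installed. The following is a minimal, library-free sketch: TinyPipeline, TreeLike, PlainWrapper and looks_like_classifier are hypothetical stand-ins, not the real scikit-learn classes, and the shape of the check is an assumption. It only mimics the behaviour described here, where a pipeline reports the type of its last step and validation compares that value against "classifier".

```python
class TinyPipeline:
    """Hypothetical stand-in for a pipeline: a list of (name, step) pairs."""

    def __init__(self, steps):
        self.steps = steps

    @property
    def _estimator_type(self):
        # Delegate to the final step, as the answer describes for Pipeline.
        return self.steps[-1][1]._estimator_type


class TreeLike:
    """Stand-in for a classifier that declares its type."""
    _estimator_type = "classifier"


class PlainWrapper:
    """Stand-in for a wrapper that declares no estimator type."""


def looks_like_classifier(estimator):
    # Assumed shape of the check: a missing attribute means "not a classifier".
    return getattr(estimator, "_estimator_type", None) == "classifier"


tree_pipe = TinyPipeline([("scale", object()), ("tree", TreeLike())])
wrapped = PlainWrapper()
wrapper_pipe = TinyPipeline([("scale", object()), ("net", wrapped)])

# The pipeline around the untyped wrapper fails the check...
ok_before = looks_like_classifier(wrapper_pipe)
# ...until the wrapper itself declares a type.
wrapped._estimator_type = "classifier"
ok_after = looks_like_classifier(wrapper_pipe)
```

In this model the pipeline around TreeLike passes the check unchanged, so the failure comes from the last step rather than from the pipeline.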

Hence, the quick fix is to set _estimator_type = 'classifier' on the KerasClassifier instance.
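A possible variation is to declare the attribute on a subclass instead of patching one instance. The sketch below is library-free and uses hypothetical names (WrapperBase, TypedWrapper, rebuild). The rebuild function is a simplified model resting on an assumption: that copying an estimator re-creates it from its constructor parameters only, so an attribute set on an instance afterwards is not carried over to the copy.

```python
class WrapperBase:
    """Hypothetical stand-in for a wrapper with no estimator type."""

    def __init__(self, units=10):
        self.units = units

    def get_params(self):
        return {"units": self.units}


class TypedWrapper(WrapperBase):
    # Declared on the class, so every new instance carries it.
    _estimator_type = "classifier"


def rebuild(estimator):
    # Simplified model of copying: construct a fresh object from the
    # constructor parameters only (an assumption, see the lead-in).
    return type(estimator)(**estimator.get_params())


patched = WrapperBase()
patched._estimator_type = "classifier"  # instance-level quick fix

patched_copy = rebuild(patched)
typed_copy = rebuild(TypedWrapper(units=32))

# The instance-level attribute is lost on the copy; the class-level one is not.
instance_fix_survives = hasattr(patched_copy, "_estimator_type")
class_fix_survives = getattr(typed_copy, "_estimator_type", None) == "classifier"
```

With the real wrapper, the same idea would be a subclass of KerasClassifier with _estimator_type = "classifier" in the class body. That variant is not tested here.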

Reproducible example:

# Imports
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.preprocessing import MinMaxScaler, Normalizer
from sklearn.ensemble import VotingClassifier
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.datasets import make_classification
from keras.layers import Dense
from keras.models import Sequential

X, y = make_classification()

# DTC pipeline
featuriser = MinMaxScaler()
dtc = DecisionTreeClassifier()
dtc_pipe = Pipeline([('featuriser', featuriser), ('dtc', dtc)])

# SVC pipeline
scaler = Normalizer()
svc = SVC(C=100, gamma=0.001, kernel='rbf')
svc_pipe = Pipeline(
    [('scaler', scaler), ('svc', svc)])

# Keras pipeline
def get_model():
    # create model
    model = Sequential()
    model.add(Dense(10, input_dim=20, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model


cnn = KerasClassifier(build_fn=get_model)
# Quick fix: mark the wrapper as a classifier so the estimator check passes
cnn._estimator_type = "classifier"
cnn_pipe = Pipeline([('scaler', scaler), ('cnn', cnn)])


# Make an ensemble
ensemble = VotingClassifier(estimators=[('dtc', dtc_pipe), 
                                        ('svc', svc_pipe),
                                        ('cnn', cnn_pipe)], 
                            voting='hard')

ensemble.fit(X, y)
