scikit-learn grid search over multiple classifiers

Question

I wanted to know if there is a better, more built-in way to do grid search and test multiple models in a single pipeline. Of course the parameters of the models would be different, which made it complicated for me to figure this out. Here is what I did:

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.model_selection import GridSearchCV  # sklearn.grid_search was removed in 0.20
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC


def grid_search(X_train, y_train):
    # Transformers must come before the final estimator in a Pipeline.
    pipeline1 = Pipeline([
        ('vec2', TfidfTransformer()),
        ('clf', RandomForestClassifier()),
    ])

    pipeline2 = Pipeline([
        ('clf', KNeighborsClassifier()),
    ])

    pipeline3 = Pipeline([
        ('clf', SVC()),
    ])

    pipeline4 = Pipeline([
        ('clf', MultinomialNB()),
    ])

    # String options belong to max_features; max_depth takes integers or None.
    parameters1 = {
        'clf__n_estimators': [10, 20, 30],
        'clf__criterion': ['gini', 'entropy'],
        'clf__max_features': ['sqrt', 'log2', None],
        'clf__max_depth': [5, 10, 15],
    }

    parameters2 = {
        'clf__n_neighbors': [3, 7, 10],
        'clf__weights': ['uniform', 'distance'],
    }

    parameters3 = {
        'clf__C': [0.01, 0.1, 1.0],
        'clf__kernel': ['rbf', 'poly'],
        'clf__gamma': [0.01, 0.1, 1.0],
    }

    parameters4 = {
        'clf__alpha': [0.01, 0.1, 1.0],
    }

    pars = [parameters1, parameters2, parameters3, parameters4]
    pips = [pipeline1, pipeline2, pipeline3, pipeline4]

    print("starting Gridsearch")
    # Each pipeline is searched separately with its own parameter grid.
    for pipeline, parameters in zip(pips, pars):
        gs = GridSearchCV(pipeline, parameters, verbose=2, refit=False, n_jobs=-1)
        gs = gs.fit(X_train, y_train)
        print("finished Gridsearch")
        print(gs.best_score_)

However, this approach still gives only the best model within each classifier; it does not compare between classifiers.
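For reference, one way to get a comparison between classifiers while staying inside scikit-learn is to treat the 'clf' step itself as a searchable parameter and pass GridSearchCV a list of grids, one per classifier. The following is a minimal sketch under assumptions: the iris dataset stands in for X_train and y_train, and only two of the four classifiers are shown.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Toy data stands in for the X_train / y_train of the question (assumption).
X, y = load_iris(return_X_y=True)

# A single pipeline; the initial SVC is a placeholder that each grid replaces.
pipeline = Pipeline([('clf', SVC())])

# One grid per classifier: each swaps in an estimator plus its own parameters.
param_grid = [
    {'clf': [KNeighborsClassifier()],
     'clf__n_neighbors': [3, 7, 10],
     'clf__weights': ['uniform', 'distance']},
    {'clf': [SVC()],
     'clf__C': [0.01, 0.1, 1.0],
     'clf__kernel': ['rbf', 'poly']},
]

gs = GridSearchCV(pipeline, param_grid, cv=3)
gs.fit(X, y)

# best_params_ holds the winning estimator under the 'clf' key.
print(gs.best_score_)
print(gs.best_params_)
```

Because all candidates are scored in one search, best_score_ and best_params_ refer to the best combination across both classifiers rather than within each one.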

Recommended answer

Instead of using grid search for hyperparameter selection, you can use the hyperopt library.

Please have a look at section 2.2 of this page. In the above case, you can use an hp.choice expression to select among the various pipelines and then define the parameter expressions for each one separately.

In your objective function, you need to branch on the chosen pipeline and return the CV score for the selected pipeline and parameters (possibly via cross_val_score).
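An objective function along these lines could be sketched as follows. It assumes the configuration arrives as a dict with a 'type' key (an illustrative convention, not something the answer specifies), uses the iris dataset as stand-in data, and returns the negated mean score because hyperopt minimizes its objective. build_classifier is a hypothetical helper.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Toy data stands in for X_train / y_train (assumption).
X, y = load_iris(return_X_y=True)


def build_classifier(config):
    """Map a configuration dict to an estimator (hypothetical helper)."""
    params = {k: v for k, v in config.items() if k != 'type'}
    if config['type'] == 'knn':
        return KNeighborsClassifier(**params)
    if config['type'] == 'svc':
        return SVC(**params)
    raise ValueError('unknown classifier type: %r' % config['type'])


def objective(config):
    """Return the loss to minimize: negated mean cross-validated accuracy."""
    clf = build_classifier(config)
    score = cross_val_score(clf, X, y, cv=3).mean()
    return -score


# The function can be checked by hand before giving it to an optimizer.
print(objective({'type': 'knn', 'n_neighbors': 3, 'weights': 'uniform'}))
print(objective({'type': 'svc', 'C': 1.0, 'kernel': 'rbf'}))
```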

The trials object at the end of the execution will indicate the best pipeline and parameters overall.
