Alternate different models in Pipeline for GridSearchCV

Question

I want to build a Pipeline in sklearn and test different models using GridSearchCV.

Here is an example (please ignore which particular models are chosen):

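from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.manifold import MDS, TSNE
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

reg = LogisticRegression()

proj1 = PCA(n_components=2)
proj2 = MDS()
proj3 = TSNE()

pipe = Pipeline([('proj', proj1), ('reg', reg)])

# LogisticRegression's regularization parameter is the uppercase C
param_grid = {
    'reg__C': [0.01, 0.1, 1],
}

clf = GridSearchCV(pipe, param_grid=param_grid)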

Here, if I want to try different models for dimensionality reduction, I need to code different pipelines and compare them manually. Is there an easier way to do it?

One solution I came up with is to define my own class derived from BaseEstimator:

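from sklearn.base import BaseEstimator

class Projection(BaseEstimator):
    def __init__(self, est_name):
        if est_name == "MDS":
            self.model = MDS()
        ...
    ...
    def fit_transform(self, X, y=None):
        # Pipeline calls fit_transform(X, y), so y must be accepted
        return self.model.fit_transform(X)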

I think it will work: I just create a Projection object and pass it to the Pipeline, using the names of the estimators as its parameters.

But to me this approach is a bit chaotic and not scalable: it forces me to define a new class each time I want to compare different models. To take this solution further, one could implement a class that does the same job with an arbitrary set of models, but that seems overcomplicated to me.

What is the most natural and Pythonic way to compare different models?

Recommended answer

Let's assume you want to use PCA and TruncatedSVD as your dimensionality reduction step.

from sklearn import decomposition
from sklearn.svm import SVC

pca = decomposition.PCA()
svd = decomposition.TruncatedSVD()
svm = SVC()
n_components = [20, 40, 64]

You can do this:

from sklearn.pipeline import Pipeline

pipe = Pipeline(steps=[('reduction', pca), ('svm', svm)])

# params_grid is a list of dicts instead of a single dict.
# The first dict holds the parameters for pca, the second those for svd.
# Every value must be a list, so the estimators themselves are wrapped in lists.
params_grid = [{
    'svm__C': [1, 10, 100, 1000],
    'svm__kernel': ['linear', 'rbf'],
    'svm__gamma': [0.001, 0.0001],
    'reduction': [pca],
    'reduction__n_components': n_components,
},
{
    'svm__C': [1, 10, 100, 1000],
    'svm__kernel': ['linear', 'rbf'],
    'svm__gamma': [0.001, 0.0001],
    'reduction': [svd],
    'reduction__n_components': n_components,
    'reduction__algorithm': ['randomized'],
}]

Now just pass the pipeline object to GridSearchCV:

from sklearn.model_selection import GridSearchCV
grd = GridSearchCV(pipe, param_grid=params_grid)

Calling grd.fit() searches over both elements of the params_grid list. Each dict is expanded into its own grid of parameter combinations, and the grids are explored one after another, so PCA-specific and TruncatedSVD-specific settings are never mixed.
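To make the expansion concrete, here is a minimal sketch that enumerates the candidates with sklearn's ParameterGrid, which performs the same kind of expansion outside of a search. The shared svm_params dict and the counting variables are introduced here only for illustration and are not part of the answer above.

```python
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.model_selection import ParameterGrid

pca = PCA()
svd = TruncatedSVD()
n_components = [20, 40, 64]

# Hypothetical helper dict: the SVM settings shared by both sub-grids.
svm_params = {
    'svm__C': [1, 10, 100, 1000],
    'svm__kernel': ['linear', 'rbf'],
    'svm__gamma': [0.001, 0.0001],
}

# Same structure as params_grid in the answer: one dict per reduction step.
params_grid = [
    {**svm_params,
     'reduction': [pca],
     'reduction__n_components': n_components},
    {**svm_params,
     'reduction': [svd],
     'reduction__n_components': n_components,
     'reduction__algorithm': ['randomized']},
]

# Each dict is expanded on its own; the results are concatenated.
candidates = list(ParameterGrid(params_grid))
n_pca = sum(isinstance(c['reduction'], PCA) for c in candidates)
n_svd = sum(isinstance(c['reduction'], TruncatedSVD) for c in candidates)

print(len(candidates), n_pca, n_svd)  # 96 48 48
```

Each sub-grid yields 4 x 2 x 2 x 3 = 48 combinations, and the 'reduction__algorithm' key appears only in the TruncatedSVD candidates.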

Please look at my other answer for more details: "Parallel" pipeline to get best model using gridsearch
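As a rough end-to-end sketch of the approach in this answer, the snippet below runs the search on a small slice of the digits dataset and then checks which reduction step won. The dataset, the 300-sample subset, the reduced grid, cv=3 and random_state=0 are all assumptions chosen to keep the run short; they are not taken from the original answer.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Assumption: a small slice of the digits data, just to keep the search fast.
X, y = load_digits(return_X_y=True)
X, y = X[:300], y[:300]

# The 'reduction' step is a placeholder that the grid swaps out.
pipe = Pipeline(steps=[('reduction', PCA()), ('svm', SVC())])

# Deliberately tiny grid: 2 reducers x 2 n_components x 2 C values.
params_grid = [
    {'reduction': [PCA()],
     'reduction__n_components': [16, 32],
     'svm__C': [1, 10]},
    {'reduction': [TruncatedSVD(random_state=0)],
     'reduction__n_components': [16, 32],
     'svm__C': [1, 10]},
]

grd = GridSearchCV(pipe, param_grid=params_grid, cv=3)
grd.fit(X, y)

# best_params_['reduction'] is the winning estimator object itself.
print(type(grd.best_params_['reduction']).__name__)
print(grd.best_params_)
print(grd.best_score_)
```

After fitting, grd.cv_results_['params'] lists every candidate that was tried, and grd.best_estimator_ is the refitted pipeline with the winning reduction step in place.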
