Alternate different models in Pipeline for GridSearchCV
Question
I want to build a Pipeline in sklearn and test different models using GridSearchCV.
Just an example (please do not pay attention to which particular models are chosen):
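from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.manifold import MDS, TSNE
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

reg = LogisticRegression()
proj1 = PCA(n_components=2)
proj2 = MDS()
proj3 = TSNE()
pipe = [('proj', proj1), ('reg', reg)]
pipe = Pipeline(pipe)
param_grid = {
    'reg__C': [0.01, 0.1, 1],  # LogisticRegression's parameter is the uppercase C
}
clf = GridSearchCV(pipe, param_grid=param_grid)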
Here, if I want to try different models for dimensionality reduction, I need to code different pipelines and compare them manually. Is there an easier way to do it?
One solution I came up with is to define my own class derived from BaseEstimator:
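from sklearn.base import BaseEstimator
from sklearn.manifold import MDS

class Projection(BaseEstimator):
    def __init__(self, est_name):
        self.est_name = est_name  # stored so get_params() can report it
        if est_name == "MDS":
            self.model = MDS()
        ...
    ...
    def fit_transform(self, X, y=None):
        # Pipeline passes y as well, so accept it even though it is unused
        return self.model.fit_transform(X)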
I think it will work: I just create a Projection object and pass it to the Pipeline, using the names of the estimators as its parameters.
But to me this approach is a bit chaotic and not scalable: it forces me to define a new class each time I want to compare different models. To take this solution further, one could implement a class that does the same job but with an arbitrary set of models. That seems overcomplicated to me.
What is the most natural and pythonic way to compare different models?
Recommended answer
Let's assume you want to use PCA and TruncatedSVD as your dimensionality reduction step.
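from sklearn import decomposition
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
pca = decomposition.PCA()
svd = decomposition.TruncatedSVD()
svm = SVC()
n_components = [20, 40, 64]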
You can do this:
pipe = Pipeline(steps=[('reduction', pca), ('svm', svm)])
# Change params_grid -> instead of a dict, make it a list of dicts.
# The first dict holds the parameters related to pca, the second those related to svd.
# Every value must be a list, so the estimators are wrapped as [pca] and [svd].
params_grid = [{
    'svm__C': [1, 10, 100, 1000],
    'svm__kernel': ['linear', 'rbf'],
    'svm__gamma': [0.001, 0.0001],
    'reduction': [pca],
    'reduction__n_components': n_components,
},
{
    'svm__C': [1, 10, 100, 1000],
    'svm__kernel': ['linear', 'rbf'],
    'svm__gamma': [0.001, 0.0001],
    'reduction': [svd],
    'reduction__n_components': n_components,
    'reduction__algorithm': ['randomized'],
}]
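To see how a list of dicts is expanded, sklearn's ParameterGrid can enumerate the candidates without fitting anything. The following is only a sketch: it rebuilds a grid with the same values as above, and the helper names svm_grid, n_candidates and reducers are introduced here purely for illustration.

```python
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.model_selection import ParameterGrid

n_components = [20, 40, 64]

# Hypothetical helper: the SVM settings shared by both dicts
svm_grid = {
    'svm__C': [1, 10, 100, 1000],
    'svm__kernel': ['linear', 'rbf'],
    'svm__gamma': [0.001, 0.0001],
}

# Each dict is expanded on its own; values from different dicts are never mixed
grid = ParameterGrid([
    {**svm_grid,
     'reduction': [PCA()],
     'reduction__n_components': n_components},
    {**svm_grid,
     'reduction': [TruncatedSVD()],
     'reduction__n_components': n_components,
     'reduction__algorithm': ['randomized']},
])

# 4 * 2 * 2 * 3 = 48 combinations per dict, so 96 candidates in total
n_candidates = len(grid)

# Which reducers appear among the candidates
reducers = {type(params['reduction']).__name__ for params in grid}
print(n_candidates, reducers)
```

Because the two dicts are expanded separately, 'reduction__algorithm' is only ever combined with TruncatedSVD and never set on PCA.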
Now just pass the pipeline object to GridSearchCV:
grd = GridSearchCV(pipe, param_grid = params_grid)
Calling grd.fit() will search the parameters over both elements of the params_grid list, using all the values from one dict at a time.
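As a rough end-to-end illustration of the search described above, here is a small self-contained run. The data is an assumption: a 500-sample slice of sklearn's digits dataset stands in for your own data, and the grid is deliberately reduced so it finishes quickly. The names best_reducer and n_candidates are only for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Assumption: digits (64 features) is a stand-in for the real dataset
X, y = load_digits(return_X_y=True)
X, y = X[:500], y[:500]  # subsample to keep the search fast

# The initial 'reduction' step is a placeholder; the grid swaps it out
pipe = Pipeline(steps=[('reduction', PCA()), ('svm', SVC())])

# Reduced grid for illustration: one dict per dimensionality reducer
params_grid = [
    {
        'reduction': [PCA()],
        'reduction__n_components': [20, 40],
        'svm__C': [1, 10],
    },
    {
        'reduction': [TruncatedSVD()],
        'reduction__n_components': [20, 40],
        'svm__C': [1, 10],
    },
]

grd = GridSearchCV(pipe, param_grid=params_grid, cv=3)
grd.fit(X, y)

# best_params_ holds the winning estimator object under the 'reduction' key
best_reducer = type(grd.best_params_['reduction']).__name__
n_candidates = len(grd.cv_results_['params'])  # 4 from each dict
print(best_reducer, grd.best_params_['reduction__n_components'], grd.best_score_)
```

After fitting, grd.best_estimator_ is the full pipeline refit with the winning reducer, and grd.cv_results_ holds the scores of every candidate from both dicts, so the comparison no longer has to be done by hand.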
Please look at my other answer for more details: "Parallel" pipeline to get best model using gridsearch