Grid search on parameters inside the parameters of a BaggingClassifier


Question

This is a follow-up to a question answered here, but I believe it deserves its own thread.

In the previous question, we were dealing with "an Ensemble of Ensemble classifiers, where each has its own parameters." Let's start with the example provided by MaximeKan in his answer:

my_est = BaggingClassifier(
    RandomForestClassifier(n_estimators=100, bootstrap=True, max_features=0.5),
    n_estimators=5, bootstrap_features=False, bootstrap=False,
    max_features=1.0, max_samples=0.6,
)

Now say I want to go one level above that. Setting aside considerations like efficiency and computational cost, and as a general concept: how would I run a grid search with this kind of setup?

I can set up two parameter grids along these lines:

One for the BaggingClassifier:

BC_param_grid = {
'bootstrap': [True, False],
'bootstrap_features': [True, False],    
'n_estimators': [5, 10, 15],
'max_samples' : [0.6, 0.8, 1.0]
}

And one for the RandomForestClassifier:

RFC_param_grid = {
'bootstrap': [True, False],    
'n_estimators': [100, 200, 300],
'max_features' : [0.6, 0.8, 1.0]
}

Now I can call grid search with my estimator:

grid_search = GridSearchCV(estimator = my_est, param_grid = ???)

What do I do with the param_grid parameter in this case? Or more specifically, how do I use both of the parameter grids I set up?

I have to say, it feels like I’m playing with matryoshka dolls.

Recommended answer

Following @James Dellinger's comment above, and expanding from there, I was able to get it done. It turns out the "secret sauce" is an easy-to-miss feature: the __ (double underscore) separator (there's a passing reference to it in the Pipeline documentation). Prefixing an inner estimator's parameter name with the name of the outer estimator's parameter that holds the inner estimator, followed by __, lets you build a single param_grid that covers parameters of both the outer and the inner estimator.
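One way to see which nested names are available is to ask the estimator itself through get_params(). The snippet below is a minimal sketch, not part of the original answer; the variable names (outer_params, inner_name, nested_keys) are illustrative, and it assumes that the constructor parameter holding the inner estimator is called either base_estimator (older scikit-learn) or estimator (newer releases).

```python
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier

# The inner estimator lives in a constructor parameter of the outer one.
# Assumption: that parameter is named "base_estimator" in older scikit-learn
# releases and "estimator" in newer ones, so detect which one is present.
outer_params = BaggingClassifier().get_params()
inner_name = "estimator" if "estimator" in outer_params else "base_estimator"

my_est = BaggingClassifier(**{inner_name: RandomForestClassifier()})

# get_params() is deep by default: parameters of the inner estimator are
# listed as "<inner_name>__<parameter>", the same keys that param_grid accepts.
nested_keys = sorted(
    key for key in my_est.get_params() if key.startswith(inner_name + "__")
)
print(inner_name)
print(nested_keys)
```

Any key printed here can be used directly in param_grid, which avoids guessing the prefix.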

So for the example in the question, the outer estimator is BaggingClassifier and the inner (base) estimator is RandomForestClassifier, which is held in the base_estimator parameter, so the prefix is base_estimator__. (In scikit-learn 1.2 and later that parameter is named estimator, and the prefix becomes estimator__.) First, import what needs to be imported:

from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import GridSearchCV

followed by the param_grid assignment (in this case, using the values from the example in the question):

param_grid = {
 'bootstrap': [True, False],
 'bootstrap_features': [True, False],    
 'n_estimators': [5, 10, 15],
 'max_samples' : [0.6, 0.8, 1.0],
 'base_estimator__bootstrap': [True, False],    
 'base_estimator__n_estimators': [100, 200, 300],
 'base_estimator__max_features' : [0.6, 0.8, 1.0]
}
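The combined grid can also be derived from the two dictionaries in the question instead of being typed out by hand. This is a sketch under the assumption that the inner estimator is addressed as base_estimator, as in this answer; inner_name and n_candidates are illustrative names. ParameterGrid is used only to count the candidates, which gives a sense of the cost the question sets aside.

```python
from sklearn.model_selection import ParameterGrid

# The two grids from the question.
BC_param_grid = {
    "bootstrap": [True, False],
    "bootstrap_features": [True, False],
    "n_estimators": [5, 10, 15],
    "max_samples": [0.6, 0.8, 1.0],
}
RFC_param_grid = {
    "bootstrap": [True, False],
    "n_estimators": [100, 200, 300],
    "max_features": [0.6, 0.8, 1.0],
}

# Assumption: matches the answer; use "estimator" on scikit-learn >= 1.2.
inner_name = "base_estimator"

# Outer keys stay as they are; inner keys get the "<inner_name>__" prefix,
# so the two "bootstrap" and "n_estimators" entries no longer collide.
param_grid = {
    **BC_param_grid,
    **{f"{inner_name}__{key}": values for key, values in RFC_param_grid.items()},
}

# Number of parameter combinations GridSearchCV would evaluate per CV split.
n_candidates = len(ParameterGrid(param_grid))
print(n_candidates)
```

With cv=5, every candidate is fitted five times, so the full grid amounts to several thousand fits of a bagged random forest.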

And finally, your grid search:

grid_search=GridSearchCV(BaggingClassifier(base_estimator=RandomForestClassifier()), param_grid=param_grid, cv=5)

and you'll be off to the races.
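For a quick end-to-end check, the same idea can be run on a toy problem. The following is a hedged sketch rather than the answer's exact code: the synthetic dataset, the deliberately small grid (small_grid), and the name detection for the inner-estimator parameter are all assumptions added for illustration so that it finishes in seconds.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data, purely for illustration.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Assumption: the inner estimator is passed as "estimator" on newer
# scikit-learn releases and as "base_estimator" on older ones.
inner_name = (
    "estimator" if "estimator" in BaggingClassifier().get_params() else "base_estimator"
)

bagging = BaggingClassifier(
    **{inner_name: RandomForestClassifier(n_estimators=10, random_state=0)},
    random_state=0,
)

# A much smaller grid than the one in the question: 2 * 2 * 2 = 8 candidates.
small_grid = {
    "n_estimators": [3, 5],                      # outer: number of bagged forests
    "max_samples": [0.6, 1.0],                   # outer: sample fraction per forest
    f"{inner_name}__max_features": [0.5, 1.0],   # inner: RandomForest parameter
}

grid_search = GridSearchCV(bagging, param_grid=small_grid, cv=3)
grid_search.fit(X, y)

# best_params_ contains both outer and prefixed inner parameter names.
print(grid_search.best_params_)
print(grid_search.best_score_)
```

After fitting, grid_search.best_estimator_ is a BaggingClassifier whose inner RandomForestClassifier carries the selected inner parameters.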
