Using GridSearchCV with AdaBoost and DecisionTreeClassifier


Problem Description

I am attempting to tune an AdaBoost Classifier ("ABT") using a DecisionTreeClassifier ("DTC") as the base_estimator. I would like to tune both ABT and DTC parameters simultaneously, but am not sure how to accomplish this - a Pipeline shouldn't work, as I am not "piping" the output of DTC to ABT. The idea would be to iterate over the hyperparameters of both ABT and DTC in the GridSearchCV estimator.

How can I specify the tuning parameters correctly?

I tried the following, which generated the error shown below.

[IN]
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.grid_search import GridSearchCV

param_grid = {dtc__criterion : ["gini", "entropy"],
              dtc__splitter :   ["best", "random"],
              abc__n_estimators: [none, 1, 2]
             }


DTC = DecisionTreeClassifier(random_state = 11, max_features = "auto", class_weight = "auto",max_depth = None)

ABC = AdaBoostClassifier(base_estimator = DTC)

# run grid search
grid_search_ABC = GridSearchCV(ABC, param_grid=param_grid, scoring = 'roc_auc')

[OUT]
ValueError: Invalid parameter dtc for estimator AdaBoostClassifier(algorithm='SAMME.R',
      base_estimator=DecisionTreeClassifier(class_weight='auto', criterion='gini', max_depth=None,
        max_features='auto', max_leaf_nodes=None, min_samples_leaf=1,
        min_samples_split=2, min_weight_fraction_leaf=0.0,
        random_state=11, splitter='best'),
      learning_rate=1.0, n_estimators=50, random_state=11)


Answer

There are several things wrong in the code you posted:


  1. The keys of the param_grid dictionary need to be strings. You should be getting a NameError.
  2. The key "abc__n_estimators" should just be "n_estimators": you are probably mixing this up with the pipeline syntax. Here nothing tells Python that the string "abc" represents your AdaBoostClassifier (see the get_params() sketch after this list for how to discover the valid key names).
  3. None (and not none) is not a valid value for n_estimators. The default value (probably what you meant) is 50.
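
If you are unsure which keys GridSearchCV will accept, the estimator itself can tell you: get_params(deep=True) returns every tunable parameter name, including the nested ones exposed through base_estimator. A minimal sketch, using the same estimators as in the question (printing the names is just for illustration):

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier

DTC = DecisionTreeClassifier(random_state=11)
ABC = AdaBoostClassifier(base_estimator=DTC)

# get_params(deep=True) lists every valid param_grid key, e.g.
# 'n_estimators', 'learning_rate', 'base_estimator__criterion',
# 'base_estimator__splitter', ...
for name in sorted(ABC.get_params(deep=True)):
    print(name)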

Here's the code with these fixes. To set the parameters of the tree base estimator, you can use the "__" syntax, which gives access to nested parameters (here, base_estimator__criterion and base_estimator__splitter).

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.grid_search import GridSearchCV

# Parameters of the nested tree are addressed with the "base_estimator__" prefix;
# parameters of the AdaBoost wrapper itself are addressed by their plain names.
param_grid = {"base_estimator__criterion": ["gini", "entropy"],
              "base_estimator__splitter": ["best", "random"],
              "n_estimators": [1, 2]
             }

DTC = DecisionTreeClassifier(random_state=11, max_features="auto", class_weight="auto", max_depth=None)

ABC = AdaBoostClassifier(base_estimator=DTC)

# run grid search
grid_search_ABC = GridSearchCV(ABC, param_grid=param_grid, scoring="roc_auc")
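
To actually run the search, call fit on the grid-search object and read best_params_ / best_score_ afterwards. A minimal sketch, assuming a synthetic binary dataset from make_classification (not part of the original question); it targets the same older scikit-learn API as the answer above (in current releases GridSearchCV is imported from sklearn.model_selection instead of sklearn.grid_search).

from sklearn.datasets import make_classification

# Synthetic data purely for illustration.
X, y = make_classification(n_samples=500, n_features=20, random_state=11)

# Every combination in param_grid is evaluated with cross-validated ROC AUC.
grid_search_ABC.fit(X, y)

print(grid_search_ABC.best_params_)   # winning criterion / splitter / n_estimators
print(grid_search_ABC.best_score_)    # mean cross-validated ROC AUC of the best combination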

Also, 1 or 2 estimators does not really make sense for AdaBoost. But I'm guessing this is not the actual code you're running.

Hope this helps.
