Using random forest as base classifier with AdaBoost


Question


Can I use AdaBoost with a random forest as the base classifier? I searched on the internet and didn't find anyone doing it.

Like in the following code; I tried to run it, but it takes a lot of time:

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier

estimators = Pipeline([('vectorizer', CountVectorizer()),
                       ('transformer', TfidfTransformer()),
                       ('classifier', AdaBoostClassifier(learning_rate=1))])

RF = RandomForestClassifier(criterion='entropy', n_estimators=100, max_depth=500,
                            min_samples_split=100, max_leaf_nodes=None,
                            max_features='log2')

param_grid = {
    'vectorizer__ngram_range': [(1, 2), (1, 3)],
    'vectorizer__min_df': [5],
    'vectorizer__max_df': [0.7],
    'vectorizer__max_features': [1500],

    'transformer__use_idf': [True, False],
    'transformer__norm': ('l1', 'l2'),
    'transformer__smooth_idf': [True, False],
    'transformer__sublinear_tf': [True, False],

    'classifier__base_estimator': [RF],
    'classifier__algorithm': ("SAMME.R", "SAMME"),
    'classifier__n_estimators': [4, 7, 11, 13, 16, 19, 22, 25, 28, 31, 34, 43, 50]
}

I tried this with GridSearchCV, adding the RF classifier into the AdaBoost parameters. If I use it, will the accuracy increase?
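The GridSearchCV call itself is not shown in the question. A minimal self-contained sketch of how the pipeline and grid plug into it (the toy corpus, the trimmed-down grid, and the `cv=2` value are illustrative assumptions, not from the question):

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus standing in for the question's text data
texts = ["good movie", "great film", "bad movie", "awful film"] * 10
labels = [1, 1, 0, 0] * 10

estimators = Pipeline([('vectorizer', CountVectorizer()),
                       ('transformer', TfidfTransformer()),
                       ('classifier', AdaBoostClassifier(learning_rate=1))])

# A trimmed-down grid; the question's full param_grid plugs in the same way
param_grid = {
    'vectorizer__ngram_range': [(1, 1), (1, 2)],
    'classifier__n_estimators': [4, 7],
}

search = GridSearchCV(estimators, param_grid, cv=2)
search.fit(texts, labels)
print(search.best_params_)
```

Every grid point fits the whole pipeline once per CV fold, so a large grid multiplied by an expensive classifier is where the runtime goes.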

Solution

No wonder you have not actually seen anyone doing it - it is an absurd and bad idea.

You are trying to build an ensemble (Adaboost) which in itself consists of ensemble base classifiers (RFs) - essentially an "ensemble-squared"; so, no wonder about the high computation time.
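To put rough numbers on the "ensemble-squared" cost, using the settings from the question's code: each AdaBoost round fits one full random forest instead of one stump, so the number of trees trained multiplies:

```python
# Settings taken from the question's code
adaboost_rounds = 50     # largest classifier__n_estimators value in the grid
rf_trees = 100           # RandomForestClassifier(n_estimators=100)

stump_fits = adaboost_rounds * 1          # default AdaBoost: one stump per round
forest_fits = adaboost_rounds * rf_trees  # AdaBoost over RF: a whole forest per round

print(stump_fits, forest_fits)  # 50 vs 5000 tree fits, per grid point, per CV fold
```

And this factor of 100 is paid again for every combination in the grid and every cross-validation fold.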

But even if it was practical, there are good theoretical reasons not to do it; quoting from my own answer in Execution time of AdaBoost with SVM base classifier:

Adaboost (and similar ensemble methods) were conceived using decision trees as base classifiers (more specifically, decision stumps, i.e. DTs with a depth of only 1); there is good reason why, still today, if you don't specify the base_estimator argument explicitly, it assumes a value of DecisionTreeClassifier(max_depth=1). DTs are suitable for such ensembling because they are essentially unstable classifiers, which is not the case with SVMs, hence the latter are not expected to offer much when used as base classifiers.
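The stump default is easy to verify, assuming scikit-learn's API: fit an AdaBoostClassifier without specifying a base estimator and inspect the depth of the fitted trees.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=200, random_state=0)

# No base estimator given: AdaBoost falls back to decision stumps
clf = AdaBoostClassifier(n_estimators=20, random_state=0).fit(X, y)
print(clf.estimators_[0].get_depth())  # depth of the first fitted base tree
```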

On top of this, SVMs are computationally much more expensive than decision trees (let alone decision stumps), which is the reason for the long processing times you have observed.

The argument holds for RFs, too - they are not unstable classifiers, hence there is not any reason to actually expect performance improvements when using them as base classifiers for boosting algorithms, like Adaboost.
