Can I use XGBoost to boost other models (eg. Naive Bayes, Random Forest)?


Question

I am working on a fraud analytics project and I need some help with boosting. Previously, I used SAS Enterprise Miner to learn more about boosting/ensemble techniques and I learned that boosting can help to improve the performance of a model.

Currently, my group has completed the following models in Python: Naive Bayes, Random Forest, and Neural Network. We want to use XGBoost to improve the F1-score. I am not sure if this is possible, since I have only come across tutorials on how to run XGBoost or Naive Bayes on its own.

I am looking for a tutorial that shows how to create a Naive Bayes model and then apply boosting to it. After that, we can compare the metrics with and without boosting to see whether they improved. I am relatively new to machine learning, so I could be wrong about this concept.

I thought of replacing the values in the XGBoost code, but I am not sure which ones to change, or whether it can even work this way.

Naive Bayes

from sklearn.model_selection import train_test_split

# X_sm, y_sm are the features/labels prepared earlier (not shown)
X_train, X_test, y_train, y_test = train_test_split(X_sm, y_sm, test_size=0.2, random_state=0)

from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix, accuracy_score, f1_score, precision_score, recall_score

# Fit Gaussian Naive Bayes and predict on the held-out set
nb = GaussianNB()
nb.fit(X_train, y_train)
nb_pred = nb.predict(X_test)
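
The metric functions are imported above but never applied; here is a minimal sketch of how they could be used on the held-out predictions, reusing the names from the code above:

print(confusion_matrix(y_test, nb_pred))
print("accuracy :", accuracy_score(y_test, nb_pred))
print("precision:", precision_score(y_test, nb_pred))
print("recall   :", recall_score(y_test, nb_pred))
print("F1       :", f1_score(y_test, nb_pred))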

XGBoost

from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X_train, X_test, y_train, y_test = train_test_split(X_sm, y_sm, test_size=0.2, random_state=0)

# XGBoost classifier with the hyperparameters spelled out explicitly
model = XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
                      colsample_bynode=1, colsample_bytree=0.9, gamma=0,
                      learning_rate=0.1, max_delta_step=0, max_depth=10,
                      min_child_weight=1, missing=None, n_estimators=500, n_jobs=-1,
                      nthread=None, objective='binary:logistic', random_state=0,
                      reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
                      silent=None, subsample=0.9, verbosity=0)

model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# XGBClassifier.predict() already returns class labels, so the rounding below is a no-op
predictions = [round(value) for value in y_pred]
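
Since the stated goal is a better F1-score, a corresponding sketch for comparing the two models on the same held-out split (this assumes nb_pred from the Naive Bayes section above):

from sklearn.metrics import f1_score

print("Naive Bayes F1:", f1_score(y_test, nb_pred))
print("XGBoost F1    :", f1_score(y_test, predictions))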

Answer

In theory, boosting any (base) classifier is easy and straightforward with scikit-learn's AdaBoostClassifier. E.g. for a Naive Bayes classifier, it should be:

from sklearn.ensemble import AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB

nb = GaussianNB()
model = AdaBoostClassifier(base_estimator=nb, n_estimators=10)
model.fit(X_train, y_train)

and so on.
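
For completeness, a minimal sketch of the with/without comparison the asker describes, scored with the same F1 metric (this assumes the X_train/X_test split from the question; note that in scikit-learn >= 1.2 the base_estimator argument is renamed to estimator):

from sklearn.ensemble import AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import f1_score

# Plain Naive Bayes vs. AdaBoost-ed Naive Bayes, evaluated on the same test set
nb = GaussianNB().fit(X_train, y_train)
boosted_nb = AdaBoostClassifier(base_estimator=GaussianNB(), n_estimators=10).fit(X_train, y_train)

print("plain NB F1:  ", f1_score(y_test, nb.predict(X_test)))
print("boosted NB F1:", f1_score(y_test, boosted_nb.predict(X_test)))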

In practice, we never use Naive Bayes or Neural Nets as base classifiers for boosting (let alone Random Forests, which are themselves an ensemble method).

Adaboost (and similar boosting methods derived afterwards, like GBM and XGBoost) was conceived using decision trees (DTs) as base classifiers (more specifically, decision stumps, i.e. DTs with a depth of only 1); there is a good reason why this is still the case today: if you don't explicitly specify the base_estimator argument in scikit-learn's AdaBoostClassifier above, it assumes a value of DecisionTreeClassifier(max_depth=1), i.e. a decision stump.
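
This default is easy to verify; a small sketch on toy data (the fitted estimators_ attribute holds the individual base learners):

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(random_state=0)  # toy data, just to fit the ensemble
model = AdaBoostClassifier(n_estimators=5).fit(X, y)
print(model.estimators_[0])  # a DecisionTreeClassifier with max_depth=1, i.e. a decision stump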

DTs are suitable for such ensembling because they are essentially unstable classifiers: small changes in the training data can produce large changes in the fitted model, which is exactly the kind of variance that boosting's sequential reweighting exploits. This is not the case with the other algorithms mentioned, hence the latter are not expected to offer anything when used as base classifiers for boosting algorithms.
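
To see this empirically, here is an illustrative sketch on synthetic data (not the asker's dataset) comparing AdaBoost over the default stumps with AdaBoost over GaussianNB; the stump-based ensemble would typically be the one that actually benefits from the boosting rounds:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

candidates = {
    "boosted stumps":     AdaBoostClassifier(n_estimators=100, random_state=0),
    "boosted GaussianNB": AdaBoostClassifier(base_estimator=GaussianNB(),  # estimator= in sklearn >= 1.2
                                             n_estimators=100, random_state=0),
}
for name, clf in candidates.items():
    f1 = cross_val_score(clf, X, y, cv=5, scoring="f1").mean()
    print(f"{name}: F1 = {f1:.3f}")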
