XGBoost XGBClassifier Defaults in Python

Question

I am attempting to use XGBoost's classifier to classify some binary data. When I do the simplest thing and just use the defaults (as follows):

import xgboost as xgb
from sklearn.calibration import CalibratedClassifierCV

clf = xgb.XGBClassifier()
metLearn = CalibratedClassifierCV(clf, method='isotonic', cv=2)
metLearn.fit(train, trainTarget)
testPredictions = metLearn.predict(test)

I get reasonably good classification results.

My next step was to try tuning my parameters. Guessing from the parameter guide at https://github.com/dmlc/xgboost/blob/master/doc/parameter.md, I wanted to start from the defaults and work from there...

# setup parameters for xgboost
param = {}
param['booster'] = 'gbtree'
param['objective'] = 'binary:logistic'
param["eval_metric"] = "error"
param['eta'] = 0.3
param['gamma'] = 0
param['max_depth'] = 6
param['min_child_weight'] = 1
param['max_delta_step'] = 0
param['subsample'] = 1
param['colsample_bytree'] = 1
param['silent'] = 1
param['seed'] = 0
param['base_score'] = 0.5

clf = xgb.XGBClassifier(params)
metLearn = CalibratedClassifierCV(clf, method='isotonic', cv=2)
metLearn.fit(train, trainTarget)
testPredictions = metLearn.predict(test)

The result is that everything gets predicted as one of the classes and never the other.

Curiously, if I set

params={}

which I expected to give me the same defaults as not feeding any parameters, the same thing happens.

So does anyone know what the defaults for XGBClassifier are, so that I can start tuning?

Solution

That isn't how you set parameters in xgboost. You would either want to pass your param grid into your training function, such as xgboost's train or sklearn's GridSearchCV, or you would want to use your XGBClassifier's set_params method. Another thing to note is that if you're using xgboost's wrapper for sklearn (i.e. the XGBClassifier() or XGBRegressor() classes), then the parameter names used are the same ones used in sklearn's own GBM class (e.g. eta --> learning_rate). I'm not seeing where the exact documentation for the sklearn wrapper is hidden, but the code for those classes is here: https://github.com/dmlc/xgboost/blob/master/python-package/xgboost/sklearn.py
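For instance, a minimal sketch of the grid-search route. The grid values and the toy arrays here are illustrative assumptions standing in for your train/trainTarget data, not anything prescribed by xgboost:

from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier
import numpy as np

# Toy stand-ins for the question's train/trainTarget (assumptions)
X = np.random.rand(100, 5)
y = np.random.randint(0, 2, size=100)

# sklearn-style names: learning_rate here is what xgboost calls eta
grid = {'max_depth': [3, 6, 10], 'learning_rate': [0.1, 0.3]}
search = GridSearchCV(XGBClassifier(), grid, cv=2)
search.fit(X, y)
print(search.best_params_)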

For your reference, here is how you would set the model object's parameters directly.

>>> grid = {'max_depth':10}
>>> 
>>> clf = XGBClassifier()
>>> clf.max_depth
3
>>> clf.set_params(**grid)
XGBClassifier(base_score=0.5, colsample_bylevel=1, colsample_bytree=1,
       gamma=0, learning_rate=0.1, max_delta_step=0, max_depth=10,
       min_child_weight=1, missing=None, n_estimators=100, nthread=-1,
       objective='binary:logistic', reg_alpha=0, reg_lambda=1,
       scale_pos_weight=1, seed=0, silent=True, subsample=1)
>>> clf.max_depth
10
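And if you just want the full list of defaults the question asks about, the sklearn API's get_params method returns every parameter with its current value; output abbreviated here, and the exact values depend on your xgboost version:

>>> clf = XGBClassifier()
>>> clf.get_params()
{'base_score': 0.5, 'colsample_bylevel': 1, 'colsample_bytree': 1, ...}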

EDIT: I suppose you can set parameters on model creation; it just isn't super typical to do so, since most people grid search in some manner. However, if you do so, you would need to either list them as full params or use **kwargs. For example:

>>> XGBClassifier(max_depth=10)
XGBClassifier(base_score=0.5, colsample_bylevel=1, colsample_bytree=1,
       gamma=0, learning_rate=0.1, max_delta_step=0, max_depth=10,
       min_child_weight=1, missing=None, n_estimators=100, nthread=-1,
       objective='binary:logistic', reg_alpha=0, reg_lambda=1,
       scale_pos_weight=1, seed=0, silent=True, subsample=1)
>>> XGBClassifier(**grid)
XGBClassifier(base_score=0.5, colsample_bylevel=1, colsample_bytree=1,
       gamma=0, learning_rate=0.1, max_delta_step=0, max_depth=10,
       min_child_weight=1, missing=None, n_estimators=100, nthread=-1,
       objective='binary:logistic', reg_alpha=0, reg_lambda=1,
       scale_pos_weight=1, seed=0, silent=True, subsample=1)

Using a dictionary as input without **kwargs will set that parameter to literally be your dictionary:

>>> XGBClassifier(grid)
XGBClassifier(base_score=0.5, colsample_bylevel=1, colsample_bytree=1,
       gamma=0, learning_rate=0.1, max_delta_step=0,
       max_depth={'max_depth': 10}, min_child_weight=1, missing=None,
       n_estimators=100, nthread=-1, objective='binary:logistic',
       reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=0, silent=True,
       subsample=1)
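As an aside, the dict-of-params style the question builds is what xgboost's native (non-sklearn) interface expects: you pass the dict to xgb.train along with a DMatrix, rather than to the XGBClassifier constructor. A minimal sketch, with toy data standing in for the question's arrays and an arbitrary round count:

import xgboost as xgb
import numpy as np

# Toy stand-ins for the question's train/trainTarget (assumptions)
X = np.random.rand(100, 5)
y = np.random.randint(0, 2, size=100)

dtrain = xgb.DMatrix(X, label=y)  # native data container
params = {'objective': 'binary:logistic', 'eta': 0.3, 'max_depth': 6}
bst = xgb.train(params, dtrain, num_boost_round=10)  # round count is arbitrary here
preds = bst.predict(xgb.DMatrix(X))  # probabilities under binary:logistic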
