XGBoost XGBClassifier Defaults in Python
Question
I am attempting to use XGBoost's classifier to classify some binary data. When I do the simplest thing and just use the defaults (as follows):
clf = xgb.XGBClassifier()
metLearn = CalibratedClassifierCV(clf, method='isotonic', cv=2)
metLearn.fit(train, trainTarget)
testPredictions = metLearn.predict(test)
I get reasonably good classification results.
My next step was to try tuning my parameters. Guessing from the parameters guide at https://github.com/dmlc/xgboost/blob/master/doc/parameter.md, I wanted to start from the defaults and work from there...
# setup parameters for xgboost
param = {}
param['booster'] = 'gbtree'
param['objective'] = 'binary:logistic'
param["eval_metric"] = "error"
param['eta'] = 0.3
param['gamma'] = 0
param['max_depth'] = 6
param['min_child_weight'] = 1
param['max_delta_step'] = 0
param['subsample'] = 1
param['colsample_bytree'] = 1
param['silent'] = 1
param['seed'] = 0
param['base_score'] = 0.5
clf = xgb.XGBClassifier(params)
metLearn = CalibratedClassifierCV(clf, method='isotonic', cv=2)
metLearn.fit(train, trainTarget)
testPredictions = metLearn.predict(test)
The result is that everything is predicted to be one of the classes and never the other.
Curiously, if I set
params = {}
which I expected to give me the same defaults as not feeding any parameters, the same thing happens.
So does anyone know what the defaults for XGBClassifier are, so that I can start tuning?
That isn't how you set parameters in xgboost. You would either want to pass your param grid into your training function, such as xgboost's train
or sklearn's GridSearchCV
, or you would want to use your XGBClassifier's set_params
method. Another thing to note is that if you're using xgboost's wrapper for sklearn (i.e. the XGBClassifier()
or XGBRegressor()
classes), then the parameter names used are the same ones used in sklearn's own GBM class (e.g. eta --> learning_rate). I'm not seeing where the exact documentation for the sklearn wrapper is hidden, but the code for those classes is here: https://github.com/dmlc/xgboost/blob/master/python-package/xgboost/sklearn.py
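Since the sklearn wrapper follows the standard estimator API, you can also list every default yourself with get_params(). A minimal sketch, shown here with sklearn's own GradientBoostingClassifier (whose parameter names the wrapper largely mirrors, and which runs without xgboost installed); XGBClassifier().get_params() works the same way:

```python
from sklearn.ensemble import GradientBoostingClassifier

# Any sklearn-style estimator exposes its defaults through get_params(),
# which returns a plain dict mapping parameter names to default values.
clf = GradientBoostingClassifier()
defaults = clf.get_params()
print(defaults['learning_rate'])  # 0.1
print(defaults['max_depth'])      # 3
```

Printing the whole dict gives you the complete list of tunable names, which is exactly the starting point the question asks for.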
For your reference here is how you would set the model object parameters directly.
>>> grid = {'max_depth':10}
>>>
>>> clf = XGBClassifier()
>>> clf.max_depth
3
>>> clf.set_params(**grid)
XGBClassifier(base_score=0.5, colsample_bylevel=1, colsample_bytree=1,
gamma=0, learning_rate=0.1, max_delta_step=0, max_depth=10,
min_child_weight=1, missing=None, n_estimators=100, nthread=-1,
objective='binary:logistic', reg_alpha=0, reg_lambda=1,
scale_pos_weight=1, seed=0, silent=True, subsample=1)
>>> clf.max_depth
10
EDIT: I suppose you can set parameters at model creation; it just isn't very typical to do so, since most people grid search in some manner. However, if you do so, you would need to either list them as full parameters or use **kwargs. For example:
>>> XGBClassifier(max_depth=10)
XGBClassifier(base_score=0.5, colsample_bylevel=1, colsample_bytree=1,
gamma=0, learning_rate=0.1, max_delta_step=0, max_depth=10,
min_child_weight=1, missing=None, n_estimators=100, nthread=-1,
objective='binary:logistic', reg_alpha=0, reg_lambda=1,
scale_pos_weight=1, seed=0, silent=True, subsample=1)
>>> XGBClassifier(**grid)
XGBClassifier(base_score=0.5, colsample_bylevel=1, colsample_bytree=1,
gamma=0, learning_rate=0.1, max_delta_step=0, max_depth=10,
min_child_weight=1, missing=None, n_estimators=100, nthread=-1,
objective='binary:logistic', reg_alpha=0, reg_lambda=1,
scale_pos_weight=1, seed=0, silent=True, subsample=1)
Using a dictionary as input without **kwargs will set that parameter to literally be your dictionary:
>>> XGBClassifier(grid)
XGBClassifier(base_score=0.5, colsample_bylevel=1, colsample_bytree=1,
gamma=0, learning_rate=0.1, max_delta_step=0,
max_depth={'max_depth': 10}, min_child_weight=1, missing=None,
n_estimators=100, nthread=-1, objective='binary:logistic',
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=0, silent=True,
subsample=1)
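To follow the grid-search route mentioned above, the usual pattern is to hand the grid to GridSearchCV rather than to the constructor. A minimal sketch, again using sklearn's GradientBoostingClassifier as a stand-in so it runs without xgboost installed; substituting XGBClassifier() follows the same pattern:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Toy binary-classification data standing in for train/trainTarget.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# The grid maps parameter names to candidate values; GridSearchCV calls
# set_params(**combo) internally for every combination and cross-validates.
grid = {'max_depth': [2, 3, 4], 'n_estimators': [50, 100]}
search = GridSearchCV(GradientBoostingClassifier(random_state=0), grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```

This avoids the dict-as-positional-argument pitfall entirely, since the grid is never passed to the estimator's constructor.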