Xgboost: what is the difference among bst.best_score, bst.best_iteration and bst.best_ntree_limit?


Problem description

When I use xgboost to train my data for a 2-cates classification problem,I'd like to use the early stopping to get the best model, but I'm confused about which one to use in my predict as the early stop will return 3 different choices. For example, should I use

preds = model.predict(xgtest, ntree_limit=bst.best_iteration)

or should I use

preds = model.predict(xgtest, ntree_limit=bst.best_ntree_limit)

Or are both right, and should they be applied in different circumstances? If so, how can I judge which one to use?

Here is the original quotation from the xgboost documentation, but it doesn't give the reason why, and I also didn't find a comparison between those params:

Early Stopping

If you have a validation set, you can use early stopping to find the optimal number of boosting rounds. Early stopping requires at least one set in evals. If there's more than one, it will use the last.

train(..., evals=evals, early_stopping_rounds=10)

The model will train until the validation score stops improving. Validation error needs to decrease at least every early_stopping_rounds to continue training.

If early stopping occurs, the model will have three additional fields: bst.best_score, bst.best_iteration and bst.best_ntree_limit. Note that train() will return a model from the last iteration, not the best one.

Prediction

A model that has been trained or loaded can perform predictions on data sets.

import numpy as np
import xgboost as xgb

# 7 entities, each contains 10 features
data = np.random.rand(7, 10)
dtest = xgb.DMatrix(data)
ypred = bst.predict(dtest)

If early stopping is enabled during training, you can get predictions from the best iteration with bst.best_ntree_limit:

ypred = bst.predict(dtest,ntree_limit=bst.best_ntree_limit)

Thanks in advance.

Answer

From my point of view, both parameters refer to the same thing, or at least serve the same goal. But I would rather use:

preds = model.predict(xgtest, ntree_limit=bst.best_iteration)

From the source code, we can see that best_ntree_limit is going to be dropped in favor of best_iteration:

def _get_booster_layer_trees(model: "Booster") -> Tuple[int, int]:
    """Get number of trees added to booster per-iteration.  This function will be removed
    once `best_ntree_limit` is dropped in favor of `best_iteration`.  Returns
    `num_parallel_tree` and `num_groups`.
    """

Additionally, best_ntree_limit has been removed from the EarlyStopping documentation page.

So I think this attribute exists only for backwards-compatibility reasons. From this code snippet and the documentation, we can therefore assume that best_ntree_limit is, or will be, deprecated.
