Is the xgboost documentation wrong? (early stopping rounds and best and last iteration)


Question

Below is a question about xgboost's early_stopping_rounds parameter and whether it does, or does not, give the best iteration when it is the reason why the fit ends.

In the xgboost documentation, one can see in the scikit-learn API section (link) that when the fit stops due to the early_stopping_rounds parameter:

Activates early stopping. Validation error needs to decrease at least every "early_stopping_rounds" round(s) to continue training. Requires at least one item in evals. If there's more than one, will use the last. Returns the model from the last iteration (not the best one).

When reading this, it seems that the model returned in this case is not the best one but the last one. To access the best one when predicting, it says, it is possible to call predict using the ntree_limit parameter with the bst.best_ntree_limit given at the end of the fit.

In this sense, it should work the same way as xgboost's train, since the fit of the scikit-learn API seems to be only a wrapper around train and the rest.

The same is said in Stack Overflow discussions (here and here). But when I tried to address this problem and check how it worked with my data, I did not find the behavior that I thought I should have. In fact, the behavior I encountered was not at all the one described in those discussions and in the documentation.

I call the fit like this:

reg = xgb.XGBRegressor(n_jobs=6, n_estimators=100, max_depth=5)

reg.fit(
   X_train,
   y_train,
   eval_metric='rmse',
   eval_set=[(X_train, y_train), (X_valid, y_valid)],
   verbose=True,
   early_stopping_rounds=6)

Here is what I get at the end:

[71]    validation_0-rmse:1.70071   validation_1-rmse:1.9382
[72]    validation_0-rmse:1.69806   validation_1-rmse:1.93825
[73]    validation_0-rmse:1.69732   validation_1-rmse:1.93803
Stopping. Best iteration:
[67]    validation_0-rmse:1.70768   validation_1-rmse:1.93734
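
The bookkeeping behind a log like this can be sketched in plain Python. The following is a simplified illustration of the early-stopping logic only (made-up function and toy scores, not xgboost's actual implementation):

```python
def early_stop(val_scores, patience=6):
    """Return (best_iter, last_iter) for a list of per-round validation
    RMSEs, stopping once the best score has not improved for `patience`
    consecutive rounds."""
    best_iter, best_score = 0, float("inf")
    for i, score in enumerate(val_scores):
        if score < best_score:
            best_iter, best_score = i, score
        elif i - best_iter >= patience:
            return best_iter, i  # stopped early: best iter != last iter
    return best_iter, len(val_scores) - 1  # budget exhausted

# Toy scores: the minimum is at round 2, then no improvement for 6 rounds.
scores = [2.0, 1.95, 1.93734, 1.938, 1.9382, 1.94, 1.95, 1.96, 1.97]
print(early_stop(scores))  # → (2, 8)
```

The question is then which of the two returned indices the fitted model actually corresponds to at prediction time.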

And when I check the values on the validation set I used:

from math import sqrt
import pandas as pd
from sklearn.metrics import mean_squared_error as mse

y_pred_valid = reg.predict(X_valid)
y_pred_valid_df = pd.DataFrame(y_pred_valid)
y_pred_valid_df = pd.DataFrame(y_pred_valid)
sqrt(mse(y_valid, y_pred_valid_df[0]))

I get:

1.9373418403889535
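
For reference, the quantity computed above is just the root-mean-square error; a minimal pure-Python equivalent of sqrt(mean_squared_error(...)) would be:

```python
from math import sqrt

def rmse(y_true, y_pred):
    # Root of the mean of squared residuals.
    residuals = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sqrt(sum(residuals) / len(residuals))

print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # sqrt(4/3) ≈ 1.1547
```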

If the fit had returned the last iteration instead of the best one, it should have given an RMSE around 1.93803, but it gave an RMSE of 1.93734, exactly the best score.

I checked again in two ways. First, I edited the code below according to @Eran Moshe's answer:

y_pred_valid = reg.predict(X_valid, ntree_limit=reg.best_ntree_limit)
y_pred_valid_df = pd.DataFrame(y_pred_valid)
sqrt(mse(y_valid, y_pred_valid_df[0]))

1.9373418403889535

Second, even if I call the fit with only 68 estimators (knowing the best iteration is the 67th), so that I am sure the last one is the best one:

reg = xgb.XGBRegressor(n_jobs=6, n_estimators=68, max_depth=5)

reg.fit(
   X_train,
   y_train,
   eval_metric='rmse',
   eval_set=[(X_train, y_train), (X_valid, y_valid)],
   verbose=True,
   early_stopping_rounds=6)

The result is the same:

1.9373418403889535

So this seems to lead to the idea that, contrary to what the documentation and those numerous discussions about it say, the fit of xgboost, when stopped by the early_stopping_rounds parameter, does give the best iteration, not the last one.

Am I wrong? If so, where, and how do you explain the behavior I observed?

Thanks for your attention.

Answer

I think it is not wrong, but inconsistent.

The documentation of the predict method is correct (e.g., see here). To be 100% sure, it is better to look into the code: xgb github. So predict behaves as stated in its own documentation, but the fit documentation is outdated. Please post it as an issue on the XGB github; either they will fix the docs, or you will, and you will become an XGB contributor :)

