How to access weighting of individual decision trees in xgboost?

Question

I'm using xgboost for ranking with

param = {'objective':'rank:pairwise', 'booster':'gbtree'}
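
For context, a minimal sketch of this kind of setup (the data, query group sizes, and round count below are made up purely for illustration):

import numpy as np
import xgboost as xgb

# Toy data: 8 documents with relevance labels, split into 3 query groups.
X = np.random.rand(8, 4)
y = np.array([0, 1, 2, 0, 1, 0, 2, 1])
dtrain = xgb.DMatrix(X, label=y)
dtrain.set_group([3, 3, 2])   # group sizes must sum to the number of rows

param = {'objective': 'rank:pairwise', 'booster': 'gbtree'}
bst = xgb.train(param, dtrain, num_boost_round=10)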

As I understand it, gradient boosting works by calculating the weighted sum of the learned decision trees. How can I access the weights that are assigned to each learned booster? I wanted to try to post-process the weights after training to speed up the prediction step, but I don't know how to get the individual weights. When using dump_model(), the different decision trees can be seen in the created file, but no weighting is stored there. In the API I haven't found a suitable function. Or can I calculate the weights by hand with the shrinkage parameter eta?
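
For reference, a rough sketch of inspecting such a dump (assuming a trained Booster named bst, e.g. from the snippet above). The dump lists each tree's split conditions and leaf values; there is no separate per-tree weight, because eta is already folded into the leaf values:

# Print every tree in the model; each entry shows splits and leaf values.
for i, tree in enumerate(bst.get_dump()):
    print('booster[%d]:' % i)
    print(tree)

# Or write the same text dump to a file.
bst.dump_model('model_dump.txt')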

Answer

Each tree is given the same weight eta and the overall prediction is the sum of the predictions of each tree, as you say.
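
A sketch of one way to check this on a trained model (assuming xgboost >= 1.4 for the iteration_range argument; a plain regression objective and random data are used to keep the numbers simple, but the same decomposition applies to the ranking booster):

import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.random((100, 5))
y = rng.random(100)
dtrain = xgb.DMatrix(X, label=y)

params = {'objective': 'reg:squarederror', 'eta': 0.1, 'base_score': 0.0}
bst = xgb.train(params, dtrain, num_boost_round=5)

# Raw (margin) prediction using only the first k trees, for k = 1..5.
partial = [bst.predict(dtrain, output_margin=True, iteration_range=(0, k))
           for k in range(1, 6)]

# Each tree's contribution is the difference between successive partial sums;
# eta is already included in these values.
per_tree = [partial[0]] + [partial[k] - partial[k - 1] for k in range(1, 5)]

# The full prediction equals the sum of the per-tree contributions
# (plus base_score, which is 0 here).
full = bst.predict(dtrain, output_margin=True)
assert np.allclose(full, np.sum(per_tree, axis=0), atol=1e-5)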

You'd perhaps expect the earlier trees to be given more weight than the later trees, but that's not necessary, due to the way the response is updated after every tree. Here's a toy example:

Suppose we have 5 observations, with responses 10, 20, 30, 40, 50. The first tree is built and gives predictions of 12, 18, 27, 39, 54.

Now, if eta = 1, the response variables passed to the next tree will be -2, 2, 3, 1, -4 (i.e. the difference between the true response and the prediction). The next tree will then try to learn the 'noise' that wasn't captured by the first tree. If nrounds = 2, then the sum of the predictions from the two trees will give the final prediction of the model.
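
In code, that first round looks like this (a toy numpy sketch that just replays the numbers above):

import numpy as np

y     = np.array([10., 20., 30., 40., 50.])   # true responses
tree1 = np.array([12., 18., 27., 39., 54.])   # first tree's predictions

eta = 1.0
residual = y - eta * tree1
print(residual)   # [-2.  2.  3.  1. -4.]  -> what the second tree is fitted to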

If instead eta = 0.1, all trees will have their predictions scaled down by eta, so the first tree will instead 'predict' 1.2, 1.8, 2.7, 3.9, 5.4. The response variable passed to the next tree will then have values 8.8, 18.2, 27.3, 36.1, 44.6 (the difference between the true response and the scaled prediction). The second round then uses these response values to build another tree - and again the predictions are scaled by eta. So tree 2 predicts, say, 7, 18, 25, 40, 40, which, once scaled, become 0.7, 1.8, 2.5, 4.0, 4.0. As before, the third tree will be passed the difference between the previous round's response values and these scaled predictions (so 8.1, 16.4, 24.8, 32.1, 40.6). Again, the sum of the predictions from all trees will give the final prediction.
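
The same walkthrough with eta = 0.1, again just replaying the numbers above in numpy:

import numpy as np

y     = np.array([10., 20., 30., 40., 50.])
tree1 = np.array([12., 18., 27., 39., 54.])
tree2 = np.array([ 7., 18., 25., 40., 40.])
eta = 0.1

r1 = y  - eta * tree1    # [ 8.8 18.2 27.3 36.1 44.6]  -> targets for tree 2
r2 = r1 - eta * tree2    # [ 8.1 16.4 24.8 32.1 40.6]  -> targets for tree 3

pred_after_two_rounds = eta * (tree1 + tree2)
print(r1, r2, pred_after_two_rounds)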

Clearly when eta = 0.1, and base_score is 0, you'll need at least 10 rounds to get a prediction that's anywhere near sensible. In general, you need an absolute minimum of 1/eta rounds and typically many more.
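
A back-of-the-envelope way to see why: if every tree fitted its residuals perfectly (which real trees only approximate), then with base_score = 0 the prediction after n rounds would reach a fraction 1 - (1 - eta)^n of the target:

eta = 0.1
for n in (1, 10, 30, 50):
    print(n, 1 - (1 - eta) ** n)
# n = 1  -> 0.10  (only 10% of the way there)
# n = 10 -> ~0.65
# n = 30 -> ~0.96
# n = 50 -> ~0.99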

The rationale for using a small eta is that the model benefits from taking small steps towards the prediction rather than making tree 1 do the majority of the work. It's a bit like crystallisation - cool slowly and you get bigger, better crystals. The downside is you need to increase nrounds, thus increasing the runtime of the algorithm.
