How is xgboost cover calculated?


Question


Could someone explain how the Cover column in the xgboost R package is calculated in the xgb.model.dt.tree function?

In the documentation it says that Cover "is a metric to measure the number of observations affected by the split".

When you run the following code, given in the xgboost documentation for this function, Cover for node 0 of tree 0 is 1628.2500.

data(agaricus.train, package='xgboost')

#Both dataset are list with two items, a sparse matrix and labels
#(labels = outcome column which will be learned).
#Each column of the sparse Matrix is a feature in one hot encoding format.
train <- agaricus.train

bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
               eta = 1, nthread = 2, nround = 2,objective = "binary:logistic")

#agaricus.train$data@Dimnames[[2]] represents the column names of the sparse matrix.
xgb.model.dt.tree(agaricus.train$data@Dimnames[[2]], model = bst)

There are 6513 observations in the train dataset, so can anyone explain why Cover for node 0 of tree 0 is a quarter of this number (1628.25)?

Also, Cover for the node 1 of tree 1 is 788.852 - how is this number calculated?

Any help would be much appreciated. Thanks.

Solution

Cover is defined in xgboost as:

the sum of second order gradient of training data classified to the leaf, if it is square loss, this simply corresponds to the number of instances in that branch. Deeper in the tree a node is, lower this metric will be

https://github.com/dmlc/xgboost/blob/f5659e17d5200bd7471a2e735177a81cb8d3012b/R-package/man/xgb.plot.tree.Rd Not particularly well documented....

In order to calculate the cover, we need to know the predictions at that point in the tree, and the second derivative of the loss function with respect to those predictions.

Lucky for us, the prediction for every data point (6513 of them) in the 0-0 node in your example is .5. This is a global default setting whereby your first prediction at t=0 is .5.

base_score [ default=0.5 ] the initial prediction score of all instances, global bias

http://xgboost.readthedocs.org/en/latest/parameter.html

The gradient of the binary logistic loss (which is your objective function) is p - y, where p = your prediction and y = the true label.

Thus, the hessian (which is what we need for this) is p*(1-p). Note: the hessian can be determined without y, the true labels.
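
If you want to convince yourself of that, here is a quick numerical check in R (just a sketch using a central finite difference; the helper names sigmoid and logloss are made up for illustration and are not part of xgboost):

# For L(x, y) = -( y*log(p) + (1-y)*log(1-p) ) with p = 1/(1 + exp(-x)),
# the second derivative with respect to the raw score x is p*(1-p), whatever y is.
sigmoid <- function(x) 1 / (1 + exp(-x))
logloss <- function(x, y) { p <- sigmoid(x); -(y * log(p) + (1 - y) * log(1 - p)) }

x <- 0.3; y <- 1; eps <- 1e-4
(logloss(x + eps, y) - 2 * logloss(x, y) + logloss(x - eps, y)) / eps^2  # numerical hessian
sigmoid(x) * (1 - sigmoid(x))                                            # analytic p*(1-p), both ~0.2445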

So (bringing it home):

6513 * (.5) * (1 - .5) = 1628.25
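
You can also check this against the model output itself; a minimal sketch, assuming the table returned by xgb.model.dt.tree has the Tree, Node and Cover columns of recent xgboost releases:

nrow(train$data) * 0.5 * (1 - 0.5)   # 6513 rows, each contributing a hessian of 0.25 -> 1628.25

dt <- xgb.model.dt.tree(agaricus.train$data@Dimnames[[2]], model = bst)
dt[Tree == 0 & Node == 0, Cover]     # 1628.25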

In the second tree, the predictions at that point are no longer all .5, so let's get the predictions after one tree:

p = predict(bst,newdata = train$data, ntree=1)

head(p)
[1] 0.8471184 0.1544077 0.1544077 0.8471184 0.1255700 0.1544077

sum(p*(1-p))  # sum of the hessians in that node,(root node has all data)
[1] 788.8521
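
The same cross-check works for the root of the second tree (again a sketch, assuming the Tree/Node/Cover column names used above):

dt <- xgb.model.dt.tree(agaricus.train$data@Dimnames[[2]], model = bst)
dt[Tree == 1 & Node == 0, Cover]   # 788.8521, matching the hand-computed sum of hessians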

Note: for linear (squared error) regression the hessian is always 1, so the cover indicates how many examples are in that leaf.
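
To see that concretely, here is a sketch that refits the same data with a squared-error objective (the objective name is an assumption: it is "reg:linear" in older xgboost releases and "reg:squarederror" in newer ones, so adjust to your version):

# With squared error every row's hessian is 1, so Cover at the root of tree 0
# should simply equal the number of training rows (6513).
bst_lin <- xgboost(data = train$data, label = train$label, max.depth = 2,
                   eta = 1, nthread = 2, nround = 1, objective = "reg:linear")
xgb.model.dt.tree(agaricus.train$data@Dimnames[[2]], model = bst_lin)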

The big takeaway is that cover is defined by the hessian of the objective function. There is plenty of material out there on deriving the gradient and hessian of the binary logistic function.

These slides are helpful in seeing why hessians are used as weights, and they also explain how xgboost's splits differ from those of standard trees: https://homes.cs.washington.edu/~tqchen/pdf/BoostedTree.pdf

