How is xgboost Quality calculated?


Problem description

Could someone explain how the Quality column returned by the xgb.model.dt.tree function in the xgboost R package is calculated?

The documentation says that Quality "is the gain related to the split in this specific node".

When you run the following code, which is given in the xgboost documentation for this function, the Quality for node 0 of tree 0 is 4000.53, yet I calculate the Gain as 2002.848:

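library(xgboost)

data(agaricus.train, package = 'xgboost')

train <- agaricus.train
X <- train$data
y <- train$label

bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
               eta = 1, nthread = 2, nround = 2, objective = "binary:logistic")

xgb.model.dt.tree(agaricus.train$data@Dimnames[[2]], model = bst)

# Initial prediction for every row
p <- rep(0.5, nrow(X))

# Rows on each side of the root split on 'odor=none'
L <- which(X[, 'odor=none'] == 0)
R <- which(X[, 'odor=none'] == 1)

pL <- p[L]
pR <- p[R]
yL <- y[L]
yR <- y[R]

# Sums of first-order gradients (p - y)
GL <- sum(pL - yL)
GR <- sum(pR - yR)
G <- sum(p - y)

# Sums of second-order gradients p * (1 - p)
HL <- sum(pL * (1 - pL))
HR <- sum(pR * (1 - pR))
H <- sum(p * (1 - p))

# Gain with lambda = 0 and gamma = 0
gain <- 0.5 * (GL^2 / HL + GR^2 / HR - G^2 / H)
gain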

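I understand that Gain is given by the following formula:

$$\text{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma$$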

Since we are using log loss, G is the sum of p - y and H is the sum of p(1 - p). In this instance gamma and lambda are both zero.
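For reference, the per-row quantities behind G and H can be written out as below. This is the standard derivation for log loss with a sigmoid link, given as a reminder rather than taken from the original post; the raw margin $z_i$ is notation introduced only for this sketch.

```latex
\ell_i = -\bigl[\, y_i \log p_i + (1 - y_i) \log (1 - p_i) \,\bigr],
\qquad p_i = \frac{1}{1 + e^{-z_i}}

g_i = \frac{\partial \ell_i}{\partial z_i} = p_i - y_i,
\qquad
h_i = \frac{\partial^2 \ell_i}{\partial z_i^2} = p_i (1 - p_i)

G = \sum_i g_i, \qquad H = \sum_i h_i
```

With the initial prediction p = 0.5 used in the code, every row has h = 0.25, so each H is simply a quarter of the number of rows in that node.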

Can anyone identify where I am going wrong?

Recommended answer

OK, I think I've worked it out. The default value of reg_lambda is not 0, as the documentation states, but is actually 1 (see param.h).

Also, it appears that the factor of one half is not applied when calculating the gain, so the Quality column is double what you would expect. Lastly, I don't think gamma (also called min_split_loss) is applied in this calculation either (see updater_histmaker-inl.hpp).

Instead, gamma is used to decide whether to invoke pruning, but it is not reflected in the gain calculation itself, contrary to what the documentation suggests.
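Taken together, these three observations suggest the reported value has the following form. This is a sketch in the question's notation, based on the answer's reading of the source rather than on the documentation, with $\lambda = 1$ by default:

```latex
\text{Quality} = \frac{G_L^2}{H_L + \lambda}
               + \frac{G_R^2}{H_R + \lambda}
               - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda},
\qquad \lambda = 1
```

Compared with the textbook gain, the leading factor of one half and the trailing $-\gamma$ term are both absent.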

If you apply these changes, you do indeed get 4000.53 as the Quality for node 0 of tree 0, as in the original question. I'll raise this as an issue with the xgboost maintainers so that the documentation can be changed accordingly.
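The corrections described above can be checked with a few lines of R. This is a minimal sketch, not code from the xgboost source: `split_quality` is a hypothetical helper name, and it assumes lambda = 1, no factor of one half, and no gamma term, as the answer describes. The check against the data only runs if the xgboost package, which ships agaricus.train, is installed.

```r
# Split gain as the answer describes it: lambda is added to every Hessian sum,
# there is no factor of 1/2, and gamma is not subtracted.
# `split_quality` is a hypothetical helper, not an xgboost function.
split_quality <- function(GL, HL, GR, HR, lambda = 1) {
  score <- function(G, H) G^2 / (H + lambda)
  score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)
}

# Optional check against the data used in the question.
if (requireNamespace("xgboost", quietly = TRUE)) {
  data(agaricus.train, package = "xgboost")
  X <- agaricus.train$data
  y <- agaricus.train$label

  p <- rep(0.5, nrow(X))          # initial prediction, as in the question
  g <- p - y                      # first-order gradient of log loss
  h <- p * (1 - p)                # second-order gradient of log loss
  right <- X[, "odor=none"] == 1  # rows sent to the odor=none == 1 side

  print(split_quality(sum(g[!right]), sum(h[!right]),
                      sum(g[right]),  sum(h[right])))
}
```

If the answer's reading is right, the printed value should agree with the Quality of 4000.53 reported for node 0 of tree 0. Calling the helper with `lambda = 0` and halving the result should give back the 2002.848 from the question.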
