How is the parameter "weight" (DMatrix) used in the gradient boosting procedure (xgboost)?


Problem description

In xgboost it is possible to set the parameter weight for a DMatrix. This is apparently a list of weights wherein each value is the weight for the corresponding sample. I can't find any information on how these weights are actually used in the gradient boosting procedure. Are they related to eta?

For example, if I set weight to 0.3 for all samples and eta to 1, would this be the same as setting eta to 0.3 and weight to 1?

Recommended answer

xgboost allows for instance weighting during the construction of the DMatrix, as you noted. This weight is directly tied to the instance and travels with it throughout the entire training. It is therefore included in the calculation of the gradients and hessians, and directly impacts the split points and the training of an xgboost model.
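
As a rough illustration of that point (a toy sketch, not xgboost's internal code): for a squared-error objective, a per-instance weight simply scales that instance's gradient and hessian, which is how it ends up influencing split gain and leaf values.

import numpy as np

# Toy sketch: how a per-instance weight w_i scales the gradient and
# hessian of a squared-error objective (illustrative, not library code).
def weighted_squared_error(preds, labels, weights):
    grad = weights * (preds - labels)       # first-order term, scaled per instance
    hess = weights * np.ones_like(preds)    # second-order term, scaled per instance
    return grad, hess

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.6, 0.4, 0.9])
w = np.array([1.0, 0.5, 2.0])               # heavier weight => larger influence on splits
print(weighted_squared_error(y_pred, y_true, w))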

See here, in the xgboost documentation:

Instance Weight File

XGBoost supports providing each instance a weight to differentiate the importance of instances. For example, if we provide an instance weight file for the "train.txt" file in the example as below:

train.txt.weight

1
0.5
0.5
1
0.5

It means that XGBoost will emphasize the first and fourth instances more, that is to say the positive instances, while training. The configuration is similar to configuring the group information. If the instance file is named "xxx", XGBoost will check whether there is a file named "xxx.weight" in the same directory and, if there is, will use the weights while training models.
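
The same thing can be done from the Python API by passing the weights when the DMatrix is constructed. A minimal sketch (the data and weight values below are made up; the weights mirror the weight file above):

import numpy as np
import xgboost as xgb

# Pass per-instance weights directly when building the DMatrix
# (the Python-API equivalent of the "train.txt.weight" file).
X = np.random.rand(5, 3)                   # made-up features
y = np.array([1, 0, 0, 1, 0])              # made-up labels
w = np.array([1.0, 0.5, 0.5, 1.0, 0.5])    # same weights as the file above

dtrain = xgb.DMatrix(X, label=y, weight=w)
params = {"objective": "binary:logistic", "eta": 0.3}
model = xgb.train(params, dtrain, num_boost_round=10)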

eta

eta simply tells xgboost how much to blend the last trained tree into the ensemble; it is a measure of how greedy the ensemble should be at each iteration.
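
As a toy illustration of that shrinkage effect (made-up numbers, not xgboost internals): the running prediction only absorbs a fraction eta of each new tree's output.

# Toy shrinkage example: the ensemble's prediction for one sample only
# absorbs a fraction eta of each newly trained tree's output.
eta = 0.3
tree_outputs = [0.8, 0.5, 0.2]   # made-up outputs of three trees for one sample
prediction = 0.0                 # base score of 0 for simplicity
for out in tree_outputs:
    prediction += eta * out
print(prediction)                # 0.3 * (0.8 + 0.5 + 0.2) = 0.45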

For example, if I would set weight to 0.3 for all samples and eta to 1, would this be the same as setting eta to 0.3 and weight to 1?

• A constant weight of 1 for all instances is the default, so changing that to a constant of 0.3 for all instances would still be equal weighting, so this shouldn't impact things too much. However, setting eta up to 1, from 0.3, would make the training much more aggressive.
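
One way to check this empirically is to train both configurations on the same data and compare predictions. A sketch with made-up data (the two models generally differ, since eta shrinks every tree's contribution while a uniform weight only rescales the loss):

import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.random((200, 5))                   # made-up features
y = rng.random(200)                        # made-up regression targets

def fit(weight_value, eta):
    # Train with a constant per-instance weight and a given eta.
    w = np.full(len(y), weight_value)
    dtrain = xgb.DMatrix(X, label=y, weight=w)
    params = {"objective": "reg:squarederror", "eta": eta, "max_depth": 3}
    return xgb.train(params, dtrain, num_boost_round=20)

m1 = fit(weight_value=0.3, eta=1.0)
m2 = fit(weight_value=1.0, eta=0.3)

dtest = xgb.DMatrix(X)
print(np.abs(m1.predict(dtest) - m2.predict(dtest)).max())   # typically nonzero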
