XGBoost: sample_weight vs scale_pos_weight


Question

I have a highly unbalanced dataset and am wondering where to account for the weights, so I am trying to understand the difference between the scale_pos_weight argument of XGBClassifier and the sample_weight parameter of its fit method. I would appreciate an intuitive explanation of the difference between the two, whether they can be used simultaneously, and how to choose between them.

The documentation says that scale_pos_weight is used to:

control the balance of positive and negative weights ... typical value to consider: sum(negative cases) / sum(positive cases)
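As a quick illustration (not from the original post), this suggested ratio can be computed directly from the training labels. The sketch below assumes the labels are a NumPy array of 0/1 values:

import numpy as np

# Hypothetical labels: 5 negatives, 2 positives
y_train_demo = np.array([0, 0, 0, 1, 0, 1, 0])

# Suggested value: sum(negative cases) / sum(positive cases)
ratio = np.sum(y_train_demo == 0) / np.sum(y_train_demo == 1)
print(ratio)  # 2.5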

Examples:

# Using scale_pos_weight: one global weight for the positive class
from xgboost import XGBClassifier

LR = 0.1
NumTrees = 1000

xgbmodel = XGBClassifier(booster='gbtree', seed=0, nthread=-1,
                         gamma=0, scale_pos_weight=14, learning_rate=LR,
                         n_estimators=NumTrees, max_depth=5,
                         objective='binary:logistic', subsample=1)
xgbmodel.fit(X_train, y_train)

# Using sample_weight instead: one weight per training example
from xgboost import XGBClassifier

LR = 0.1
NumTrees = 1000

xgbmodel = XGBClassifier(booster='gbtree', seed=0, nthread=-1,
                         gamma=0, learning_rate=LR, n_estimators=NumTrees,
                         max_depth=5, objective='binary:logistic', subsample=1)
xgbmodel.fit(X_train, y_train, sample_weight=weights_train)

Answer

The sample_weight parameter allows you to specify a different weight for each training example. The scale_pos_weight parameter lets you provide a single weight for an entire class of examples (the "positive" class).

These correspond to two different approaches to cost-sensitive learning. If you believe that the cost of misclassifying a positive example (missing a cancer patient) is the same for all positive examples (but higher than the cost of misclassifying a negative one, e.g. telling someone they have cancer when they actually don't), then you can specify a single weight for all positive examples via scale_pos_weight.

XGBoost treats label = 1 as the "positive" class, which is evident from the following line of its source code:

if (info.labels[i] == 1.0f) w *= param_.scale_pos_weight

See also this question.
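Given that line, scale_pos_weight appears to act simply as a multiplier on the weight of every positive instance, so the two examples from the question should coincide when the per-example weights encode the same factor. A minimal sketch of this equivalence (assuming y_train is a 0/1 NumPy array and xgbmodel was constructed without scale_pos_weight, as in the second example above):

import numpy as np

# Per-example weights reproducing scale_pos_weight=14:
# weight 14 for every positive example, weight 1 for every negative
weights_equiv = np.where(y_train == 1, 14.0, 1.0)
xgbmodel.fit(X_train, y_train, sample_weight=weights_equiv)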

The other scenario is where you have example-dependent costs. One example is detecting fraudulent transactions: not only is a false negative (missing a fraudulent transaction) more costly than a false positive (blocking a legitimate transaction), but the cost of a false negative is also proportional to the amount of money being stolen. So you want to give larger weights to positive (fraudulent) examples involving higher amounts. In this case, you can use the sample_weight parameter to specify example-specific weights.
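A minimal sketch of such example-dependent weighting follows. The amounts array is hypothetical and not part of the original question; one plausible scheme is to scale each fraudulent example by its transaction value:

import numpy as np

# amounts: hypothetical per-example transaction values, aligned with y_train.
# Fraudulent examples get a weight proportional to the money at stake,
# normalized so the average positive weight is 1; legitimate ones keep 1.
weights_train = np.where(y_train == 1,
                         amounts / amounts[y_train == 1].mean(),
                         1.0)

xgbmodel.fit(X_train, y_train, sample_weight=weights_train)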
