XGBoost: sample_weight vs scale_pos_weight
Question
I have a highly unbalanced dataset and am wondering where to account for the weights, so I am trying to understand the difference between the scale_pos_weight argument of XGBClassifier and the sample_weight parameter of the fit method. I would appreciate an intuitive explanation of the difference between the two, whether they can be used simultaneously, and how to choose between the two approaches.
The documentation says of scale_pos_weight:

control the balance of positive and negative weights. A typical value to consider: sum(negative cases) / sum(positive cases)
Example:
from xgboost import XGBClassifier

LR = 0.1
NumTrees = 1000

# Class-level weighting: one weight for the entire positive class
xgbmodel = XGBClassifier(booster='gbtree', seed=0, nthread=-1,
                         gamma=0, scale_pos_weight=14, learning_rate=LR,
                         n_estimators=NumTrees, max_depth=5,
                         objective='binary:logistic', subsample=1)
xgbmodel.fit(X_train, y_train)
or
from xgboost import XGBClassifier

LR = 0.1
NumTrees = 1000

# Per-example weighting: one weight per training row, passed to fit()
xgbmodel = XGBClassifier(booster='gbtree', seed=0, nthread=-1,
                         gamma=0, learning_rate=LR,
                         n_estimators=NumTrees, max_depth=5,
                         objective='binary:logistic', subsample=1)
xgbmodel.fit(X_train, y_train, sample_weight=weights_train)
Answer
The sample_weight parameter allows you to specify a different weight for each training example. The scale_pos_weight parameter lets you provide a weight for an entire class of examples (the "positive" class).
These correspond to two different approaches to cost-sensitive learning. If you believe that the cost of misclassifying a positive example (missing a cancer patient) is the same for all positive examples, but higher than the cost of misclassifying a negative one (e.g. telling someone they have cancer when they actually don't), then you can specify one single weight for all positive examples via scale_pos_weight.
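As a minimal sketch of this class-level approach, the typical value suggested by the docs can be computed directly from the training labels (the labels here are hypothetical, chosen so the ratio matches the value 14 used in the question):

```python
import numpy as np

# Hypothetical binary labels: 1 = positive (e.g. cancer), 0 = negative;
# 140 negative cases vs 10 positive cases.
y_train = np.array([0] * 140 + [1] * 10)

# Typical value from the docs: sum(negative cases) / sum(positive cases)
ratio = np.sum(y_train == 0) / np.sum(y_train == 1)  # 14.0 here

# This single number is then passed as the class-level weight:
#   XGBClassifier(scale_pos_weight=ratio, ...)
```

Every positive example is then weighted identically; no per-row weight array is needed.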
XGBoost treats labels = 1 as the "positive" class. This is evident from the following piece of code in the XGBoost source:
if (info.labels[i] == 1.0f) w *= param_.scale_pos_weight
See this question.
The other scenario is where you have example-dependent costs. One example is detecting fraudulent transactions. Not only is a false negative (missing a fraudulent transaction) more costly than a false positive (blocking a legal transaction), but the cost of a false negative is also proportional to the amount of money being stolen. So you want to give larger weights to positive (fraudulent) examples with higher amounts. In this case, you can use the sample_weight parameter to specify example-specific weights.
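A minimal sketch of such example-dependent weights, assuming hypothetical labels y_train and per-transaction amounts amounts_train; the particular scheme (weight 1 for legal transactions, amount-proportional weight for fraud) is just one illustrative choice:

```python
import numpy as np

# Hypothetical data: label 1 = fraud, 0 = legal; amount per transaction
y_train = np.array([0, 0, 1, 0, 1])
amounts_train = np.array([20.0, 55.0, 1000.0, 10.0, 250.0])

# Give every legal transaction weight 1; weight fraudulent transactions
# proportionally to the amount at stake (scaled by the mean fraud amount).
weights_train = np.ones_like(amounts_train)
fraud = y_train == 1
weights_train[fraud] = amounts_train[fraud] / amounts_train[fraud].mean()

# These per-example weights then go to fit():
#   xgbmodel.fit(X_train, y_train, sample_weight=weights_train)
```

Here the $1000 fraud gets a higher weight than the $250 one, so the model is penalized more for missing the larger theft.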