What is the use of base_score in xgboost multiclass working?


Question


I am trying to explore how Xgboost works for binary classification as well as for multi-class classification. For the binary case, I observed that base_score is treated as the starting probability, and it also has a major impact when calculating Gain and Cover.

For the multi-class case, I am not able to figure out the importance of the base_score parameter, because I get the same values of Gain and Cover for different (arbitrary) values of base_score.

I am also unable to figure out why the factor of 2 appears when calculating cover for multi-class, i.e. 2*p*(1-p).

Can someone help me with these two parts?

Solution

To answer your question, let's look at what multi-class classification really does in xgboost when using the multi:softmax objective and, say, 6 classes.

Say you want to train a classifier specifying num_boost_round=5. How many trees would you expect xgboost to train for you? The correct answer is 30 trees. The reason is that softmax expects each training row to have num_classes=6 different scores, so that xgboost can compute gradients/hessians w.r.t. each of these 6 scores and use them to build a new tree for each of the scores (effectively updating 6 parallel models in order to output 6 updated scores per sample).
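
A quick way to see the 30-trees claim for yourself is to count the distinct trees in the dumped model. A minimal sketch on made-up random data (the dataset and parameter values here are just illustrative, not from the original question):

import numpy as np
import xgboost as xgb

# Toy 6-class problem with random features and labels.
X = np.random.rand(500, 4)
y = np.random.randint(0, 6, size=500)
dtrain = xgb.DMatrix(X, label=y)

params = {'objective': 'multi:softmax', 'num_class': 6, 'max_depth': 2}
bst = xgb.train(params, dtrain, num_boost_round=5)

# One tree per class per boosting round: 6 * 5 = 30 distinct trees expected.
print(bst.trees_to_dataframe()['Tree'].nunique())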

In order to have the xgboost classifier output the final 6 values for each sample, e.g. from the test set, you need to call bst.predict(xg_test, output_margin=True) (where bst is your classifier and xg_test is e.g. the test set). The output of regular bst.predict(xg_test) is effectively the same as picking the class with the highest of the 6 values in bst.predict(xg_test, output_margin=True).
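
That equivalence can be checked directly; here is a small self-contained sketch with made-up data (names like dtrain are only illustrative):

import numpy as np
import xgboost as xgb

X = np.random.rand(500, 4)
y = np.random.randint(0, 6, size=500)
dtrain = xgb.DMatrix(X, label=y)
bst = xgb.train({'objective': 'multi:softmax', 'num_class': 6}, dtrain, num_boost_round=5)

margins = bst.predict(dtrain, output_margin=True)  # shape (n_rows, 6): raw per-class scores
classes = bst.predict(dtrain)                      # predicted class indices

# Plain predict is just the argmax over the 6 margin columns.
assert np.array_equal(classes, margins.argmax(axis=1).astype(classes.dtype))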

You can look at all the trees using the bst.trees_to_dataframe() function if you are interested (where bst is your trained classifier).

Now to the question of what base_score does in the multi:softmax case. The answer is: it is added as a starting score to each of the 6 classes' scores before any trees are added. So if you, e.g., apply base_score=42., you will observe that all values in bst.predict(xg_test, output_margin=True) also increase by 42. At the same time, for softmax, increasing the scores of all classes by an equal amount doesn't change anything, so in the case of multi:softmax, applying a base_score different from 0 doesn't have any visible effect.
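
The reason a constant shift is invisible is simply the shift invariance of softmax; a tiny numpy-only illustration (not xgboost itself):

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

scores = np.array([0.2, -1.3, 0.7])
print(softmax(scores))         # some probability vector
print(softmax(scores + 42.0))  # exactly the same probabilities - the constant cancels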

Compare this behavior to binary classification. While it is almost the same as multi:softmax with 2 classes, the big difference is that xgboost only tries to produce 1 score for class 1, leaving the score for class 0 equal to 0.0. Because of that, when you use base_score in binary classification, it is only added to the score of class 1, thus increasing the starting prediction probability for class 1. In theory, with multiple classes it would be meaningful to e.g. pass multiple base scores (one per class), which you can't do using base_score. Instead, you can use the set_base_margin functionality applied to the training set, but it does not work very conveniently with the default predict, so after that you will always need to use it with output_margin=True, adding the same values as the ones you used in set_base_margin for your training data (if you want to use set_base_margin in the multi-class case, you will need to flatten the margin values as suggested here).
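
For reference, a minimal sketch of what that flattening looks like for a multi-class DMatrix (the shapes and values are assumptions for illustration; the same flat, row-major layout is what the sbm helper in the example below builds):

import numpy as np
import xgboost as xgb

n_rows, num_class = 4, 3
dmat = xgb.DMatrix(np.random.rand(n_rows, 5))

per_class = np.array([-0.4, 0.0, 0.8])       # one starting margin per class
margins = np.tile(per_class, (n_rows, 1))    # shape (n_rows, num_class)
dmat.set_base_margin(margins.reshape(-1))    # flattened to length n_rows * num_class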

Example of how it all works:

import numpy as np
import xgboost as xgb
TRAIN = 1000
TEST = 2
F = 10

def gen_data(M):
    np_train_features = np.random.rand(M, F)
    np_train_labels = np.random.binomial(2, np_train_features[:,0])
    return xgb.DMatrix(np_train_features, label=np_train_labels)

def regenerate_data():
    np.random.seed(1)
    return gen_data(TRAIN), gen_data(TEST)

param = {}
param['objective'] = 'multi:softmax'
param['eta'] = 0.001
param['max_depth'] = 1
param['nthread'] = 4
param['num_class'] = 3


def sbm(xg_data, original_scores):
    xg_data.set_base_margin(np.array(original_scores * xg_data.num_row()).reshape(-1, 1))

num_round = 3

print("#1. No base_score, no set_base_margin")
xg_train, xg_test = regenerate_data()
bst = xgb.train(param, xg_train, num_round)
print(bst.predict(xg_test, output_margin=True))
print(bst.predict(xg_test))
print("Easy to see that in this case all scores/margins have 0.5 added to them initially, which is default value for base_score here for some bizzare reason, but it doesn't really affect anything, so no one cares.")
print()
bst1 = bst

print("#2. Use base_score")
xg_train, xg_test = regenerate_data()
param['base_score'] = 5.8
bst = xgb.train(param, xg_train, num_round)
print(bst.predict(xg_test, output_margin=True))
print(bst.predict(xg_test))
print("In this case all scores/margins have 5.8 added to them initially. And it doesn't really change anything compared to previous case.")
print()
bst2 = bst

print("#3. Use very large base_score and screw up numeric precision")
xg_train, xg_test = regenerate_data()
param['base_score'] = 5.8e10
bst = xgb.train(param, xg_train, num_round)
print(bst.predict(xg_test, output_margin=True))
print(bst.predict(xg_test))
print("In this case all scores/margins have too big number added to them and xgboost thinks all probabilities are equal so picks class 0 as prediction.")
print("But the training actually was fine - only predict is being affect here. If you set normal base margins for test set you can see (also can look at bst.trees_to_dataframe()).")
xg_train, xg_test = regenerate_data() # if we don't regenerate the dataframe here xgboost seems to be either caching it or somehow else remembering that it didn't have base_margins and result will be different.
sbm(xg_test, [0.1, 0.1, 0.1])
print(bst.predict(xg_test, output_margin=True))
print(bst.predict(xg_test))
print()
bst3 = bst

print("#4. Use set_base_margin for training")
xg_train, xg_test = regenerate_data()
# base_score is only used for train/test data whenever set_base_margin is not applied.
# Peculiar that the trained model remembers this value even if it was trained with a
# dataset which had set_base_margin. In that case this base_score will be used if
# and only if the test set passed to `bst.predict` didn't have `set_base_margin` applied to it.
param['base_score'] = 4.2
sbm(xg_train, [-0.4, 0., 0.8])
bst = xgb.train(param, xg_train, num_round)
sbm(xg_test, [-0.4, 0., 0.8])
print(bst.predict(xg_test, output_margin=True))
print(bst.predict(xg_test))
print("Working - the base margin values added to the classes skewing predictions due to low eta and small number of boosting rounds.")
print("If we don't set base margins for `predict` input it will use base_score to start all scores with. Bizzare, right? But then again, not much difference on what to add here if we are adding same value to all classes' scores.")
xg_train, xg_test = regenerate_data() # regenerate test and don't set the base margin values
print(bst.predict(xg_test, output_margin=True))
print(bst.predict(xg_test))
print()
bst4 = bst

print("Trees bst1, bst2, bst3 are almost identical, because there is no difference in how they were trained. bst4 is different though.")
print(bst1.trees_to_dataframe().iloc[1,])
print()
print(bst2.trees_to_dataframe().iloc[1,])
print()
print(bst3.trees_to_dataframe().iloc[1,])
print()
print(bst4.trees_to_dataframe().iloc[1,])

The output for this is the following:

#1. No base_score, no set_base_margin
[[0.50240415 0.5003637  0.49870378]
 [0.49863306 0.5003637  0.49870378]]
[0. 1.]
Easy to see that in this case all scores/margins have 0.5 added to them initially, which is default value for base_score here for some bizzare reason, but it doesn't really affect anything, so no one cares.

#2. Use base_score
[[5.8024044 5.800364  5.798704 ]
 [5.798633  5.800364  5.798704 ]]
[0. 1.]
In this case all scores/margins have 5.8 added to them initially. And it doesn't really change anything compared to previous case.

#3. Use very large base_score and screw up numeric precision
[[5.8e+10 5.8e+10 5.8e+10]
 [5.8e+10 5.8e+10 5.8e+10]]
[0. 0.]
In this case all scores/margins have too big number added to them and xgboost thinks all probabilities are equal so picks class 0 as prediction.
But the training actually was fine - only predict is being affect here. If you set normal base margins for test set you can see (also can look at bst.trees_to_dataframe()).
[[0.10240632 0.10036398 0.09870315]
 [0.09863247 0.10036398 0.09870315]]
[0. 1.]

#4. Use set_base_margin for training
[[-0.39458954  0.00102317  0.7973728 ]
 [-0.40044016  0.00102317  0.7973728 ]]
[2. 2.]
Working - the base margin values added to the classes skewing predictions due to low eta and small number of boosting rounds.
If we don't set base margins for `predict` input it will use base_score to start all scores with. Bizzare, right? But then again, not much difference on what to add here if we are adding same value to all classes' scores.
[[4.2054105 4.201023  4.1973724]
 [4.1995597 4.201023  4.1973724]]
[0. 1.]

Trees bst1, bst2, bst3 are almost identical, because there is no difference in how they were trained. bst4 is different though.
Tree                 0
Node                 1
ID                 0-1
Feature           Leaf
Split              NaN
Yes                NaN
No                 NaN
Missing            NaN
Gain       0.000802105
Cover          157.333
Name: 1, dtype: object

Tree                 0
Node                 1
ID                 0-1
Feature           Leaf
Split              NaN
Yes                NaN
No                 NaN
Missing            NaN
Gain       0.000802105
Cover          157.333
Name: 1, dtype: object

Tree                 0
Node                 1
ID                 0-1
Feature           Leaf
Split              NaN
Yes                NaN
No                 NaN
Missing            NaN
Gain       0.000802105
Cover          157.333
Name: 1, dtype: object

Tree                0
Node                1
ID                0-1
Feature          Leaf
Split             NaN
Yes               NaN
No                NaN
Missing           NaN
Gain       0.00180733
Cover         100.858
Name: 1, dtype: object
