xgboost predict_proba: How to map the probabilities to the labels
Question
I'm trying to solve a multiclass classification problem using the xgboost algorithm, but I don't know exactly how predict_proba works. predict_proba generates a list of probabilities, but I don't know which class each probability belongs to.
Here is a simple example. This is my training data:
+------------+----------+-------+
| feature1 | feature2 | label |
+------------+----------+-------+
| x | z | 3 |
+------------+----------+-------+
| y | u | 0 |
+------------+----------+-------+
| x | u | 2 |
+------------+----------+-------+
Then, when I try to predict probabilities for a new example:
model.predict_proba(['x','u'])
This returns something like:
[0.2, 0.3, 0.5]
My question is: which class has the probability 0.5? Is it class 2, 3, or 0?
Recommended answer
It seems you are using the sklearn API of xgboost. In that case the fitted model has a dedicated attribute, model.classes_, which returns the classes learned by the model. The order of the classes in that array matches the order of the probabilities: column i of the predict_proba output is the probability of model.classes_[i].
Here is an example with dummy data:
import numpy as np
import pandas as pd
import xgboost as xgb
# generate dummy data (10k examples, 10 numeric features, 4 classes of target)
np.random.seed(312)
train_X = np.random.random((10000,10))
train_y_mcc = np.random.randint(0, 4, train_X.shape[0]) #four classes:0,1,2,3
# model
xgb_model_mpg = xgb.XGBClassifier(max_depth= 3, n_estimators=100)
xgb_model_mpg.fit(train_X, train_y_mcc)
# classes, in the same order as the columns of predict_proba
print(xgb_model_mpg.classes_)
# prints: [0 1 2 3]
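To answer the original question directly, the pairing between classes_ and one row of predict_proba can be sketched as below. This is a minimal illustration and not part of xgboost: the helper names probabilities_by_class and most_likely_class are hypothetical, and the values are the ones from the question, assuming the model reports its classes in ascending order as [0, 2, 3]. With a fitted model you would pass model.classes_ and a row of model.predict_proba(...) instead.

```python
def probabilities_by_class(classes, proba_row):
    """Pair each class label with the probability at the same position."""
    if len(classes) != len(proba_row):
        raise ValueError("classes and proba_row must have the same length")
    return {label: float(p) for label, p in zip(classes, proba_row)}


def most_likely_class(classes, proba_row):
    """Return the label whose probability is the largest."""
    mapping = probabilities_by_class(classes, proba_row)
    return max(mapping, key=mapping.get)


# Assumed values: the labels from the question in ascending order,
# and the example output row from the question.
classes = [0, 2, 3]
proba_row = [0.2, 0.3, 0.5]

mapping = probabilities_by_class(classes, proba_row)
print(mapping)                                 # {0: 0.2, 2: 0.3, 3: 0.5}
print(most_likely_class(classes, proba_row))   # 3
```

Under that assumption, the probability 0.5 belongs to class 3, because it sits in the last position and 3 is the last entry of classes_.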