Multiclass Classification with LightGBM
Question
I am trying to build a classifier for a multi-class classification problem (3 classes) using LightGBM in Python. I used the following parameters.
params = {'task': 'train',
          'boosting_type': 'gbdt',
          'objective': 'multiclass',
          'num_class': 3,
          'metric': 'multi_logloss',
          'learning_rate': 0.002296,
          'max_depth': 7,
          'num_leaves': 17,
          'feature_fraction': 0.4,
          'bagging_fraction': 0.6,
          'bagging_freq': 17}
All the categorical features of the dataset are label encoded with LabelEncoder. I trained the model after running cv with early_stopping_rounds, as shown below.
import numpy as np
import lightgbm as lgbm

lgb_cv = lgbm.cv(params, d_train, num_boost_round=10000, nfold=3, shuffle=True, stratified=True, verbose_eval=20, early_stopping_rounds=100)
# argmin returns a 0-based index, so add 1 to get the number of boosting rounds
nround = int(np.argmin(lgb_cv['multi_logloss-mean'])) + 1
print(nround)
model = lgbm.train(params, d_train, num_boost_round=nround)
After training, I made predictions with the model like this:
preds = model.predict(test)
print(preds)
I got a nested array as output like this:
[[ 7.93856847e-06 9.99989550e-01 2.51164967e-06]
[ 7.26332978e-01 1.65316511e-05 2.73650491e-01]
[ 7.28564308e-01 8.36756769e-06 2.71427325e-01]
...,
[ 7.26892634e-01 1.26915179e-05 2.73094674e-01]
[ 5.93217601e-01 2.07172044e-04 4.06575227e-01]
[ 5.91722491e-05 9.99883828e-01 5.69994435e-05]]
As each list in preds represents the class probabilities, I used np.argmax() to find the classes like this:
predictions = []
for x in preds:
    predictions.append(np.argmax(x))
While analyzing the predictions, I found that they contain only 2 classes: 0 and 1. Class 2 was the 2nd largest class in the training set, but it was nowhere to be found in the predictions. On evaluation, the model gave about 78% accuracy.
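As a minimal sketch of how such a check could be done, the snippet below takes the six probability rows visible in the output above, picks the most likely class per row, and counts how often each class is predicted. The variable name class_counts is illustrative and not part of the original code.

```python
import numpy as np

# The six rows of predicted probabilities shown in the output above.
preds = np.array([
    [7.93856847e-06, 9.99989550e-01, 2.51164967e-06],
    [7.26332978e-01, 1.65316511e-05, 2.73650491e-01],
    [7.28564308e-01, 8.36756769e-06, 2.71427325e-01],
    [7.26892634e-01, 1.26915179e-05, 2.73094674e-01],
    [5.93217601e-01, 2.07172044e-04, 4.06575227e-01],
    [5.91722491e-05, 9.99883828e-01, 5.69994435e-05],
])

# Vectorized equivalent of looping over the rows with np.argmax.
predictions = np.argmax(preds, axis=1)

# Count predictions per class; minlength keeps classes with zero predictions.
class_counts = np.bincount(predictions, minlength=preds.shape[1])

print(predictions)   # [1 0 0 0 0 1]
print(class_counts)  # [4 2 0]
```

In these rows class 2 never wins the argmax, even though its probability is as high as about 0.41 in one of them.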
So, why didn't my model predict class 2 for any of the cases? Is there anything wrong with the parameters I used?
Isn't this the proper way to interpret the predictions made by the model? Should I make any changes to the parameters?
Recommended answer
Try troubleshooting by swapping classes 0 and 2, and re-running the training and prediction process.
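One possible way to do the swap is sketched below with NumPy. The helper swap_labels and the sample y_train array are hypothetical, introduced only for illustration; the question does not show how its labels are stored.

```python
import numpy as np

def swap_labels(y, a=0, b=2):
    """Return a copy of y with labels a and b exchanged (hypothetical helper)."""
    y = np.asarray(y)
    swapped = y.copy()
    # Both masks are computed from the original array, so the second
    # assignment does not undo the first.
    swapped[y == a] = b
    swapped[y == b] = a
    return swapped

# Hypothetical label vector standing in for the real training labels.
y_train = np.array([0, 1, 2, 2, 0, 1])
y_swapped = swap_labels(y_train)
print(y_swapped)  # [2 1 0 0 2 1]
```

The swapped labels would then replace the original ones when building the training Dataset, and the same cv, train, and predict steps would be re-run unchanged.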
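If the new predictions only contain classes 1 and 2 (most likely given your provided data):
- The classifier might not have learned the third class; perhaps its features overlap with those of a larger class, and the classifier defaults to the larger class in order to minimize the objective function. Try providing a balanced training set (same number of samples per class) and retry.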
If the new predictions do contain all 3 classes:
- Something went wrong somewhere in your code. More information would be needed to determine exactly what is going wrong.
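For the balanced training set suggested above, one simple option is to downsample every class to the size of the smallest one. The sketch below is only one way to do that, under stated assumptions: balanced_indices and the sample y_train labels are hypothetical, and downsampling discards data, so other approaches such as oversampling or class weights may suit a real dataset better.

```python
import numpy as np

def balanced_indices(y, seed=0):
    """Pick row indices so every class appears equally often (hypothetical helper)."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_per_class = counts.min()  # size of the smallest class
    picked = [
        # Sample without replacement from the rows belonging to class c.
        rng.choice(np.flatnonzero(y == c), size=n_per_class, replace=False)
        for c in classes
    ]
    return np.sort(np.concatenate(picked))

# Hypothetical imbalanced labels: six of class 0, two of class 1, four of class 2.
y_train = np.array([0] * 6 + [1] * 2 + [2] * 4)
keep = balanced_indices(y_train)
print(np.bincount(y_train[keep]))  # [2 2 2]
```

The same keep indices would be applied to the feature rows, so that features and labels stay aligned before the training Dataset is rebuilt.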
Hope this helps.