How to extract decision rules (features splits) from xgboost model in python3?


Question


I need to extract the decision rules from my fitted xgboost model in Python. I use version 0.6a2 of the xgboost library, and my Python version is 3.5.2.

My ultimate goal is to use those splits to bin variables (according to the splits).

I did not come across any property of the model for this version that would give me the splits.

plot_tree gives me something similar; however, it is a visualization of the tree.

I need something like https://stackoverflow.com/a/39772170/4559070 for an xgboost model.

Solution

It is possible, but not easy. I would recommend using GradientBoostingClassifier from scikit-learn, which is similar to xgboost but has native access to the built trees.
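If you do go the scikit-learn route, the splits can be read directly off the fitted trees. A minimal sketch (my own illustration, not part of the original answer) via the `tree_` attribute of each stage's regressors:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_iris(return_X_y=True)
model = GradientBoostingClassifier(max_depth=2, n_estimators=2)
model.fit(X, y)

# model.estimators_ is a 2-D array of DecisionTreeRegressor objects:
# one row per boosting stage, one column per class.
splits = []
for stage in model.estimators_:
    for tree in stage:
        t = tree.tree_
        for node in range(t.node_count):
            if t.children_left[node] != -1:  # internal node (leaves have -1)
                splits.append((t.feature[node], t.threshold[node]))
print(splits)
```

Each tuple is (feature_index, threshold), analogous to the (feature_id, split_value) pairs extracted from the xgboost dump below.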

With xgboost, however, it is possible to get a textual representation of the model and then parse it:

from sklearn.datasets import load_iris
from xgboost import XGBClassifier
# build a very simple model
X, y = load_iris(return_X_y=True)
model = XGBClassifier(max_depth=2, n_estimators=2)
model.fit(X, y);
# dump it to a text file
model.get_booster().dump_model('xgb_model.txt', with_stats=True)
# read the contents of the file
with open('xgb_model.txt', 'r') as f:
    txt_model = f.read()
print(txt_model)

It will print a textual description of 6 trees (2 estimators, each consisting of 3 trees, one per class), which starts like this:

booster[0]:
0:[f2<2.45] yes=1,no=2,missing=1,gain=72.2968,cover=66.6667
    1:leaf=0.143541,cover=22.2222
    2:leaf=-0.0733496,cover=44.4444
booster[1]:
0:[f2<2.45] yes=1,no=2,missing=1,gain=18.0742,cover=66.6667
    1:leaf=-0.0717703,cover=22.2222
    2:[f3<1.75] yes=3,no=4,missing=3,gain=41.9078,cover=44.4444
        3:leaf=0.124,cover=24
        4:leaf=-0.0668394,cover=20.4444
...

Now you can, for example, extract all splits from this description:

import re
# extract all patterns like "[f2<2.45]" (note the raw string and the escaped
# dot, so "." matches a literal decimal point rather than any character)
splits = re.findall(r'\[f([0-9]+)<([0-9]+\.[0-9]+)\]', txt_model)
splits

It will print a list of (feature_id, split_value) tuples, like:

[('2', '2.45'),
 ('2', '2.45'),
 ('3', '1.75'),
 ('3', '1.65'),
 ('2', '4.95'),
 ('2', '2.45'),
 ('2', '2.45'),
 ('3', '1.75'),
 ('3', '1.65'),
 ('2', '4.95')]

You can further process this list as you wish.
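Since the stated goal is to bin variables according to the splits, one possible next step (a sketch of my own, using numpy's `digitize`) is to de-duplicate and sort the thresholds per feature, then use them as bin edges:

```python
import numpy as np

# splits as produced by the regex above: (feature_id, split_value) string pairs
splits = [('2', '2.45'), ('2', '2.45'), ('3', '1.75'),
          ('3', '1.65'), ('2', '4.95')]

# collect the sorted, de-duplicated thresholds per feature
bins = {}
for feat, val in splits:
    bins.setdefault(int(feat), set()).add(float(val))
bins = {feat: sorted(vals) for feat, vals in bins.items()}
print(bins)  # {2: [2.45, 4.95], 3: [1.65, 1.75]}

# bin a column of feature 2 against its thresholds
values = np.array([1.0, 3.0, 5.0])
print(np.digitize(values, bins[2]))  # [0 1 2]
```

`np.digitize` returns, for each value, the index of the bin it falls into, so each feature column can be discretized with exactly the cut points the model learned.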
