How to extract the decision rules of a random forest in Python


Problem description

I have one question though. I heard from someone that in R you can use extra packages to extract the decision rules implemented in a random forest. I tried to search for the same thing in Python, but without luck. Any help on how to achieve that would be appreciated. Thanks in advance!

Answer

Assuming that you use the sklearn RandomForestClassifier, you can find the individual decision trees as .estimators_. Each tree stores its decision nodes as a number of NumPy arrays under tree_.
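For example, those arrays can be inspected directly on a fitted forest. A minimal sketch (the dataset and hyperparameters here are arbitrary; the attribute names are the actual `tree_` arrays in scikit-learn):

```python
from sklearn import datasets, ensemble

X, y = datasets.load_iris(return_X_y=True)
rf = ensemble.RandomForestClassifier(n_estimators=2, max_depth=2, random_state=0)
rf.fit(X, y)

tree = rf.estimators_[0].tree_          # low-level tree structure of the first estimator
print(tree.node_count)                  # total number of nodes in this tree
print(tree.children_left[:3])           # index of each node's left child (-1 at leaves)
print(tree.children_right[:3])          # index of each node's right child (-1 at leaves)
print(tree.feature[:3])                 # feature index tested at each node (-2 at leaves)
print(tree.threshold[:3])               # split threshold compared against at each node
```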

Here is some example code which simply prints each node in the order of the arrays. In a typical application one would instead traverse the tree by following the children.

import numpy
from sklearn.model_selection import train_test_split
from sklearn import metrics, datasets, ensemble

def print_decision_rules(rf):

    for tree_idx, est in enumerate(rf.estimators_):
        tree = est.tree_
        assert tree.value.shape[1] == 1 # no support for multi-output

        print('TREE: {}'.format(tree_idx))

        iterator = enumerate(zip(tree.children_left, tree.children_right, tree.feature, tree.threshold, tree.value))
        for node_idx, data in iterator:
            left, right, feature, th, value = data

            # left: index of left child (if any)
            # right: index of right child (if any)
            # feature: index of the feature to check
            # th: the threshold to compare against
            # value: values associated with classes            

            # for a classifier, value holds the per-class sample counts at this node;
            # the predicted class is the one with the highest count
            class_idx = numpy.argmax(value[0])

            if left == -1 and right == -1:
                print('{} LEAF: return class={}'.format(node_idx, class_idx))
            else:
                print('{} NODE: if feature[{}] <= {} then next={} else next={}'.format(node_idx, feature, th, left, right))


digits = datasets.load_digits()
Xtrain, Xtest, ytrain, ytest = train_test_split(digits.data, digits.target)
estimator = ensemble.RandomForestClassifier(n_estimators=3, max_depth=2)
estimator.fit(Xtrain, ytrain)

print_decision_rules(estimator)

Example output:

TREE: 0
0 NODE: if feature[33] <= 2.5 then next=1 else next=4
1 NODE: if feature[38] <= 0.5 then next=2 else next=3
2 LEAF: return class=2
3 LEAF: return class=9
4 NODE: if feature[50] <= 8.5 then next=5 else next=6
5 LEAF: return class=4
6 LEAF: return class=0
...
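To turn the node arrays into human-readable rules, one can traverse recursively from the root and emit one line per root-to-leaf path. A minimal sketch (the helper name `extract_rules` is my own, not from any library; sklearn sends a sample left when the feature value is <= the threshold):

```python
import numpy
from sklearn import datasets, ensemble

def extract_rules(tree, node=0, conditions=()):
    """Recursively collect one rule string per root-to-leaf path."""
    left, right = tree.children_left[node], tree.children_right[node]
    if left == -1 and right == -1:
        # leaf: emit the conditions accumulated along the path
        class_idx = int(numpy.argmax(tree.value[node][0]))
        cond = ' and '.join(conditions) if conditions else 'always'
        return ['if {} then class={}'.format(cond, class_idx)]
    feat, th = tree.feature[node], tree.threshold[node]
    rules = extract_rules(tree, left, conditions + ('feature[{}] <= {:.2f}'.format(feat, th),))
    rules += extract_rules(tree, right, conditions + ('feature[{}] > {:.2f}'.format(feat, th),))
    return rules

X, y = datasets.load_iris(return_X_y=True)
rf = ensemble.RandomForestClassifier(n_estimators=2, max_depth=2, random_state=0)
rf.fit(X, y)
rules = extract_rules(rf.estimators_[0].tree_)
for rule in rules:
    print(rule)
```

Each printed line is a complete conjunction of conditions leading to one leaf, which is usually what "decision rules" means in the R packages the question refers to.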

We use something similar in emtrees to compile a Random Forest to C code.
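As an alternative to walking the arrays by hand, scikit-learn 0.21+ ships sklearn.tree.export_text, which renders a fitted tree's rules as an indented if/else listing. A short sketch applying it to every tree in the forest:

```python
from sklearn import datasets, ensemble
from sklearn.tree import export_text

data = datasets.load_iris()
rf = ensemble.RandomForestClassifier(n_estimators=2, max_depth=2, random_state=0)
rf.fit(data.data, data.target)

# print the rules of each individual decision tree in the forest
for tree_idx, est in enumerate(rf.estimators_):
    print('TREE: {}'.format(tree_idx))
    print(export_text(est, feature_names=list(data.feature_names)))
```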

