遍历sklearn决策树 [英] Traversal of sklearn decision tree

查看:161
本文介绍了遍历sklearn决策树的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

如何处理sklearn决策树的广度优先搜索遍历?

How do I do the breadth first search traversal of the sklearn decision tree?

在我的代码中,我尝试了sklearn.tree_库,并使用了诸如tree_.feature和tree_.threshold之类的各种功能来理解树的结构.但是,如果我想做bfs,这些功能会遍历树的dfs吗?

In my code i have tried sklearn.tree_ library and used various function such as tree_.feature and tree_.threshold to understand the structure of the tree. But these functions do the dfs traversal of the tree if I want to do bfs how should i do it?

假设

clf1 = DecisionTreeClassifier( max_depth = 2 )
clf1 = clf1.fit(x_train, y_train)

这是我的分类器,生成的决策树是

this is my classifier and the decision tree produced is

然后我使用以下函数遍历了树

Then I have traversed the tree using following function

def encoding(clf, features):
l1 = list()
l2 = list()

for i in range(len(clf.tree_.feature)):
    if(clf.tree_.feature[i]>=0):
        l1.append( features[clf.tree_.feature[i]])
        l2.append(clf.tree_.threshold[i])
    else:
        l1.append(None)
        print(np.max(clf.tree_.value))
        l2.append(np.argmax(clf.tree_.value[i]))

l = [l1 , l2]

return np.array(l)

产生的输出是

array([['address', 'age', None, None, 'age', None, None],
       [0.5, 17.5, 2, 1, 15.5, 1, 1]], dtype=object)

其中第一个数组是节点的特征,或者如果它点了叶子,则它被标记为无,第二个数组是特征节点的阈值,对于类节点,它是类,但这是dfs遍历树,我想进行bfs遍历我该怎么办?

where 1st array is feature of node or if it leaf noed then it is labelled as none and 2nd array is threshold for feature node and for class node it is class but this is dfs traversal of tree i want to do bfs traversal what should i do ?

由于我是堆栈溢出的新手,请提出如何改进问题描述以及应该添加哪些其他信息(如果有的话)以进一步说明我的问题.

As I am new to stack overflow kindly suggest how to improve the question description and what other information should i add if any to explain my problem further.

X_train(样本)

X_train (sample)

y_train(样本)

y_train (sample)

推荐答案

这应该做到:

from collections import deque

tree = clf.tree_

stack = deque()
stack.append(0)  # push tree root to stack

while stack:
    current_node = stack.popleft()

    # do whatever you want with current node
    # ...

    left_child = tree.children_left[current_node]
    if left_child >= 0:
        stack.append(left_child)

    right_child = tree.children_right[current_node]
    if right_child >= 0:
        stack.append(right_child)

这使用 deque 保留要处理的下一堆节点.由于我们从左侧删除了元素,然后在右侧添加了元素,因此这应该表示宽度优先的遍历.

This uses a deque to keep a stack of the nodes to process next. Since we remove elements from the left and add them to the right, this should represent a breadth-first traversal.

为实际使用,我建议您将其变成发电机:

For actual use, I suggest you turn this into a generator:

from collections import deque

def breadth_first_traversal(tree):
    stack = deque()
    stack.append(0)

    while stack:
        current_node = stack.popleft()

        yield current_node

        left_child = tree.children_left[current_node]
        if left_child >= 0:
            stack.append(left_child)

        right_child = tree.children_right[current_node]
        if right_child >= 0:
            stack.append(right_child)

然后,您只需要对原始功能进行最小的更改:

Then, you only need minimal changes to your original function:

def encoding(clf, features):
    l1 = list()
    l2 = list()

    for i in breadth_first_traversal(clf.tree_):
        if(clf.tree_.feature[i]>=0):
            l1.append( features[clf.tree_.feature[i]])
            l2.append(clf.tree_.threshold[i])
        else:
            l1.append(None)
            print(np.max(clf.tree_.value))
            l2.append(np.argmax(clf.tree_.value[i]))

    l = [l1 , l2]

    return np.array(l)

这篇关于遍历sklearn决策树的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆