How to extract the decision rules from a scikit-learn decision tree?

Question

Can I extract the underlying decision rules (or "decision paths") from a trained scikit-learn decision tree as a textual list?

Something like:

if A>0.4 then if B<0.2 then if C>0.8 then class='X'

Thanks for your help.

Answer

I believe that this answer is more correct than the other answers here:

from sklearn.tree import _tree


def tree_to_code(tree, feature_names):
    """Print a fitted DecisionTree* estimator as nested Python if/else code."""
    tree_ = tree.tree_
    # Leaves store TREE_UNDEFINED (-2) in tree_.feature, so guard the name lookup
    feature_name = [
        feature_names[i] if i != _tree.TREE_UNDEFINED else "undefined!"
        for i in tree_.feature
    ]
    print("def tree({}):".format(", ".join(feature_names)))

    def recurse(node, depth):
        indent = "  " * depth
        if tree_.feature[node] != _tree.TREE_UNDEFINED:
            # Internal node: samples with feature <= threshold go to the left child
            name = feature_name[node]
            threshold = tree_.threshold[node]
            print("{}if {} <= {}:".format(indent, name, threshold))
            recurse(tree_.children_left[node], depth + 1)
            print("{}else:  # if {} > {}".format(indent, name, threshold))
            recurse(tree_.children_right[node], depth + 1)
        else:
            # Leaf: emit the value array stored for this node
            print("{}return {}".format(indent, tree_.value[node]))

    recurse(0, 1)

This prints out a valid Python function. Here's an example output for a tree that is trying to return its input, a number between 0 and 10.

def tree(f0):
  if f0 <= 6.0:
    if f0 <= 1.5:
      return [[ 0.]]
    else:  # if f0 > 1.5
      if f0 <= 4.5:
        if f0 <= 3.5:
          return [[ 3.]]
        else:  # if f0 > 3.5
          return [[ 4.]]
      else:  # if f0 > 4.5
        return [[ 5.]]
  else:  # if f0 > 6.0
    if f0 <= 8.5:
      if f0 <= 7.5:
        return [[ 7.]]
      else:  # if f0 > 7.5
        return [[ 8.]]
    else:  # if f0 > 8.5
      return [[ 9.]]
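The function above prints nested if/else code. If you want the flat textual list the question asks for instead (one rule per leaf), the same arrays can be walked while accumulating the conditions along each root-to-leaf path. This is a minimal sketch, assuming the parallel arrays a fitted estimator exposes as `clf.tree_.feature`, `clf.tree_.threshold`, `clf.tree_.children_left`, `clf.tree_.children_right` and `clf.tree_.value`. The name `tree_to_rules` is hypothetical, and the tree is hand-built (with string "values") so the example runs without scikit-learn.

```python
from typing import List, Sequence

TREE_LEAF = -1  # marker scikit-learn stores in children_left/children_right at leaves


def tree_to_rules(feature: Sequence[int],
                  threshold: Sequence[float],
                  children_left: Sequence[int],
                  children_right: Sequence[int],
                  value: Sequence,
                  feature_names: Sequence[str]) -> List[str]:
    """Return one 'if ... and ... then ...' string per leaf, left to right."""
    rules: List[str] = []

    def recurse(node: int, conditions: List[str]) -> None:
        # Detect leaves by their missing children, not by threshold == -2
        if children_left[node] == TREE_LEAF:
            path = " and ".join(conditions) if conditions else "True"
            rules.append("if {} then {}".format(path, value[node]))
            return
        name = feature_names[feature[node]]
        t = threshold[node]
        recurse(children_left[node], conditions + ["{} <= {}".format(name, t)])
        recurse(children_right[node], conditions + ["{} > {}".format(name, t)])

    recurse(0, [])
    return rules


# Hypothetical hand-built tree: split on A at 0.4, then on B at 0.2.
# Leaves use -2 in feature/threshold and -1 for children, like sklearn.
feature = [0, -2, 1, -2, -2]
threshold = [0.4, -2.0, 0.2, -2.0, -2.0]
children_left = [1, -1, 3, -1, -1]
children_right = [2, -1, 4, -1, -1]
value = [None, "class='Y'", None, "class='X'", "class='Z'"]

for rule in tree_to_rules(feature, threshold, children_left,
                          children_right, value, ["A", "B"]):
    print(rule)
```

With a fitted estimator you would set `t = clf.tree_` and call `tree_to_rules(t.feature, t.threshold, t.children_left, t.children_right, t.value, feature_names)`; each `value[node]` is then a NumPy array like the `[[ 0.]]` in the output above rather than a class label.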

Here are some stumbling blocks that I see in other answers:

  1. Using tree_.threshold == -2 to decide whether a node is a leaf isn't a good idea. What if it's a real decision node with a threshold of -2? Instead, you should look at tree_.feature or tree_.children_*.
  2. The line features = [feature_names[i] for i in tree_.feature] crashes with my version of sklearn, because some values of tree.tree_.feature are -2 (specifically for leaf nodes). Since -2 is a valid negative list index, it raises IndexError only when there are fewer than two feature names; otherwise it silently picks the wrong name.
  3. The recursive function does not need multiple if statements; a single if/else is enough.
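The first two pitfalls are easy to reproduce with hand-built lists shaped like `tree_.feature`, `tree_.threshold` and `tree_.children_left`. This is a minimal sketch, assuming scikit-learn's sentinel values (-2 for `TREE_UNDEFINED` in `feature` and `threshold` at leaves, -1 for `TREE_LEAF` in the children arrays); the helper names are hypothetical and no scikit-learn import is needed.

```python
TREE_LEAF = -1       # children_left/children_right value at a leaf
TREE_UNDEFINED = -2  # feature/threshold value at a leaf

# Root splits feature 0 at a *real* threshold of -2.0; nodes 1 and 2 are leaves.
feature = [0, TREE_UNDEFINED, TREE_UNDEFINED]
threshold = [-2.0, -2.0, -2.0]
children_left = [1, TREE_LEAF, TREE_LEAF]


def is_leaf_by_threshold(node):
    # Fragile: a genuine split can have threshold -2.0
    return threshold[node] == -2


def is_leaf_by_children(node):
    # Robust: leaves are the only nodes without children
    return children_left[node] == TREE_LEAF


print([is_leaf_by_threshold(n) for n in range(3)])  # [True, True, True]: root misreported
print([is_leaf_by_children(n) for n in range(3)])   # [False, True, True]

# Pitfall 2: indexing feature_names with -2 fails with a single feature name
feature_names = ["f0"]
try:
    names = [feature_names[i] for i in feature]
except IndexError as exc:
    print("IndexError:", exc)

# Guarded lookup, the same idea as in tree_to_code above
names = [feature_names[i] if i != TREE_UNDEFINED else "undefined!" for i in feature]
print(names)  # ['f0', 'undefined!', 'undefined!']
```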
