Scikit Decision tree categorical features


Problem description



There is a well-known problem in Tom Mitchell's Machine Learning book: build a decision tree from the following data, where Play ball is the target variable.

The resulting tree is the following:

I wonder whether it is possible to build this tree with scikit-learn. I found several examples where a decision tree is rendered with

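from graphviz import Source  # third-party graphviz package
from sklearn.tree import export_graphviz
export_graphviz(clf)
Source(export_graphviz(clf, out_file=None))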

However, it looks like scikit-learn doesn't work well with categorical data: each categorical feature has to be binarized into several columns. As a result, it is impossible to build the tree exactly as in the picture. Is that correct?

Solution

Yes, it is correct that it is impossible to build such a tree with scikit-learn.

The primary reason is that this is a ternary tree (nodes with up to three children), but scikit-learn implements only binary trees, in which every node has either exactly two children or none:

cdef class Tree:
    """Array-based representation of a binary decision tree.
...
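One way to see the binary structure from Python is to inspect a fitted estimator's `tree_` attribute, which stores a single left and a single right child index per node. This is a minimal sketch, assuming scikit-learn and NumPy are installed; the tiny one-hot data set and its column meanings are made up for illustration and are not the data from the question:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy one-hot features (assumed for illustration). Columns stand for
# Outlook_Sunny, Outlook_Overcast, Outlook_Rain.
X = np.array([
    [1, 0, 0], [0, 1, 0], [0, 0, 1],
    [1, 0, 0], [0, 1, 0], [0, 0, 1],
])
y = np.array([0, 1, 2, 0, 1, 2])

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
tree = clf.tree_

# Every node has one left and one right child slot; -1 marks "no child".
kinds = []
for node in range(tree.node_count):
    left = tree.children_left[node]
    right = tree.children_right[node]
    if left == -1 and right == -1:
        kinds.append("leaf")
    elif left != -1 and right != -1:
        kinds.append("split")
    else:
        kinds.append("one child")  # not expected for a binary tree
    print(node, kinds[-1])
```

Separating the three categories takes two chained yes/no splits here, which is the same effect that turns the three-way Outlook node into nested binary tests.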

However, it is possible to get an equivalent binary tree of the form

Outlook == Sunny
    true  => Humidity == High
        true  => no
        false => yes      
    false => Outlook == Overcast
        true  => yes
        false => Wind == Strong
            true  => no
            false => yes 
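The nested tests above can be written as a plain function and checked against the training labels. This is a minimal sketch: the data table did not survive in this post, so the rows below assume the standard 14-row PlayTennis table from Mitchell's book, and the names `ROWS` and `play_ball` are hypothetical.

```python
# Assumed data: (Outlook, Temperature, Humidity, Wind, Play ball).
ROWS = [
    ("Sunny", "Hot", "High", "Weak", "no"),
    ("Sunny", "Hot", "High", "Strong", "no"),
    ("Overcast", "Hot", "High", "Weak", "yes"),
    ("Rain", "Mild", "High", "Weak", "yes"),
    ("Rain", "Cool", "Normal", "Weak", "yes"),
    ("Rain", "Cool", "Normal", "Strong", "no"),
    ("Overcast", "Cool", "Normal", "Strong", "yes"),
    ("Sunny", "Mild", "High", "Weak", "no"),
    ("Sunny", "Cool", "Normal", "Weak", "yes"),
    ("Rain", "Mild", "Normal", "Weak", "yes"),
    ("Sunny", "Mild", "Normal", "Strong", "yes"),
    ("Overcast", "Mild", "High", "Strong", "yes"),
    ("Overcast", "Hot", "Normal", "Weak", "yes"),
    ("Rain", "Mild", "High", "Strong", "no"),
]


def play_ball(outlook, humidity, wind):
    """Binary form of the three-way tree: each test is a yes/no question."""
    if outlook == "Sunny":
        return "no" if humidity == "High" else "yes"
    if outlook == "Overcast":
        return "yes"
    # Remaining case is Outlook == Rain.
    return "no" if wind == "Strong" else "yes"


# Collect any rows where the binary tree disagrees with the label.
mismatches = [r for r in ROWS if play_ball(r[0], r[2], r[3]) != r[4]]
print("mismatches:", len(mismatches))
```

Each `Outlook == value` test corresponds to one binarized column, so a tree over one-hot features can express the same decisions even though it cannot draw a single three-way node.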
