How do I find which attributes my tree splits on, when using scikit-learn?
I have been exploring scikit-learn, making decision trees with both entropy and gini splitting criteria, and exploring the differences.
My question is: how can I "open the hood" and find out exactly which attributes the trees split on at each level, along with their associated information values, so I can see where the two criteria make different choices?
So far, I have explored the 9 methods outlined in the documentation. They don't appear to allow access to this information. But surely this information is accessible? I'm envisioning a list or dict that has entries for node and gain.
Thanks for your help and my apologies if I've missed something completely obvious.
Directly from the documentation ( http://scikit-learn.org/0.12/modules/tree.html ):
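from io import StringIO
from sklearn import tree
out = StringIO()
# Write the fitted tree clf as Graphviz DOT text into the in-memory buffer.
tree.export_graphviz(clf, out_file=out)
dot_source = out.getvalue()  # recent versions return None when out_file is given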
The StringIO module no longer exists in Python 3; import StringIO from the io module instead.
The decision tree object also has a tree_ attribute, which gives direct access to the whole structure. You can simply read its arrays:
clf.tree_.children_left   # id of each node's left child (-1 for a leaf)
clf.tree_.children_right  # id of each node's right child (-1 for a leaf)
clf.tree_.feature         # index of the feature each node splits on
clf.tree_.threshold       # split threshold of each node
clf.tree_.value           # value stored at each node
For more details, look at the source code of the export method.
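To get the "node and gain" listing the question asks for, the arrays above can be combined with tree_.impurity and tree_.weighted_n_node_samples. The following is a minimal sketch, not the library's own reporting API: the helper name describe_splits and the use of the iris dataset are assumptions made for illustration, and the gain is computed here as the parent impurity minus the sample-weighted impurity of the two children.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def describe_splits(clf, feature_names):
    """Return one dict per internal node (hypothetical helper, not part of scikit-learn)."""
    t = clf.tree_
    rows = []
    stack = [(0, 0)]  # (node id, depth), starting at the root
    while stack:
        node, depth = stack.pop()
        left, right = t.children_left[node], t.children_right[node]
        if left == -1:  # leaves have no children, so there is no split to report
            continue
        n = t.weighted_n_node_samples[node]
        n_left = t.weighted_n_node_samples[left]
        n_right = t.weighted_n_node_samples[right]
        # Impurity decrease: parent impurity minus the weighted child impurities.
        gain = t.impurity[node] - (
            n_left * t.impurity[left] + n_right * t.impurity[right]
        ) / n
        rows.append({
            "node": int(node),
            "depth": depth,
            "feature": feature_names[t.feature[node]],
            "threshold": float(t.threshold[node]),
            "gain": float(gain),
        })
        # Push right first so the left subtree is visited first.
        stack.append((right, depth + 1))
        stack.append((left, depth + 1))
    return rows


iris = load_iris()  # example data, chosen only for illustration
results = {}
for criterion in ("gini", "entropy"):
    clf = DecisionTreeClassifier(criterion=criterion, random_state=0)
    clf.fit(iris.data, iris.target)
    results[criterion] = describe_splits(clf, iris.feature_names)
    print(criterion)
    for row in results[criterion]:
        print("  depth {depth} node {node}: {feature} <= {threshold:.3f} "
              "(gain {gain:.4f})".format(**row))
```

Printing the two lists side by side shows where the criteria pick different features or thresholds. Note that the gain values are in the units of each criterion (Gini impurity versus entropy), so they are comparable within a tree but not directly across the two.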
In general, you can use the inspect module to get all of the object's members:
from inspect import getmembers
print(getmembers(clf.tree_))
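The raw getmembers output is long because it also includes methods and private names. A small sketch of how it might be narrowed down to the data attributes follows; the filtering rule (public and non-callable) and the iris example data are assumptions chosen for illustration.

```python
from inspect import getmembers

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()  # example data, chosen only for illustration
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# Keep only public, non-callable members: the arrays and counters of the tree.
public_data = {
    name: value
    for name, value in getmembers(clf.tree_)
    if not name.startswith("_") and not callable(value)
}

for name, value in public_data.items():
    # Arrays report their shape; plain numbers report None.
    print(name, type(value).__name__, getattr(value, "shape", None))
```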