What does scikit-learn DecisionTreeClassifier.tree_.value do?


Question

I am working on a DecisionTreeClassifier model and I want to understand the path chosen by the model. So I need to know what the values in DecisionTreeClassifier.tree_.value represent.

Solution

Well, you are correct that the documentation is rather obscure about this (though, to be honest, I am not sure how useful the attribute is, either).

Let's replicate the example from the documentation with the iris data:

from sklearn.datasets import load_iris
from sklearn import tree
iris = load_iris()
clf = tree.DecisionTreeClassifier()
clf = clf.fit(iris.data, iris.target)

Asking for clf.tree_.value, we get:

array([[[ 50.,  50.,  50.]],
       [[ 50.,   0.,   0.]],
       [[  0.,  50.,  50.]],
       [[  0.,  49.,   5.]],
       [[  0.,  47.,   1.]],
       [[  0.,  47.,   0.]],
       [[  0.,   0.,   1.]],
       [[  0.,   2.,   4.]],
       [[  0.,   0.,   3.]],
       [[  0.,   2.,   1.]],
       [[  0.,   2.,   0.]],
       [[  0.,   0.,   1.]],
       [[  0.,   1.,  45.]],
       [[  0.,   1.,   2.]],
       [[  0.,   1.,   0.]],
       [[  0.,   0.,   2.]],
       [[  0.,   0.,  43.]]])

and

len(clf.tree_.value)
# 17

To understand what exactly this array represents, it is useful to look at the tree visualization for the iris data, which is available in the docs.

As we can see, the tree has 17 nodes; looking closer, we see that the value of each node is actually an element of our clf.tree_.value array.
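If the figure is not at hand, the same node-by-node correspondence can be checked in code. The following is only a minimal sketch: the helper name describe_nodes and the choice of random_state=0 are assumptions made for illustration, and the exact tree (and so the node count) may differ from the one shown above. It walks the tree through the children_left and children_right arrays and prints each node's entry of clf.tree_.value.

```python
from sklearn.datasets import load_iris
from sklearn import tree

iris = load_iris()
# random_state is fixed only to make the sketch repeatable (an assumption).
clf = tree.DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)
t = clf.tree_


def describe_nodes(t):
    """Return one (node_id, depth, is_leaf, value_row) tuple per node."""
    rows = []
    stack = [(0, 0)]  # (node id, depth); node 0 is the root
    while stack:
        node, depth = stack.pop()
        left, right = t.children_left[node], t.children_right[node]
        is_leaf = left == right  # both are -1 for a leaf
        rows.append((node, depth, is_leaf, t.value[node][0]))
        if not is_leaf:
            # Push the right child first so the left child is visited first.
            stack.append((right, depth + 1))
            stack.append((left, depth + 1))
    return rows


rows = describe_nodes(t)
for node, depth, is_leaf, value in rows:
    kind = "leaf" if is_leaf else "split"
    print(f"{'  ' * depth}node {node} ({kind}): value = {value}")
```

Each printed line corresponds to one box in the tree plot, indented by depth.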

So, to make a long story short:

  • clf.tree_.value is an array of arrays, of length equal to the number of nodes in the tree
  • Each of its element arrays (which corresponds to a tree node) has shape (number of outputs, number of classes), here (1, 3), so its inner length equals the number of classes (here 3)
  • Each of these 3-element arrays gives, for each class, the number of training samples that end up in the respective node (newer scikit-learn releases store these as per-node class fractions instead of raw counts).
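These three points can be verified directly on the fitted tree. The sketch below is written under a few assumptions: random_state=0 is chosen only for repeatability, and the rescaling step reflects my understanding that more recent scikit-learn releases (1.4 onward, as far as I know) store class fractions per node rather than counts. In that case, multiplying by weighted_n_node_samples should recover the counts shown above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn import tree

iris = load_iris()
clf = tree.DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)
t = clf.tree_

# Shape is (number of nodes, number of outputs, number of classes).
n_nodes, n_outputs, n_classes = t.value.shape
print(t.value.shape)

# One row per node for this single-output problem.
per_node = t.value[:, 0, :]

# Assumption: if every row sums to 1, this build stores class fractions,
# so rescale by each node's weighted sample count to get counts back.
row_sums = per_node.sum(axis=1)
if np.allclose(row_sums, 1.0):
    counts = per_node * t.weighted_n_node_samples[:, None]
else:
    counts = per_node

print(np.round(counts[0]))  # per-class sample counts at the root node
```

The check on row sums is a heuristic rather than an official API; comparing against sklearn.__version__ would be an alternative.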

To clarify the last point with an example, consider the second element of the array, [[ 50., 0., 0.]] (which corresponds to the orange-colored node): it says that 50 samples from class #0 end up in this node, and zero samples from the other two classes (#1 and #2).
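Since the question is about the path the model takes, here is a hedged sketch that ties the value array to a single prediction. It uses the estimator's decision_path and apply methods to list the nodes one sample passes through, then reads the value entry at each of them. Using the first iris row as the sample and fixing random_state=0 are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn import tree

iris = load_iris()
clf = tree.DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)
t = clf.tree_

sample = iris.data[:1]  # first training sample, kept 2-D

# decision_path returns a sparse indicator matrix (samples x nodes);
# for a single sample, its indices are the ids of the visited nodes.
node_ids = clf.decision_path(sample).indices
leaf_id = clf.apply(sample)[0]  # id of the leaf the sample lands in

for node in node_ids:
    print(f"node {node}: value = {t.value[node][0]}")

# The majority class at the leaf should match the model's prediction.
leaf_class = clf.classes_[np.argmax(t.value[leaf_id][0])]
print(leaf_class, clf.predict(sample)[0])
```

The class with the largest entry in the leaf's value row is the one the classifier predicts for that sample.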

Hope this helps...
