What does scikit-learn DecisionTreeClassifier.tree_.value do?


Question


I am working on a DecisionTreeClassifier model and I want to understand the path chosen by the model. To do that, I need to know what the values in DecisionTreeClassifier.tree_.value represent.

Thank you.

Solution

Well, you are correct that the documentation is rather obscure about this (though, to be honest, I am not sure how useful the attribute is, either).

Let's replicate the example from the documentation with the iris data:

from sklearn.datasets import load_iris
from sklearn import tree
iris = load_iris()
clf = tree.DecisionTreeClassifier()
clf = clf.fit(iris.data, iris.target)

Asking for clf.tree_.value, we get:

array([[[ 50.,  50.,  50.]],
       [[ 50.,   0.,   0.]],
       [[  0.,  50.,  50.]],
       [[  0.,  49.,   5.]],
       [[  0.,  47.,   1.]],
       [[  0.,  47.,   0.]],
       [[  0.,   0.,   1.]],
       [[  0.,   2.,   4.]],
       [[  0.,   0.,   3.]],
       [[  0.,   2.,   1.]],
       [[  0.,   2.,   0.]],
       [[  0.,   0.,   1.]],
       [[  0.,   1.,  45.]],
       [[  0.,   1.,   2.]],
       [[  0.,   1.,   0.]],
       [[  0.,   0.,   2.]],
       [[  0.,   0.,  43.]]])

and

len(clf.tree_.value)
# 17

To understand what exactly this array represents, it helps to look at the tree visualization (also available in the docs, reproduced here for convenience):

As we can see, the tree has 17 nodes; looking closer, we see that the value of each node is actually an element of our clf.tree_.value array.
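When the picture is not at hand, a text rendering of the same fitted tree can stand in for it. The following is only a sketch: `random_state=0` is an assumption added here for reproducibility (the original example does not set it), so the exact splits and node count may differ from the 17 nodes above. It uses `sklearn.tree.export_text`, which prints the per-class values only at the leaves.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
# random_state is an assumption for reproducibility; it is not in the original example.
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# Render the tree as indented text; show_weights=True adds the per-class
# values stored for each leaf next to the predicted class.
report = export_text(
    clf,
    feature_names=list(iris.feature_names),
    show_weights=True,
)
print(report)

# The number of nodes in the tree equals the length of clf.tree_.value.
print("nodes:", clf.tree_.node_count, "leaves:", clf.get_n_leaves())
```

Each leaf line of the report can be matched against a row of clf.tree_.value, which is the same comparison the figure invites.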

So, to make a long story short:

  • clf.tree_.value is a NumPy array with one entry per tree node, so its length equals the number of nodes in the tree (its full shape is (n_nodes, n_outputs, n_classes), here (17, 1, 3))
  • Each of its element arrays (which corresponds to a tree node) is of length equal to the number of classes (here 3)
  • Each of these 3-element arrays holds, per class, the number of training samples that end up in the respective node.
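The three points above can be checked directly. This is a minimal sketch, not part of the original answer, with two assumptions: `random_state=0` is added for reproducibility, and the code normalizes each row before rescaling, because, as far as I know, recent scikit-learn releases (1.4 and later) store per-class fractions in tree_.value instead of the raw counts shown above. Normalizing first makes the result the same either way.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
# random_state is an assumption for reproducibility; it is not in the original example.
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

tree = clf.tree_
value = tree.value  # shape: (n_nodes, n_outputs, n_classes)
print(value.shape)

# Class distribution of every node for the single output, as fractions.
per_node = value[:, 0, :]
fractions = per_node / per_node.sum(axis=1, keepdims=True)

# Rescale to per-class sample counts. No sample weights are used here,
# so weighted_n_node_samples equals the plain sample count of each node.
counts = fractions * tree.weighted_n_node_samples[:, np.newaxis]
print(np.round(counts[0]))  # root node: all 150 training samples
```

The row for node 0 is the root, so it should show the full training set split by class.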

To clarify the last point with an example, consider the second element of the array, [[ 50., 0., 0.]] (which corresponds to the orange-colored node): it says that 50 samples from class #0 end up in this node, and zero samples from the other two classes (#1 and #2).
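Since the question is about the path the model chooses, here is a sketch of how tree_.value can be read along the path of one sample. It is an illustration rather than the answer's own method: it relies on the estimator's `decision_path` and `apply` methods, and `random_state=0` plus the choice of the first iris sample are assumptions made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
# random_state is an assumption for reproducibility; it is not in the original example.
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

sample = iris.data[:1]  # first training sample, kept 2D as the API expects

# decision_path returns a sparse (n_samples, n_nodes) indicator matrix;
# the column indices of row 0 are the ids of the nodes the sample visits.
indicator = clf.decision_path(sample)
path = indicator.indices[indicator.indptr[0]:indicator.indptr[1]]

for node_id in path:
    dist = clf.tree_.value[node_id, 0]  # class distribution stored for this node
    majority = clf.classes_[dist.argmax()]
    print(f"node {node_id}: value={dist}, majority class={majority}")

# The last node on the path is the leaf, and its majority class is the prediction.
leaf_id = clf.apply(sample)[0]
predicted = clf.predict(sample)[0]
print("leaf:", leaf_id, "predicted class:", predicted)
```

Read this way, the entries of tree_.value show how the class mix narrows from the root down to the leaf that decides the prediction.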

Hope this helps...
