Interpreting Graphviz output for decision tree regression


Question

I'm curious what the value field is in the nodes of the decision tree produced by Graphviz when used for regression. I understand that this is the number of samples in each class that are separated by a split when using decision tree classification, but I'm not sure what it means for regression.

My data has a 2-dimensional input and a 10-dimensional output. Here is an example of what a tree looks like for my regression problem:

Produced using this code & visualized with webgraphviz:

import pickle
from sklearn import tree
from sklearn.tree import DecisionTreeRegressor

# X = (n x 2)  Y = (n x 10)  X_test = (m x 2)

input_scaler = pickle.load(open("../input_scaler.sav", "rb"))
# Note: criterion='mse' was renamed to 'squared_error' in scikit-learn >= 1.0
reg = DecisionTreeRegressor(criterion='mse', max_depth=2)
reg.fit(X, Y)
pred = reg.predict(X_test)

# Export the fitted tree in Graphviz DOT format
with open("classifier.txt", "w") as f:
    tree.export_graphviz(reg, out_file=f)

Answer

What a regression tree actually returns as output is the mean value of the dependent variable (here Y) of the training samples that end up in the respective terminal nodes (leaves); these mean values are shown as lists named value in the picture, which are all of length 10 here, since your Y is 10-dimensional.
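This behaviour is easy to check directly. The following is a minimal sketch using synthetic data (random X and a 3-dimensional Y, not the asker's dataset): for every leaf, the stored value list equals the mean of Y over the training samples routed to that leaf.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy multi-output regression data: 2-D input, 3-D output
rng = np.random.RandomState(0)
X = rng.rand(100, 2)
Y = rng.rand(100, 3)

reg = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, Y)

# reg.apply gives the leaf index each sample lands in; for every leaf,
# the stored `value` equals the mean of Y over the samples in that leaf.
leaf_ids = reg.apply(X)
for leaf in np.unique(leaf_ids):
    manual_mean = Y[leaf_ids == leaf].mean(axis=0)
    stored = reg.tree_.value[leaf].ravel()
    assert np.allclose(manual_mean, stored)
```

The `tree_.value` array has shape `(n_nodes, n_outputs, 1)` for a regressor, which is why `.ravel()` is used to compare it against the 3-dimensional mean.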

In other words, and using the leftmost terminal node (leaf) of your tree as an example:

  • The leaf consists of the 42 samples for which X[0] <= 0.675 and X[1] <= 0.5
  • The mean value of your 10-dimensional output for these 42 samples is given in the value list of this leaf, which is indeed of length 10, i.e. the mean of Y[0] is -152007.382, the mean of Y[1] is -206040.675, etc., and the mean of Y[9] is 3211.487.

You can confirm that this is the case by predicting some samples (from your training or test set - it doesn't matter) and checking that your 10-dimensional result is one of the 4 value lists depicted in the terminal leaves above.
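Programmatically, the same check can be sketched as follows (again on synthetic data rather than the asker's): every prediction the tree produces is exactly the value list of the leaf the sample falls into.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.rand(100, 2)
Y = rng.rand(100, 3)
reg = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, Y)

# Each predicted vector coincides with the `value` list of the leaf
# that the corresponding sample is routed to.
pred = reg.predict(X[:5])
leaves = reg.apply(X[:5])
for p, leaf in zip(pred, leaves):
    assert np.allclose(p, reg.tree_.value[leaf].ravel())
```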

Additionally, you can confirm that, for each element in value, the weighted averages of the children nodes are equal to the respective element of the parent node. Again, using the first element of your 2 leftmost terminal nodes (leaves), we get:

(-42*152007.382 - 56*199028.147)/98
# -178876.39057142858

i.e. the value[0] element of their parent node (the leftmost node in the intermediate level). One more example, this time for the first value elements of your 2 intermediate nodes:

(-98*178876.391 + 42*417378.245)/140
# -0.00020000000617333822

which again agrees with the -0.0 first value element of your root node.
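Rather than checking node by node by hand, the weighted-average property can be verified for every internal node at once through the fitted tree's `tree_` attributes (a sketch on synthetic data; `children_left` is -1 at leaves):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.rand(100, 2)
Y = rng.rand(100, 3)
reg = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, Y)

t = reg.tree_
for node in range(t.node_count):
    left, right = t.children_left[node], t.children_right[node]
    if left == -1:  # leaf node: no children to check
        continue
    nl, nr = t.n_node_samples[left], t.n_node_samples[right]
    # Sample-weighted average of the children's value lists...
    weighted = (nl * t.value[left] + nr * t.value[right]) / (nl + nr)
    # ...equals the parent's value list, element-wise.
    assert np.allclose(weighted, t.value[node])
```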

Judging from the value list of your root node, it seems that the mean values of all elements of your 10-dimensional Y are almost zero, which you can (and should) verify manually, as a final confirmation.
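That manual verification amounts to comparing the root node's value list with the column-wise mean of Y; a sketch (synthetic data, node 0 being the root in scikit-learn's array layout):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.rand(100, 2)
Y = rng.rand(100, 3)
reg = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, Y)

# Node 0 is the root: its `value` list is the mean of Y
# over the whole training set.
root_value = reg.tree_.value[0].ravel()
assert np.allclose(root_value, Y.mean(axis=0))
```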

To summarize:

  • The value list of each node contains the mean Y values for the training samples "belonging" to the respective node
  • Additionally, for the terminal nodes (leaves), these lists are the actual outputs of the tree model (i.e. the output will always be one of these lists, depending on X)
  • For the root node, the value list contains the mean Y values for the whole of your training dataset
