Modifying terminal node in ctree(), partykit package

Question

I have a dependent variable to classify with a decision tree. It has three categories with frequencies 738 (19%), 426 (15%) and 1800 (66%). As you might imagine, the predicted category is always the third one, but the purpose of the tree is descriptive, so this does not actually matter. The thing is, when plotting a tree fitted with the ctree() function (package partykit), the terminal nodes display histograms showing the probability of occurrence of the three classes. I need to modify this output: I would like to obtain the proportion of each class in the terminal node relative to that class's absolute frequency. For example, what percentage of the 738 participants in class1 belongs to a given terminal node? Each terminal node would display these values for all three classes of the dependent variable.

Below is a plot of the tree, which by default reports the prevalence of each class within the terminal nodes.

Recommended answer

You can always define your own panel function to draw what goes into each terminal panel window. If you know a little bit about grid graphics and you look at how the current terminal panel functions are defined you will see how this works.

One panel function that ought to do what you want is node_terminal() in the partykit package (the much improved re-implementation of the old party package). However, because ctree() does not store its predictions in each terminal node, the node_terminal() function cannot do this out of the box at the moment. I'll try to improve the implementation in future versions so that this can be facilitated. Below is a somewhat involved example that should do what you want, I hope.

First, we fit a classification tree using the iris data (for a simple reproducible example):

library("partykit")
(ct <- ctree(Species ~ ., data = iris))
## Model formula:
## Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width
## 
## Fitted party:
## [1] root
## |   [2] Petal.Length <= 1.9: setosa (n = 50, err = 0.0%)
## |   [3] Petal.Length > 1.9
## |   |   [4] Petal.Width <= 1.7
## |   |   |   [5] Petal.Length <= 4.8: versicolor (n = 46, err = 2.2%)
## |   |   |   [6] Petal.Length > 4.8: versicolor (n = 8, err = 50.0%)
## |   |   [7] Petal.Width > 1.7: virginica (n = 46, err = 2.2%)
## 
## Number of inner nodes:    3
## Number of terminal nodes: 4

Then we compute a table of predicted probabilities for each terminal node:

(pred <- aggregate(predict(ct, type = "prob"),
  list(predict(ct, type = "node")), FUN = mean))
##   Group.1 setosa versicolor  virginica
## 1       2      1 0.00000000 0.00000000
## 2       5      0 0.97826087 0.02173913
## 3       6      0 0.50000000 0.50000000
## 4       7      0 0.02173913 0.97826087
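
Note that each row of this table sums to one: it gives the class mix within a node. The question asks for the other direction, namely the share of each class total that ends up in each node. Here is a minimal base-R sketch of that arithmetic. The counts are reconstructed from the n and err values printed for the tree above (e.g. node 5 holds 45 versicolor and 1 virginica), and the names counts, within_node and share_of_class are made up for illustration.

```r
# Class counts per terminal node, reconstructed from the printed tree
# (rows: terminal node ids, columns: response classes).
counts <- matrix(
  c(50,  0,  0,
     0, 45,  1,
     0,  4,  4,
     0,  1, 45),
  nrow = 4, byrow = TRUE,
  dimnames = list(node = c("2", "5", "6", "7"),
                  class = c("setosa", "versicolor", "virginica"))
)

# Row-wise proportions: class mix within each node (matches the table above).
within_node <- prop.table(counts, margin = 1)

# Column-wise proportions: share of each class that falls into each node.
share_of_class <- prop.table(counts, margin = 2)
print(round(share_of_class, 2))
```

With a fitted tree, the count table could presumably be obtained with table(predict(ct, type = "node"), iris$Species) instead of typing it in by hand.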

Then comes the not so obvious part: We want to include these predicted probabilities in the terminal nodes of the tree itself. For this, we coerce the recursive node structure to a flat list, insert the predictions (suitably formatted), and convert the list back to the node structure:

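# Flatten the recursive node structure into a list indexed by node id
ct_node <- as.list(ct$node)
# Store the formatted class probabilities in each terminal node's info
for(i in seq_len(nrow(pred))) {
  ct_node[[pred[i,1]]]$info$prediction <- paste(
    format(names(pred)[-1]),
    format(round(pred[i, -1], digits = 3), nsmall = 3)
  )
}
# Convert the flat list back into a partynode and put it into the tree
ct$node <- as.partynode(ct_node)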
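
To check that the strings ended up in the info of each terminal node, they can be read back out again. The sketch below repeats the steps so far so that it runs on its own, and then uses nodeapply(), nodeids() and info_node() from partykit. The name stored is only illustrative.

```r
library("partykit")

# Refit the tree and rebuild the per-node probability table.
ct <- ctree(Species ~ ., data = iris)
pred <- aggregate(predict(ct, type = "prob"),
  list(predict(ct, type = "node")), FUN = mean)

# Insert the formatted predictions into the flat node list, then convert back.
ct_node <- as.list(ct$node)
for (i in seq_len(nrow(pred))) {
  ct_node[[pred[i, 1]]]$info$prediction <- paste(
    format(names(pred)[-1]),
    format(round(pred[i, -1], digits = 3), nsmall = 3)
  )
}
ct$node <- as.partynode(ct_node)

# Pull the stored strings back out of every terminal node.
stored <- nodeapply(ct, ids = nodeids(ct, terminal = TRUE),
  FUN = function(node) info_node(node)$prediction)
print(stored)
```

If a terminal node comes back as NULL here, the id in the first column of pred did not line up with the position in the flat list, which is the assumption the loop relies on.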

Then, we can easily draw a picture of the tree with the node_terminal panel function and insert our pre-formatted predictions:

plot(ct, terminal_panel = node_terminal, tp_args = list(
  FUN = function(node) c("Predictions", node$prediction)))

The coercing back and forth between a list and a party is actually already implemented in the package...I just forgot about it ;-) If you do

st <- as.simpleparty(ct)

then the resulting party has in each node more detailed information about the predictions etc. For example, the $distribution then contains the absolute frequencies for each response level. This can easily be formatted as before

pred <- function(i) {
  tab <- i$distribution
  tab <- round(prop.table(tab), 3)
  tab <- paste0(names(tab), ":", format(tab, nsmall = 3))
  c("Predictions", tab)
}

And this can be passed to node_terminal to essentially create the plot above. You might want to change drop = FALSE to drop = TRUE in the plot() call (drop here abbreviates the drop_terminal argument of partykit's plot method) if you want all terminal nodes to be displayed in the bottom row.

plot(st, terminal_panel = node_terminal, tp_args = list(FUN = pred))
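
Since $distribution holds absolute frequencies, the same kind of formatter can report what the question actually asks for: the percentage of each class total that falls into a node. Below is a rough sketch under that assumption. The names class_total, share and node5 are hypothetical, and node5 is only a hand-made stand-in for the info of terminal node 5 so that the function can be tried without plotting.

```r
# Total number of observations per class in the training data.
class_total <- table(iris$Species)

# Hypothetical formatter: receives a node's info list and reports, for each
# class, the percentage of that class's observations that fall in the node.
share <- function(i) {
  tab <- i$distribution
  pct <- 100 * as.numeric(tab) / as.numeric(class_total[names(tab)])
  c("Share of class", sprintf("%s: %.1f%%", names(tab), pct))
}

# Stand-in for the info of terminal node 5 (45 versicolor, 1 virginica).
node5 <- list(distribution = c(setosa = 0, versicolor = 45, virginica = 1))
print(share(node5))
```

It would then be passed in the same way as pred, i.e. plot(st, terminal_panel = node_terminal, tp_args = list(FUN = share)). For your own data, class_total has to be computed from the response the tree was fitted on.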
