R: Extracting inner node information and splits from ctree (partykit)


Problem description

Hi, I'm currently trying to extract some of the inner node information stored in a constparty object created with ctree() from partykit, but I'm finding the object difficult to navigate. I can display the information in a plot, but I'm not sure how to extract it. I think it requires nodeapply() or another function in partykit?

library(partykit)
irisct <- ctree(Species ~ ., data = iris)
plot(irisct, inner_panel = node_barplot(irisct))

(Figure: plot with inner node details)

All the information is accessible to the plotting functions, but I'm after a text output similar to the example output (image).

Solution

The main trick (as previously pointed out by @G5W) is to take the [id] subset of the party object and then extract the data, which contains the response (either via $data or with the data_party() function). I recommend building a table of absolute frequencies first and then computing the relative and marginal frequencies from it. Using the irisct object, the plain table can be obtained by

tab <- sapply(1:length(irisct), function(id) {
  y <- data_party(irisct[id])
  y <- y[["(response)"]]
  table(y)
})
tab
##            [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## setosa       50   50    0    0    0    0    0
## versicolor   50    0   50   49   45    4    1
## virginica    50    0   50    5    1    4   45

Then we can add a little bit of formatting to a nice table object:

colnames(tab) <- 1:length(irisct)
tab <- as.table(tab)
names(dimnames(tab)) <- c("Species", "Node")

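And then use prop.table() and margin.table() to compute the frequencies we are interested in. Here prop.table(tab, 1) normalizes each species (row) across all nodes, and margin.table(tab, 2) gives the number of observations per node. The as.data.frame() method converts from the table layout to a "long" data.frame: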

as.data.frame(prop.table(tab, 1))
##       Species Node        Freq
## 1      setosa    1 0.500000000
## 2  versicolor    1 0.251256281
## 3   virginica    1 0.322580645
## 4      setosa    2 0.500000000
## 5  versicolor    2 0.000000000
## 6   virginica    2 0.000000000
## 7      setosa    3 0.000000000
## 8  versicolor    3 0.251256281
## 9   virginica    3 0.322580645
## 10     setosa    4 0.000000000
## 11 versicolor    4 0.246231156
## 12  virginica    4 0.032258065
## 13     setosa    5 0.000000000
## 14 versicolor    5 0.226130653
## 15  virginica    5 0.006451613
## 16     setosa    6 0.000000000
## 17 versicolor    6 0.020100503
## 18  virginica    6 0.025806452
## 19     setosa    7 0.000000000
## 20 versicolor    7 0.005025126
## 21  virginica    7 0.290322581

as.data.frame(margin.table(tab, 2))
##   Node Freq
## 1    1  150
## 2    2   50
## 3    3  100
## 4    4   54
## 5    5   46
## 6    6    8
## 7    7   46
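
If the class distribution within each node is what is needed (the quantity the barplots in the inner panels display), the same table can be normalized by column instead of by row. The following is a minimal sketch under that assumption; it rebuilds tab as above, and the name within_node is introduced here purely for illustration.

```r
library(partykit)

irisct <- ctree(Species ~ ., data = iris)

# Absolute class frequencies per node, built as in the answer above
ids <- nodeids(irisct)
tab <- sapply(ids, function(id) {
  table(data_party(irisct[id])[["(response)"]])
})
colnames(tab) <- ids
tab <- as.table(tab)
names(dimnames(tab)) <- c("Species", "Node")

# Margin 2 normalizes each column, so every node's proportions sum to 1
within_node <- prop.table(tab, 2)
round(within_node, 3)

# Long format, one row per species/node combination
head(as.data.frame(within_node))
```

Because the nodes are nested (node 1 contains all others), column-wise proportions are usually easier to interpret than row-wise ones.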

And the split information can be obtained with the (still unexported) .list.rules.party() function. You just need to ask for all node IDs (the default is to use just the terminal node IDs):

partykit:::.list.rules.party(irisct, i = nodeids(irisct))
##                                                               1 
##                                                              "" 
##                                                               2 
##                                           "Petal.Length <= 1.9" 
##                                                               3 
##                                            "Petal.Length > 1.9" 
##                                                               4 
##                       "Petal.Length > 1.9 & Petal.Width <= 1.7" 
##                                                               5 
## "Petal.Length > 1.9 & Petal.Width <= 1.7 & Petal.Length <= 4.8" 
##                                                               6 
##  "Petal.Length > 1.9 & Petal.Width <= 1.7 & Petal.Length > 4.8" 
##                                                               7 
##                        "Petal.Length > 1.9 & Petal.Width > 1.7" 
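
Since .list.rules.party() is not exported and may change between versions, a rough alternative is to read each inner node's split directly with exported accessors. This sketch assumes that every split is a binary split on a numeric variable (true for the iris tree, not in general) and that the split's variable index refers to the columns of irisct$data; the names inner_ids and splits are illustrative only.

```r
library(partykit)

irisct <- ctree(Species ~ ., data = iris)

# Inner nodes are all node IDs that are not terminal node IDs
inner_ids <- setdiff(nodeids(irisct), nodeids(irisct, terminal = TRUE))

# For each inner node, look up the split variable and its breakpoint
splits <- nodeapply(irisct, ids = inner_ids, FUN = function(node) {
  sp <- split_node(node)
  data.frame(
    variable   = names(irisct$data)[varid_split(sp)],
    breakpoint = breaks_split(sp)  # assumes one numeric cut point per split
  )
})
splits <- do.call(rbind, splits)
splits$node <- inner_ids
splits
```

For splits on factors, breaks_split() returns NULL and index_split() describes how levels map to child nodes, so this sketch would need to be extended for such trees.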
