Properties and their values out of J48 tree (RWeka)
Question
If you run the following commands:
library(RWeka)
data(iris)
res = J48(Species ~., data = iris)
res will be a list of class J48 inheriting from Weka_tree. If you print it:
R> res
J48 pruned tree
------------------
Petal.Width <= 0.6: setosa (50.0)
Petal.Width > 0.6
| Petal.Width <= 1.7
| | Petal.Length <= 4.9: versicolor (48.0/1.0)
| | Petal.Length > 4.9
| | | Petal.Width <= 1.5: virginica (3.0)
| | | Petal.Width > 1.5: versicolor (3.0/1.0)
| Petal.Width > 1.7: virginica (46.0/1.0)
Number of Leaves : 5
Size of the tree : 9
I would like to get the properties and their values, in order from right to left. So for this case:
Petal.Width, Petal.Width, Petal.Length, Petal.Length.
I tried converting res to a factor and running this command:
str_extract(paste0(x, collapse=""), perl("(?<=\\|)[A-Za-z]+(?=\\|)"))
without success. Keep in mind that the surrounding characters on the left should be ignored.
Recommended answer
One way to do this is to convert the J48 object from RWeka to a party object from partykit. You just need to call as.party(res); this does all the parsing for you and returns a structure that is easier to work with through standardized extractor functions etc.
In particular, you can then use all the advice given in other discussions about ctree objects etc.
And I think the following should do at least part of what you want:
library("partykit")
pres <- as.party(res)
partykit:::.list.rules.party(pres)
## 2
## "Petal.Width <= 0.6"
## 5
## "Petal.Width > 0.6 & Petal.Width <= 1.7 & Petal.Length <= 4.9"
## 7
## "Petal.Width > 0.6 & Petal.Width <= 1.7 & Petal.Length > 4.9 & Petal.Width <= 1.5"
## 8
## "Petal.Width > 0.6 & Petal.Width <= 1.7 & Petal.Length > 4.9 & Petal.Width > 1.5"
## 9
## "Petal.Width > 0.6 & Petal.Width > 1.7"
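If only the names of the splitting variables are needed, rather than the full rules, something along the following lines might work. This is a minimal sketch, not part of the original answer: it assumes the exported partykit accessors nodeids(), nodeapply(), split_node() and varid_split() behave as documented, and the names inner, vid and split_vars are only illustrative.

```r
library("RWeka")
library("partykit")

# Refit the tree from the question and convert it to a party object
res <- J48(Species ~ ., data = iris)
pres <- as.party(res)

# IDs of the inner (non-terminal) nodes, in the order partykit numbers them
inner <- setdiff(nodeids(pres), nodeids(pres, terminal = TRUE))

# For each inner node, look up the column index of its splitting variable
vid <- unlist(nodeapply(pres, ids = inner,
                        FUN = function(n) varid_split(split_node(n))))

# Map the column indices back to variable names of the underlying data
split_vars <- names(pres$data)[vid]
split_vars
```

This avoids the unexported partykit:::.list.rules.party() helper, at the cost of returning only the variable names and not the split points.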
Update: The OP contacted me off-list with a related question, asking for a specific printed representation of the tree. I'm including my solution here in case it is useful for someone else.
He wanted to have ( ) symbols signalling the hierarchy levels plus the names of the splitting variables. One way to do so would be to (1) extract variable names of the underlying data:
nam <- names(pres$data)
(2) Turn the recursive node structure of the tree into a flat list (which is somewhat more convenient for constructing the desired string):
tr <- as.list(pres$node)
(3a) Initialize the string:
str <- "("
(3b) Recursively add brackets and/or variable names to the string:
update_str <- function(x) {
  if(is.null(x$kids)) {
    str <<- paste(str, ")")
  } else {
    str <<- paste(str, nam[x$split$varid], "(")
    for(i in x$kids) update_str(tr[[i]])
  }
}
(3c) Call the recursion, starting from the root node:
update_str(tr[[1]])
str
## [1] "( Petal.Width ( ) Petal.Width ( Petal.Length ( ) Petal.Width ( ) ) )"
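The recursion above modifies a global variable through <<-. As a possible alternative, here is a sketch of the same traversal written so that each call returns its piece of the string instead. It relies on the same assumptions as the steps above (a flat node list from as.list(pres$node) with $kids and $split$varid elements); build_str and out are hypothetical names introduced only for this illustration.

```r
library("RWeka")
library("partykit")

# Same setup as in the answer
res <- J48(Species ~ ., data = iris)
pres <- as.party(res)
nam <- names(pres$data)
tr <- as.list(pres$node)

# Return the string for the subtree rooted at flat-list element x
build_str <- function(x) {
  # Terminal node: contributes only a closing bracket
  if (is.null(x$kids)) return(")")
  # Inner node: variable name, opening bracket, then the kids' strings
  kids <- vapply(x$kids, function(i) build_str(tr[[i]]), character(1))
  paste(c(nam[x$split$varid], "(", kids), collapse = " ")
}

# Prepend the initial "(" used in step (3a)
out <- paste("(", build_str(tr[[1]]))
out
```

Because nothing is written to the global environment, the function can be called repeatedly or on several trees without resetting str first.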