How to implement the output of a decision tree built using ctree (party package)?

Question

I have built a decision tree using the ctree function from the party package. It has 1700 nodes. Firstly, is there a way to give ctree a maxdepth argument? I tried the control_ctree option, but it threw an error message saying it couldn't find the ctree function.

Also, how can I consume the output of this tree? How can it be implemented on other platforms like SAS or SQL? I also have a doubt about what the value "* weights = 4349" at the end of a node signifies. How will I know which terminal node votes for which predicted value?

Recommended answer

There is a maxdepth option in ctree. It is an argument of ctree_control(), which you pass to ctree() through its controls argument. Note that the function is called ctree_control(), not control_ctree().

You can use it as follows

library(party)  # provides ctree() and ctree_control()
airq <- subset(airquality, !is.na(Ozone))
airct <- ctree(Ozone ~ ., data = airq, controls = ctree_control(maxdepth = 3))
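To see how maxdepth limits tree size, here is a small sketch that refits the airquality tree for a few depth limits and counts the terminal nodes. It assumes the party package is installed; `depths` and `n_leaves` are illustrative names. Raising the limit by one only allows one more level of splits, so the number of terminal nodes can never shrink and can at most double.

```r
library(party)

airq <- subset(airquality, !is.na(Ozone))

# Refit the tree for several maxdepth values and count its terminal nodes
depths <- 1:3
n_leaves <- sapply(depths, function(d) {
  fit <- ctree(Ozone ~ ., data = airq,
               controls = ctree_control(maxdepth = d))
  # where() returns the terminal node ID of every training observation
  length(unique(where(fit)))
})

print(data.frame(maxdepth = depths, terminal_nodes = n_leaves))
```

For a 1700-node tree, the same loop over larger depth values is a quick way to pick a limit that keeps the tree small enough to translate by hand.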

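You can also require a minimum node size. minsplit is the minimum sum of weights a node must have before it is considered for splitting, and minbucket is the minimum sum of weights in a terminal node (the defaults are 20 and 7):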

airct <- ctree(Ozone ~ ., data = airq, controls = ctree_control(minsplit = 50, minbucket = 20))
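A quick way to confirm the minbucket constraint is to tabulate how many observations each terminal node received. This is a sketch assuming the party package is installed and the default weight of 1 per observation, so the counts equal the weight sums; `leaf_sizes` is an illustrative name.

```r
library(party)

airq <- subset(airquality, !is.na(Ozone))
airct <- ctree(Ozone ~ ., data = airq,
               controls = ctree_control(minsplit = 50, minbucket = 20))

# Observations per terminal node (equal to the weight sum when all weights are 1)
leaf_sizes <- table(where(airct))
print(leaf_sizes)
```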

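You can also make splitting stricter by raising mincriterion. A split is implemented only when 1 minus the p-value of its test exceeds mincriterion (default 0.95), so mincriterion = 0.99 requires p < 0.01 and makes the tree less sensitive: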

airct <- ctree(Ozone ~ ., data = airq, controls = ctree_control(mincriterion = 0.99))
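Since mincriterion only decides whether the selected split at a node is actually made, a higher value can only stop the tree earlier. The sketch below compares a few thresholds on the same data. It assumes the party package is installed; `crit` and `n_leaves` are illustrative names.

```r
library(party)

airq <- subset(airquality, !is.na(Ozone))

# Higher mincriterion = smaller p-value needed before a node is split
crit <- c(0.95, 0.99, 0.999)
n_leaves <- sapply(crit, function(m) {
  fit <- ctree(Ozone ~ ., data = airq,
               controls = ctree_control(mincriterion = m))
  length(unique(where(fit)))  # number of terminal nodes
})

print(data.frame(mincriterion = crit, terminal_nodes = n_leaves))
```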

The weights = 4349 you mentioned is the sum of the case weights of the observations in that node. ctree gives every observation a weight of 1 by default, so it is simply the number of observations in that node. If you feel that some observations deserve bigger weights, you can pass a weights vector to ctree(). It must have the same length as the data set and contain non-negative integers. Once you do that, weights = 4349 is a sum of weights rather than a count and has to be interpreted with caution.
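The link between the printed weights and the number of observations can be checked directly. This sketch assumes the party package is installed and the default unit weights. For every terminal node, it compares the sum of the node's weight vector with the number of rows that where() assigns to that node. `leaf_ids` and `check` are illustrative names.

```r
library(party)

airq <- subset(airquality, !is.na(Ozone))
airct <- ctree(Ozone ~ ., data = airq, controls = ctree_control(maxdepth = 3))

leaf_ids <- sort(unique(where(airct)))
check <- data.frame(
  node = leaf_ids,
  # the "weights = ..." value printed for each terminal node
  sum_weights = sapply(leaf_ids, function(id) sum(nodes(airct, id)[[1]]$weights)),
  # how many training rows fall into that node
  n_obs = sapply(leaf_ids, function(id) sum(where(airct) == id))
)
print(check)
```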

One way of using the node weights is to see which observations fell into a certain node. Using the data from the example above, we can do the following:

airq <- subset(airquality, !is.na(Ozone))
airct <- ctree(Ozone ~ ., data = airq, controls = ctree_control(maxdepth = 3))
unique(where(airct)) # get the terminal nodes
[1] 5 3 6 9 8

So we can check, for example, which observations fell into node number 5:

n <- nodes(airct, 5)[[1]]                  # node 5 of the fitted tree
x <- airq[which(as.logical(n$weights)), ]  # rows with non-zero weight in that node
x
    Ozone Solar.R Wind Temp Month Day
1      41     190  7.4   67     5   1
2      36     118  8.0   72     5   2
3      12     149 12.6   74     5   3
4      18     313 11.5   62     5   4
...
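The same rows can also be selected with where(), which returns the terminal node ID of every training observation. Here is a small sketch comparing both routes for node 5, assuming the party package is installed. `via_weights` and `via_where` are illustrative names.

```r
library(party)

airq <- subset(airquality, !is.na(Ozone))
airct <- ctree(Ozone ~ ., data = airq, controls = ctree_control(maxdepth = 3))

id <- 5  # one of the terminal node IDs returned by unique(where(airct))

# Route 1: the node's weight vector, as in the answer
n <- nodes(airct, id)[[1]]
via_weights <- airq[which(as.logical(n$weights)), ]

# Route 2: the terminal node assignment of each row
via_where <- airq[where(airct) == id, ]
```

The where() route avoids pulling node objects out of the tree and extends naturally to labelling every row at once.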

Using this method, you can create data sets that contain the information of your terminal nodes and then import them into SAS or SQL.
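A minimal sketch of that export, assuming the party package is installed. It writes two CSV files: the scored data with each row's terminal node and prediction, and a one-row-per-node lookup table that could be joined in SQL or SAS. The file names and the use of tempdir() are placeholders; any path the target system can read works.

```r
library(party)

airq <- subset(airquality, !is.na(Ozone))
airct <- ctree(Ozone ~ ., data = airq, controls = ctree_control(maxdepth = 3))

# Scored data set: original columns plus terminal node and prediction
scored <- airq
scored$node <- where(airct)
scored$pred <- as.numeric(predict(airct))

# Lookup table with one row per terminal node
lookup <- unique(scored[, c("node", "pred")])
lookup <- lookup[order(lookup$node), ]

# Placeholder output location and file names
out_dir <- tempdir()
write.csv(scored, file.path(out_dir, "airq_scored.csv"), row.names = FALSE)
write.csv(lookup, file.path(out_dir, "node_lookup.csv"), row.names = FALSE)
```

This moves the tree's results for the training data to another platform. Scoring new rows outside R still requires the splitting conditions of each terminal node.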

You can also get the list of splitting conditions for each terminal node using the function from my answer to "ctree() - How to get the list of splitting conditions for each terminal node?"
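As for which terminal node votes for which predicted value: for a numeric response such as Ozone, every observation in a terminal node gets the same prediction, the mean response of the training observations in that node. The sketch below builds that node-to-prediction table. It assumes the party package is installed; `node_id`, `pred` and `node_pred` are illustrative names.

```r
library(party)

airq <- subset(airquality, !is.na(Ozone))
airct <- ctree(Ozone ~ ., data = airq, controls = ctree_control(maxdepth = 3))

node_id <- where(airct)                # terminal node of each observation
pred    <- as.numeric(predict(airct))  # tree prediction for each observation

# All observations in a terminal node share one prediction, so keep the first
node_pred <- aggregate(pred, by = list(node = node_id), FUN = function(v) v[1])
names(node_pred)[2] <- "predicted_Ozone"
print(node_pred)
```

For new data, predict(airct, newdata = new_rows, type = "node") returns terminal node IDs that can be matched against this table. Here new_rows is a placeholder for your own data frame.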
