How to implement the output of a decision tree built using ctree (party package)?
Question
I have built a decision tree using the ctree function from the party package. It has 1700 nodes.
First, is there a way to give ctree a maxdepth argument? I tried the control_ctree option, but it threw an error message saying it could not find the ctree function.
Also, how can I consume the output of this tree? How can it be implemented on other platforms such as SAS or SQL? I am also unsure what the value "* weights = 4349" at the end of a node signifies, and how I can tell which terminal node votes for which predicted value.
Recommended answer
There is a maxdepth option in ctree. It is set through ctree_control() (note the name: ctree_control, not control_ctree).
You can use it as follows:
library(party)  # ctree() and ctree_control() are only found once the package is loaded
airq <- subset(airquality, !is.na(Ozone))
airct <- ctree(Ozone ~ ., data = airq, controls = ctree_control(maxdepth = 3))
You can also set lower bounds on node sizes: minsplit is the minimum sum of weights a node needs before a split is considered, and minbucket is the minimum sum of weights in a terminal node:
airct <- ctree(Ozone ~ ., data = airq, controls = ctree_control(minsplit= 50, minbucket = 20))
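You can also make the tree less sensitive by raising mincriterion, the value that 1 minus the p-value must exceed for a split to be made. With mincriterion = 0.99 a split needs a p-value below 0.01 (the default of 0.95 corresponds to 0.05):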
airct <- ctree(Ozone ~ ., data = airq, controls = ctree_control(mincriterion = 0.99))
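To see the effect of a stricter mincriterion, one option is to count the terminal nodes of two trees fitted to the same data. The following is a minimal sketch; the helper n_terminal is a hypothetical name introduced here for illustration and is not part of party.

```r
library(party)

airq <- subset(airquality, !is.na(Ozone))

# Hypothetical helper: number of distinct terminal nodes the training rows fall into
n_terminal <- function(tree) length(unique(where(tree)))

# Tree with the default control settings (mincriterion = 0.95)
default_tree <- ctree(Ozone ~ ., data = airq)

# Tree that only splits when 1 - p-value exceeds 0.99
strict_tree <- ctree(Ozone ~ ., data = airq,
                     controls = ctree_control(mincriterion = 0.99))

c(default = n_terminal(default_tree), strict = n_terminal(strict_tree))
```

The stricter tree should never have more terminal nodes than the default one, since a split that fails the 0.95 threshold also fails the 0.99 threshold.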
The weights = 4349 you mention is simply the number of observations in that specific node. By default ctree gives every observation a weight of 1. If you feel that some observations deserve larger weights, you can pass a weights vector to ctree(); it must have one entry per observation in the data set, and the entries must be non-negative integers. Once you do that, weights = 4349 is a sum of weights rather than a plain count, so it has to be interpreted with caution.
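As a quick check of this interpretation, the sketch below compares the sum of the weights stored in each terminal node with the number of training rows that where() assigns to that node. It reuses the airquality example and assumes the default weights of 1; the variable names are chosen here for illustration.

```r
library(party)

airq <- subset(airquality, !is.na(Ozone))
airct <- ctree(Ozone ~ ., data = airq, controls = ctree_control(maxdepth = 3))

# IDs of the terminal nodes that training observations fall into
terminal_ids <- sort(unique(where(airct)))

# Sum of the case weights stored in each terminal node
weight_sums <- sapply(terminal_ids,
                      function(id) sum(nodes(airct, id)[[1]]$weights))

# Number of training rows assigned to each terminal node
row_counts <- sapply(terminal_ids, function(id) sum(where(airct) == id))

data.frame(node = terminal_ids, weight_sum = weight_sums, rows = row_counts)
```

With unit weights the two columns agree, which is why the printed weights value can be read as a node size.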
One way of using the weights is to see which observations fell into a certain node. Using the data from the example above, we can do the following:
airq <- subset(airquality, !is.na(Ozone))
airct <- ctree(Ozone ~ ., data = airq, controls = ctree_control(maxdepth = 3))
unique(where(airct)) # IDs of the terminal nodes
[1] 5 3 6 9 8
So we can check, for example, which observations fell into node 5:
n <- nodes(airct , 5)[[1]]
x <- airq[which(as.logical(n$weights)), ]
x
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
...
Using this method you can create data sets that contain the information for your terminal nodes and then import them into SAS or SQL.
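One possible way to build such a data set is sketched below: it records the terminal node and the fitted value for every training row, collapses that to one row per terminal node, and writes the result to a CSV file. The file name and the column names are assumptions made for this illustration.

```r
library(party)

airq <- subset(airquality, !is.na(Ozone))
airct <- ctree(Ozone ~ ., data = airq, controls = ctree_control(maxdepth = 3))

# One row per training observation: its terminal node and the tree's prediction
scored <- data.frame(node = where(airct),
                     prediction = as.numeric(predict(airct)))

# One row per terminal node; predictions are constant within a node,
# so the mean simply returns that node's predicted value
node_table <- aggregate(prediction ~ node, data = scored, FUN = mean)

# Add the number of training observations in each terminal node
node_table$n_obs <- as.vector(table(scored$node)[as.character(node_table$node)])

# Write a plain CSV file (hypothetical file name) for import elsewhere
out_file <- file.path(tempdir(), "ctree_terminal_nodes.csv")
write.csv(node_table, out_file, row.names = FALSE)

node_table
```

This table also shows which terminal node yields which predicted value. It does not contain the splitting rules themselves, so scoring new records in SAS or SQL would still require translating the conditions shown by print(airct) into the target platform's syntax.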
You can also get the list of splitting conditions using the function from my answer to the question "ctree() - How to get the list of splitting conditions for each terminal node?"