Ctree classification with weights - results displayed

Question

Let's say I want to use the iris data example, but correctly classifying versicolor is 5 times more important to me.

library(party)
data(iris)
# Weight each versicolor observation 5, all other observations 1
w <- ifelse(iris$Species == "versicolor", 5, 1)
irisct <- ctree(Species ~ ., data = iris, weights = w)
plot(irisct)

Then the tree graph changes the number of observations and conditional probabilities in each node (it multiplies versicolor by 5). Is there a way to "disable" this, i.e. show the original number of observations (total = 150 for iris)?

Thank you very much for your help!

Recommended answer

The enhanced reimplementation of ctree() in package partykit also has somewhat more flexible plotting capabilities. Specifically, the node_barplot() panel function gained a mainlab argument that can be used for customizing the main labels. For example, for the iris data:

library("partykit")
ct <- ctree(Species ~ ., data = iris)

You can set up a vector of labels and then supply a function that accesses these:

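# One label per node; a tree with 4 terminal nodes has 7 nodes in total
lab <- paste("Foo", 1:7)
# mainlab may be a function of the node id and the number of observations
ml <- function(id, nobs) lab[as.numeric(id)]
plot(ct, tp_args = list(mainlab = ml))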

Of course, the example above is not very meaningful but could be modified to accomplish what you want with a little bit of coding.
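
As one possible modification along these lines, the following is a minimal sketch that puts the unweighted number of original observations into the main label of each terminal node. It assumes the weights from the question; the names w, ct_w, node_id, n_orig and ml_orig are made up for illustration. It only changes the labels: the bars themselves would still show the weighted class proportions.

```r
library("partykit")

# Weights as in the question: versicolor counts five times.
w <- ifelse(iris$Species == "versicolor", 5L, 1L)
ct_w <- ctree(Species ~ ., data = iris, weights = w)

# Terminal node of every original row, and the unweighted row count per node.
node_id <- predict(ct_w, newdata = iris, type = "node")
n_orig <- table(node_id)

# mainlab is called with the node name and the weighted count; the weighted
# count is ignored here and the unweighted count is looked up by node name.
ml_orig <- function(id, nobs) {
  sprintf("Node %s (n = %d)", id, as.integer(n_orig[as.character(id)]))
}

plot(ct_w, tp_args = list(mainlab = ml_orig))
```

The counts in n_orig sum to the 150 rows of iris, regardless of the weights used for fitting.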

However, be warned about the upsampling of certain observations using the weights argument. The ctree() function really treats the weights as case weights, and consequently the significance tests used for splitting do change. With an increased number of observations, all p-values become smaller and hence the tree selects more splits (unless mincriterion is increased simultaneously). Compare the ct tree above, which has 4 terminal nodes, with the following two fits:

ct2 <- ctree(Species ~ ., data = iris, weights = rep(2, 150))
ct3 <- ctree(Species ~ ., data = iris, weights = rep(2, 150), mincriterion = 0.999)

The resulting numbers of terminal nodes are:

c(width(ct), width(ct2), width(ct3))
[1] 4 6 4
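
To make the case-weight interpretation more concrete, here is a small sketch comparing a fit with weight 2 on every observation against a fit on a data set in which every row literally appears twice. The object names ct_weighted, iris_twice and ct_doubled are hypothetical. If the weights act as case weights, as described above, the two fits would be expected to have the same number of terminal nodes; no output is shown here because it may depend on the installed partykit version.

```r
library("partykit")

# Fit once with every observation given weight 2 ...
ct_weighted <- ctree(Species ~ ., data = iris, weights = rep(2L, nrow(iris)))

# ... and once on a data set in which every row appears twice.
iris_twice <- rbind(iris, iris)
ct_doubled <- ctree(Species ~ ., data = iris_twice)

# Number of terminal nodes of each fit, side by side.
c(weighted = width(ct_weighted), doubled = width(ct_doubled))
```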
