Is it possible to get a p-value for nodes in a categorical tree analysis with R?
Question
Is it possible to get a p-value for each node in a categorical tree analysis with R? I am using rpart and can't find a p-value for any node. Maybe this is only possible with a regression tree and not with a categorical response.
structure(list(subj = c(702L, 702L, 702L, 702L, 702L, 702L, 702L,
702L, 702L, 702L, 702L, 702L, 702L, 702L, 702L, 702L, 702L, 702L,
702L, 702L, 702L, 702L, 702L, 702L), visit = c(4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L), run = structure(c(1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L,
4L), .Label = c("A", "B", "C", "D", "E", "xdur", "xend60", "xpre"
), class = "factor"), ho = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), hph = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), longexer = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("10min", "60min"), class = "factor"),
esq_sick = c(NA, NA, 0L, NA, NA, NA, NA, NA, NA, NA, 0L,
NA, NA, NA, NA, NA, NA, NA, 0L, NA, NA, NA, NA, NA), esq_sick2 = c(NA,
NA, 0L, NA, NA, NA, NA, NA, NA, NA, 0L, NA, NA, NA, NA, NA,
NA, NA, 0L, NA, NA, NA, NA, NA), ll_sick = c(NA, NA, 0L,
NA, NA, NA, NA, NA, NA, NA, 0L, NA, NA, NA, NA, NA, NA, NA,
0L, NA, NA, NA, NA, NA), ll_sick2 = c(NA, NA, 0L, NA, NA,
NA, NA, NA, NA, NA, 0L, NA, NA, NA, NA, NA, NA, NA, 0L, NA,
NA, NA, NA, NA), esq_01 = c(NA, NA, 2L, NA, NA, NA, NA, NA,
NA, NA, 2L, NA, NA, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA,
NA), esq_02 = c(NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, 2L,
NA, NA, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA), esq_03 = c(NA,
NA, 0L, NA, NA, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA,
NA, NA, 0L, NA, NA, NA, NA, NA), esq_04 = c(NA, NA, 0L, NA,
NA, NA, NA, NA, NA, NA, 0L, NA, NA, NA, NA, NA, NA, NA, 0L,
NA, NA, NA, NA, NA), esq_05 = c(NA, NA, 0L, NA, NA, NA, NA,
NA, NA, NA, 0L, NA, NA, NA, NA, NA, NA, NA, 0L, NA, NA, NA,
NA, NA), esq_06 = c(NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA,
1L, NA, NA, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA),
esq_07 = c(NA, NA, 0L, NA, NA, NA, NA, NA, NA, NA, 0L, NA,
NA, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA), esq_08 = c(NA,
NA, 0L, NA, NA, NA, NA, NA, NA, NA, 0L, NA, NA, NA, NA, NA,
NA, NA, 0L, NA, NA, NA, NA, NA), esq_09 = c(NA, NA, 0L, NA,
NA, NA, NA, NA, NA, NA, 0L, NA, NA, NA, NA, NA, NA, NA, 0L,
NA, NA, NA, NA, NA), esq_10 = c(NA, NA, 0L, NA, NA, NA, NA,
NA, NA, NA, 0L, NA, NA, NA, NA, NA, NA, NA, 0L, NA, NA, NA,
NA, NA)), .Names = c("subj", "visit", "run", "ho", "hph",
"longexer", "esq_sick", "esq_sick2", "ll_sick", "ll_sick2", "esq_01",
"esq_02", "esq_03", "esq_04", "esq_05", "esq_06", "esq_07", "esq_08",
"esq_09", "esq_10"), row.names = 7:30, class = "data.frame")
# Read the data from the CSV file
alldata <- read.table('symptomology CSV2.csv', header = TRUE, sep = ",")
library(rpart)
# Classification tree (method = "class") for esq_sick2 on the binary symptom items
fit <- rpart(esq_sick2~esq_01_bin + esq_02_bin + esq_03_bin + esq_04_bin + esq_05_bin + esq_06_bin + esq_07_bin + esq_08_bin + esq_09_bin + esq_10_bin + esq_11_bin + esq_12_bin + esq_13_bin + esq_14_bin + esq_15_bin + esq_16_bin + esq_17_bin + esq_18_bin + esq_19_bin + esq_20_bin, method="class", data=alldata)
# nspace (given without a value in the original call) applies only to compressed trees, so it is omitted
plot(fit, uniform = FALSE, branch = 1, compress = FALSE, margin = 0.1, minbranch = 0.3)
text(fit, use.n = TRUE, all = TRUE, cex = .8)
Recommended answer
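Here's an example that might help you. rpart doesn't report p-values for its splits, but ctree from the partykit package chooses each split with a significance test, so its nodes carry p-values. I'm using the built-in airquality data set and the example provided in the help for ctree: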
library(partykit)
# For the sctest function to extract p-values (see help for ctree and sctest)
library(strucchange)
# Data we'll use
airq <- subset(airquality, !is.na(Ozone))
# Build the tree
airct <- ctree(Ozone ~ ., data = airq)
Look at the tree:
airct
Model formula:
Ozone ~ Solar.R + Wind + Temp + Month + Day
Fitted party:
[1] root
| [2] Temp <= 82
| | [3] Wind <= 6.9: 55.600 (n = 10, err = 21946.4)
| | [4] Wind > 6.9
| | | [5] Temp <= 77: 18.479 (n = 48, err = 3956.0)
| | | [6] Temp > 77: 31.143 (n = 21, err = 4620.6)
| [7] Temp > 82
| | [8] Wind <= 10.3: 81.633 (n = 30, err = 15119.0)
| | [9] Wind > 10.3: 48.714 (n = 7, err = 1183.4)
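The printed tree doesn't show p-values, but the default plot for a ctree object labels each inner node with its split variable and the p-value of the split. Here is a minimal sketch, reusing the airct tree from above; the names all_ids, terminal_ids and inner_ids are only illustrative and not part of the original answer:

```r
library(partykit)

# Same data and tree as in the answer
airq <- subset(airquality, !is.na(Ozone))
airct <- ctree(Ozone ~ ., data = airq)

# Default plot: inner nodes are labelled with the split variable and its p-value
plot(airct)

# Node IDs: all nodes, terminal nodes, and inner (splitting) nodes
all_ids <- nodeids(airct)
terminal_ids <- nodeids(airct, terminal = TRUE)
inner_ids <- setdiff(all_ids, terminal_ids)
inner_ids
```

The inner node IDs are the nodes for which a split was actually made, so those are the ones whose p-values appear in the plot.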
Extract the p-values:
sctest(airct)
$`1`
              Solar.R         Wind         Temp     Month        Day
statistic 13.34761286 4.161370e+01 5.608632e+01 3.1126596 0.02011554
p.value    0.00129309 5.560572e-10 3.468337e-13 0.3325881 0.99998175

$`2`
            Solar.R         Wind         Temp     Month      Day
statistic 5.4095322 12.968549828 11.298951405 0.2148961 2.970294
p.value   0.0962041  0.001582833  0.003871534 0.9941976 0.357956

$`3`
NULL

$`4`
              Solar.R     Wind         Temp      Month       Day
statistic 9.547191843 2.307676 11.598966936 0.06604893 0.2513143
p.value   0.009972755 0.497949  0.003295072 0.99965679 0.9916670

$`5`
             Solar.R      Wind      Temp     Month       Day
statistic 6.14094026 1.3865355 1.9986304 0.8268341 1.3580462
p.value   0.06432172 0.7447599 0.5753799 0.8952749 0.7528481

$`6`
            Solar.R       Wind      Temp    Month       Day
statistic 5.1824354 0.02060939 0.9270013 0.165171 4.6220522
p.value   0.1089932 0.99998062 0.8705785 0.996871 0.1481643

$`7`
            Solar.R         Wind       Temp     Month        Day
statistic 0.8083249 11.711564549 6.77148538 0.1307643 0.03992875
p.value   0.8996614  0.003101788 0.04546281 0.9982052 0.99990034

$`8`
            Solar.R      Wind      Temp       Month         Day
statistic 0.9056479 3.1585094 2.9285252 0.008106707 0.008686293
p.value   0.8759687 0.3247585 0.3657072 0.999998099 0.999997742

$`9`
NULL
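The airquality example has a numeric response, while the question asks about a categorical one. ctree also accepts a factor response, so the same steps should carry over. The sketch below uses the built-in iris data as a stand-in for the asker's data; irisct, tests and min_p are illustrative names that do not appear in the original answer. It also collapses the sctest output to the smallest p-value per node, which for an inner node corresponds to the variable with the strongest association:

```r
library(partykit)
library(strucchange)

# Classification tree: the response (Species) is a factor
irisct <- ctree(Species ~ ., data = iris)

# Per-node test results, one list element per node (NULL where no test was run)
tests <- sctest(irisct)

# Smallest p-value in each node; NA where no usable test result is available
min_p <- sapply(tests, function(x) {
  if (is.null(x)) return(NA_real_)
  p <- x["p.value", ]
  if (all(is.na(p))) NA_real_ else min(p, na.rm = TRUE)
})
min_p
```

The result is a numeric vector named by node ID, which can be easier to scan than the full list when a tree has many nodes.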