How to prune a tree in R?

Question

I'm doing classification with rpart in R. The tree model is trained with:

library(rpart)
tree <- rpart(activity ~ ., data = trainData)
pData1 <- predict(tree, testData, type = "class")

The accuracy of this tree model is:

sum(testData$activity == pData1) / length(pData1)
## [1] 0.8094276

I followed a tutorial that prunes the tree using cross-validation:

# Prune at the CP value whose cross-validated error (xerror) is smallest
ptree <- prune(tree, cp = tree$cptable[which.min(tree$cptable[, "xerror"]), "CP"])
pData2 <- predict(ptree, testData, type = "class")

The accuracy rate for the pruned tree is still the same:

sum(testData$activity == pData2) / length(pData2)
## [1] 0.8094276

What's wrong with my pruned tree? And how can I prune the tree model using cross-validation in R? Thanks.

Recommended answer

You have selected the tree with the minimum cross-validated error. An alternative (the "1-SE rule") is to use the smallest tree whose cross-validated error is within 1 standard error of that best tree's error. The reasoning is that, given the uncertainty in the CV estimates of the error, the smallest tree within 1 standard error predicts about as well as the best (lowest CV error) tree, yet it does so with fewer "terms" (splits).
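Written out against the columns of rpart's CP table, the rule might look as follows. This is a sketch of the standard 1-SE selection, assuming the table's rows are the nested subtrees $T_1 \subset T_2 \subset \dots \subset T_K$ ordered by increasing number of splits, each with cross-validated error $\mathrm{xerror}_k$ and standard error $\mathrm{xstd}_k$:

```latex
k^{*} = \arg\min_{k} \mathrm{xerror}_k,
\qquad
\hat{k}_{1\mathrm{SE}} = \min\bigl\{\, k : \mathrm{xerror}_k \le \mathrm{xerror}_{k^{*}} + \mathrm{xstd}_{k^{*}} \,\bigr\},
\qquad
\texttt{cp} = \mathrm{CP}_{\hat{k}_{1\mathrm{SE}}}
```

Pruning at $\mathrm{CP}_{k^*}$ is what the question does. Because $k^*$ itself satisfies the condition, $\hat{k}_{1\mathrm{SE}} \le k^*$, so the 1-SE tree is never larger than the minimum-error tree.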

Plot the cross-validated error against the complexity parameter (cp) and tree size for the unpruned tree with:

plotcp(tree)

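Starting from the tree with minimum error, move left (toward smaller trees) and find the leftmost tree whose cross-validated error still lies within the error bar of the minimum-error tree, i.e. at or below the horizontal line plotcp draws 1 SE above the minimum. Prune with that tree's cp value.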
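The same selection can be made numerically from tree$cptable instead of reading it off the plot. This is a minimal sketch, not the answer author's code: the helper name one_se_cp is hypothetical, and the kyphosis data shipped with rpart stands in for the asker's trainData, which is not available. The tree is grown with a small cp (0.001, an illustrative choice) so the CP table has several rows to choose from, and the seed is fixed because xerror and xstd come from randomly assigned CV folds.

```r
library(rpart)

# Hypothetical helper: cp of the smallest subtree whose cross-validated
# error (xerror) is within one standard error (xstd) of the minimum xerror.
one_se_cp <- function(fit) {
  cpt    <- fit$cptable
  imin   <- which.min(cpt[, "xerror"])
  thresh <- cpt[imin, "xerror"] + cpt[imin, "xstd"]  # 1-SE threshold
  # Rows run from the smallest to the largest tree, so the first row at or
  # below the threshold is the smallest qualifying tree.
  cpt[which(cpt[, "xerror"] <= thresh)[1], "CP"]
}

set.seed(1)  # CV folds are random, so fix them for reproducibility
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             control = rpart.control(cp = 0.001))

plotcp(fit)  # visual version of the same rule

cp_min  <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
cp_1se  <- one_se_cp(fit)
fit_min <- prune(fit, cp = cp_min)  # the question's choice
fit_1se <- prune(fit, cp = cp_1se)  # the 1-SE choice

c(cp_min = cp_min, cp_1se = cp_1se)
```

Since the 1-SE cp is never smaller than the minimum-error cp, fit_1se is always the same size as fit_min or smaller; whether it predicts differently on a given test set depends on the data.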

There are several reasons why pruning might not change the fitted tree. For example, the minimum-error tree may be the full tree itself, i.e. the tree at which the algorithm stopped growing under the stopping rules described in ?rpart.control (such as cp and minsplit).
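One way to check for this case, and to give cross-validation more room, is sketched below. It assumes the kyphosis data from rpart as a stand-in for the asker's data; min_is_full is a hypothetical helper, and the relaxed rpart.control settings (cp = 0, minsplit = 2) are illustrative choices, not recommended values.

```r
library(rpart)

set.seed(1)  # CV folds are random
# Default stopping rules (rpart.control defaults include cp = 0.01, minsplit = 20)
default_fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# Hypothetical helper: TRUE when the minimum-xerror row is the last (largest)
# row of the CP table, i.e. the selected subtree is the full fitted tree.
min_is_full <- function(fit) {
  which.min(fit$cptable[, "xerror"]) == nrow(fit$cptable)
}
min_is_full(default_fit)

# Relax the stopping rules so the tree grows larger, giving cross-validation
# more candidate subtrees to prune back to.
big_fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                 control = rpart.control(cp = 0, minsplit = 2))
best_cp <- big_fit$cptable[which.min(big_fit$cptable[, "xerror"]), "CP"]
pruned  <- prune(big_fit, cp = best_cp)

# Node counts of the default tree, the overgrown tree, and its pruned version
c(default = nrow(default_fit$frame),
  full    = nrow(big_fit$frame),
  pruned  = nrow(pruned$frame))
```

If min_is_full() returns TRUE for the asker's tree, pruning at the minimum-xerror cp leaves the tree as it was, which would explain identical test accuracy before and after pruning.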
