How to compute error rate from a decision tree?


Question


Does anyone know how to calculate the error rate for a decision tree with R? I am using the rpart() function.

Answer


Assuming you mean computing the error rate on the sample used to fit the model, you can use printcp(). For example, using the online example,

> library(rpart)
> fit <- rpart(Kyphosis ~ Age + Number + Start, data=kyphosis)
> printcp(fit)

Classification tree:
rpart(formula = Kyphosis ~ Age + Number + Start, data = kyphosis)

Variables actually used in tree construction:
[1] Age   Start

Root node error: 17/81 = 0.20988

n= 81 

        CP nsplit rel error  xerror    xstd
1 0.176471      0   1.00000 1.00000 0.21559
2 0.019608      1   0.82353 0.82353 0.20018
3 0.010000      4   0.76471 0.82353 0.20018
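If you want these numbers programmatically rather than from the printed output, the table printcp() displays is stored in the fitted object. A minimal sketch, assuming the same fit as above:

```r
library(rpart)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

## printcp()'s table is stored in the fitted object
cp_table <- fit$cptable

## Root node error = misclassified cases at the root / n  (17/81 here)
root_error <- fit$frame$dev[1] / fit$frame$n[1]

## Absolute error rates for each pruning level (rows of the CP table)
resub_error <- cp_table[, "rel error"] * root_error
cv_error    <- cp_table[, "xerror"]    * root_error
```

Multiplying the rel error and xerror columns by the root node error converts the relative figures in the table into absolute error rates.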

The Root node error is used to compute two measures of predictive performance from the values displayed in the rel error and xerror columns, depending on the complexity parameter (first column):


  • 0.76471 x 0.20988 = 0.1604973 (16.0%) is the resubstitution error rate (i.e., the error rate computed on the training sample) -- this is roughly

class.pred <- table(predict(fit, type="class"), kyphosis$Kyphosis)  # confusion matrix
1 - sum(diag(class.pred)) / sum(class.pred)                         # misclassification rate


  • 0.82353 x 0.20988 = 0.1728425 (17.2%) is the cross-validated error rate (using 10-fold CV; see xval in rpart.control(), but see also xpred.rpart() and plotcp(), which rely on this kind of measure). This measure is a more objective indicator of predictive accuracy.

    Note that it is more or less in agreement with the classification accuracy reported by tree:

    > library(tree)
    > summary(tree(Kyphosis ~ Age + Number + Start, data=kyphosis))
    
    Classification tree:
    tree(formula = Kyphosis ~ Age + Number + Start, data = kyphosis)
    Number of terminal nodes:  10 
    Residual mean deviance:  0.5809 = 41.24 / 71 
    Misclassification error rate: 0.1235 = 10 / 81 
    

    where the Misclassification error rate is computed from the training sample.
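As a quick check, predicting on the training set reproduces the misclassification rate that summary() reports. A sketch, assuming the same data:

```r
library(rpart)  # for the kyphosis data
library(tree)

tfit <- tree(Kyphosis ~ Age + Number + Start, data = kyphosis)

## Fraction of training cases whose predicted class differs from the truth
pred <- predict(tfit, type = "class")
mean(pred != kyphosis$Kyphosis)  # 10/81 = 0.1235, as reported above
```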

