ROC curve in R using rpart package?

Question

I split my data into a Train set and a Test set.

I fitted a CART (classification tree) model with the rpart package in R, using only the Train set. Now I want to carry out a ROC analysis with the ROCR package.

The response variable is n.use (1 = yes, 0 = no):

> Pred2 = prediction(Pred.cart, Test$n.use)
Error in prediction(Pred.cart, Test$n.use) : 
  Format of predictions is invalid.

This is my code. What is the problem? And which type is right, "class" or "prob"?

library(rpart)
train.cart = rpart(n.use~., data=Train, method="class")

Pred.cart = predict(train.cart, newdata = Test, type = "class")

Pred2 = prediction(Pred.cart, Test$n.use)
roc.cart = performance(Pred2, "tpr", "fpr")


Recommended answer

The prediction() function from the ROCR package expects the predicted "success" probabilities and the observed factor of failures vs. successes. In order to obtain the former you need to apply predict(..., type = "prob") to the rpart object (i.e., not "class"). However, as this returns a matrix of probabilities with one column per response class you need to select the "success" class column.
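To make the difference between the two types concrete, here is a minimal sketch on the kyphosis data that ships with rpart. The object names (fit, cls, prob, p_present) are only illustrative. Selecting the column by name is a bit safer than relying on its position.

```r
library(rpart)
data("kyphosis", package = "rpart")

# Fit a classification tree on the example data
fit <- rpart(Kyphosis ~ ., data = kyphosis, method = "class")

# type = "class" returns a factor of predicted labels,
# which prediction() from ROCR cannot use as scores
cls <- predict(fit, type = "class")
class(cls)

# type = "prob" returns a numeric matrix with one column per class level
prob <- predict(fit, type = "prob")
colnames(prob)

# Pick the "success" column by name; here that is "present"
p_present <- prob[, "present"]
head(p_present)
```

With a 0/1 response fitted with method = "class", as in the question, the columns would presumably be named "0" and "1", so the success column would be selected with [, "1"].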

Unfortunately, your example is not reproducible, so I'm using the kyphosis data from the rpart package for illustration:

library("rpart")
data("kyphosis", package = "rpart")
rp <- rpart(Kyphosis ~ ., data = kyphosis)

Then you can apply the prediction() function from ROCR. Here, I'm using the in-sample (training) data but the same can be applied out of sample (test data):

library("ROCR")
pred <- prediction(predict(rp, type = "prob")[, 2], kyphosis$Kyphosis)
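For the out-of-sample case, a sketch along the following lines should work. The 70/30 split, the seed, and the names train, test, fit, p_test, pred_test and roc_test are assumptions made for illustration, not part of the question's data.

```r
library(rpart)
library(ROCR)
data("kyphosis", package = "rpart")

# Illustrative random split; the seed is arbitrary and only makes it repeatable
set.seed(1)
idx <- sample(nrow(kyphosis), size = round(0.7 * nrow(kyphosis)))
train <- kyphosis[idx, ]
test <- kyphosis[-idx, ]

# Fit on the training data only
fit <- rpart(Kyphosis ~ ., data = train, method = "class")

# Predicted probability of the "present" class for the test rows
p_test <- predict(fit, newdata = test, type = "prob")[, "present"]

# ROCR needs the numeric scores and the observed labels of the same rows
pred_test <- prediction(p_test, test$Kyphosis)
roc_test <- performance(pred_test, "tpr", "fpr")
plot(roc_test)
abline(0, 1, lty = 2)
```

Applied to the question's objects, the analogous call would presumably be prediction(predict(train.cart, newdata = Test, type = "prob")[, "1"], Test$n.use), assuming n.use is coded 0/1.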

And you can visualize the ROC curve:

plot(performance(pred, "tpr", "fpr"))
abline(0, 1, lty = 2)

Or the accuracy across cutoffs:

plot(performance(pred, "acc"))

Or any of the other plots and summaries supported by ROCR.
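As one example of such a summary, the area under the ROC curve and the cutoff with the highest accuracy can be pulled out of the performance objects. This is a sketch on the same in-sample kyphosis fit, so the numbers will be optimistic compared with test data. The names auc, acc, cutoffs, accuracy and best are illustrative.

```r
library(rpart)
library(ROCR)
data("kyphosis", package = "rpart")

rp <- rpart(Kyphosis ~ ., data = kyphosis)
pred <- prediction(predict(rp, type = "prob")[, "present"], kyphosis$Kyphosis)

# Area under the ROC curve; the value is stored in the y.values slot
auc <- performance(pred, "auc")@y.values[[1]]
auc

# Accuracy at each cutoff: x.values holds cutoffs, y.values the accuracies
acc <- performance(pred, "acc")
cutoffs <- acc@x.values[[1]]
accuracy <- acc@y.values[[1]]

# Cutoff with the highest accuracy (the first one if there are ties)
best <- which.max(accuracy)
c(cutoff = cutoffs[best], accuracy = accuracy[best])
```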
