ROC for random forest


Question

I understand that a ROC curve plots tpr against fpr, but I am having difficulty determining which parameters I should vary to get different tpr/fpr pairs.

Recommended answer

I wrote this answer on a similar question.

Basically, you can increase the weighting on certain classes, downsample other classes, and/or change the vote aggregation rule.

In reply to the comment "the two classes are very balanced" (Suryavansh):

In that case, since your data is balanced, you should mainly go with option 3 (changing the aggregation rule). In randomForest this can be done with the cutoff parameter, either at training or at prediction time. In other settings you may have to extract the cross-validated votes from all trees yourself, apply a series of rules, and calculate the resulting fpr and fnr.
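To make the cutoff rule concrete, here is a minimal base-R sketch. It assumes the behaviour described in the randomForest documentation, where the winning class is the one with the largest ratio of vote fraction to cutoff. The helper `apply_cutoff` and the small `votes` matrix are hypothetical and only for illustration; they are not part of the package.

```r
# Hypothetical helper (not part of randomForest): for each row of a
# vote-fraction matrix, pick the class with the largest votes/cutoff ratio.
apply_cutoff <- function(votes, cutoff) {
  stopifnot(ncol(votes) == length(cutoff), all(cutoff > 0))
  ratio <- sweep(votes, 2, cutoff, "/")   # divide each column by its cutoff
  factor(colnames(votes)[max.col(ratio, ties.method = "first")],
         levels = colnames(votes))
}

# Made-up vote fractions for three observations (illustration only)
votes <- matrix(c(0.80, 0.20,
                  0.60, 0.40,
                  0.30, 0.70),
                ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("red", "green")))

pred_default <- apply_cutoff(votes, c(0.5, 0.5))  # plain majority vote
pred_strict  <- apply_cutoff(votes, c(0.3, 0.1))  # red needs > 3x the green votes
```

With cutoff c(.3,.1), the second observation (60% red votes) flips from red to green, because red now needs more than 75% of the votes to win. Each cutoff vector therefore corresponds to one tpr/fpr pair.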

library(randomForest)
library(AUC)

#some balanced data generator
make.data = function(obs=5000,vars=6,noise.factor = .4) {
  X = data.frame(replicate(vars,rnorm(obs)))
  yValue = with(X,sin(X1*pi)+sin(X2*pi*2)^3+rnorm(obs)*noise.factor)
  yClass = (yValue<median(yValue))*1
  yClass = factor(yClass,labels=c("red","green"))
  print(table(yClass)) #two classes, split at the median so they are balanced
  Data = data.frame(X=X,y=yClass)
  return(Data)
}

#plot true class separation
Data = make.data()
par(mfrow=c(1,1))
plot(Data[,1:2],main="separation problem: predict red/green class",
     col = c("#FF000040","#00FF0040")[as.numeric(Data$y)])

#train default RF
rf1 = randomForest(y~.,Data)
#rf1$votes holds the out-of-bag vote fractions per class;
#you can choose a threshold from this ROC plot
plot(roc(rf1$votes[,1],rf1$y),main="choose a threshold from this ROC curve")

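#create a test data set from the same generator
testData = make.data()

#predict with various cutoffs
predTable = data.frame(
  trueTest = testData$y,
  majorityVote = predict(rf1,testData),
  #cutoff=c(.3,.1): red wins only with more than 3x the green votes,
  #so fewer reds are predicted and false greens rise ~3 times
  Pred.alot.Red = factor(predict(rf1,testData,cutoff=c(.3,.1))),
  #cutoff=c(.1,.3): green wins only with more than 3x the red votes,
  #so more reds are predicted and false reds rise ~3 times
  Pred.afew.Red = factor(predict(rf1,testData,cutoff=c(.1,.3)))
)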

#see confusion tables (as fractions of the 5000 test observations)
table(predTable[,c(1,2)])/5000
        majorityVote
trueTest    red  green
   red   0.4238 0.0762
   green 0.0818 0.4182

table(predTable[,c(1,3)])/5000
        Pred.alot.Red
trueTest    red  green
   red   0.2902 0.2098
   green 0.0158 0.4842

table(predTable[,c(1,4)])/5000
        Pred.afew.Red
trueTest    red  green
   red   0.4848 0.0152
   green 0.2088 0.2912
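For settings without a cutoff parameter, the manual route (collect the votes, apply a series of rules, compute the rates) could be sketched roughly as follows. This uses base R only. The vector `p_red` is simulated here and merely stands in for a per-observation fraction of trees voting "red" (for example a column of out-of-bag votes), and `roc_points` is a hypothetical helper, not a library function.

```r
# Simulated inputs (assumptions for illustration): p_red stands in for the
# fraction of trees voting "red" for each observation.
set.seed(1)
truth <- factor(rep(c("red", "green"), each = 500), levels = c("red", "green"))
p_red <- c(rbeta(500, 4, 2), rbeta(500, 2, 4))

# Hypothetical helper: one (fpr, tpr) pair per threshold, with "red" as
# the positive class. An observation is called red when p_pos >= threshold.
roc_points <- function(p_pos, is_pos, thresholds) {
  t(sapply(thresholds, function(th) {
    pred_pos <- p_pos >= th
    c(threshold = th,
      fpr = sum(pred_pos & !is_pos) / sum(!is_pos),
      tpr = sum(pred_pos & is_pos) / sum(is_pos))
  }))
}

# Sweep the threshold from 0 to 1; each row is one point on the ROC curve
pts <- roc_points(p_red, truth == "red", seq(0, 1, by = 0.05))
```

Each row of `pts` is one tpr/fpr pair, and fnr is simply 1 - tpr. Calling `plot(pts[, "fpr"], pts[, "tpr"], type = "b")` would draw the resulting curve.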
