Ranger Predicted Class Probability of each row in a data frame


Problem description

I have a follow-up question to the linked post "Predicted probabilities in R ranger package".

Imagine I have a mixed data frame, df (comprising factor and numeric variables), and I want to do classification using ranger. I split this data frame into training and test sets, Train_Set and Test_Set. BiClass is the factor variable I want to predict; it has two levels, 0 and 1.

I want to calculate class probabilities with ranger and attach them to the data frame, using the following commands:

Biclass.ranger <- ranger(BiClass ~ ., data=Train_Set, num.trees = 500, importance="impurity", save.memory = TRUE, probability=TRUE)

probabilities <- as.data.frame(predict(Biclass.ranger, data = Test_Set, num.trees = 200, type='response', verbose = TRUE)$predictions)

The result, probabilities, is a data frame with 2 columns (0 and 1) and as many rows as Test_Set.

Does this mean that if I append probabilities to Test_Set as the last two columns, they show the probability of each row being either 0 or 1? Is my understanding correct?

My second question: when I attempt to calculate a confusion matrix through

pred = predict(Biclass.ranger, data=Test_Set, num.trees = 500, type='response', verbose = TRUE)
table(Test_Set$BiClass, pred$predictions)

I get the following error: Error in table(Test_Set$BiClass, pred$predictions) : all arguments must have the same length

What am I doing wrong?

Solution

For your first question: yes, each row of probabilities gives the predicted probability of that row being class 0 or class 1. Using the example below:

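library(ranger)

# Random 100/50 train/test split of iris (no seed is set, so results vary between runs)
idx = sample(nrow(iris),100)
data = iris
# Recode Species as a binary factor: 1 = versicolor, 0 = everything else
data$Species = factor(ifelse(data$Species=="versicolor",1,0))
Train_Set = data[idx,]
Test_Set = data[-idx,]

# probability=TRUE grows a probability forest, so predictions are per-class probabilities
mdl <- ranger(Species ~ ., data=Train_Set, importance="impurity", save.memory = TRUE, probability=TRUE)
probabilities <- as.data.frame(predict(mdl, data = Test_Set, type='response', verbose = TRUE)$predictions)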
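
To make the first answer concrete, here is a minimal sketch of attaching the probability columns to a test set. It uses a small made-up data frame rather than a real ranger fit, and the names toy_test, toy_probs, prob_0 and prob_1 are illustrative assumptions; with a real model, the probabilities data frame would take the place of toy_probs.

```r
# Toy stand-ins (made-up values, not ranger output): three test rows and
# their class probabilities, with one column per class level.
toy_test <- data.frame(x = c(1.2, 3.4, 5.6), BiClass = factor(c(0, 1, 1)))
toy_probs <- data.frame(`0` = c(0.9, 0.3, 0.2),
                        `1` = c(0.1, 0.7, 0.8),
                        check.names = FALSE)

# Rename so the new columns are syntactically valid and self-describing.
names(toy_probs) <- paste0("prob_", names(toy_probs))

# Predictions come back in the same row order as the data passed to
# predict(), so a plain column bind lines them up with the test rows.
toy_with_probs <- cbind(toy_test, toy_probs)

# Sanity check: the two probabilities in each row should sum to 1.
row_sums <- rowSums(toy_with_probs[, c("prob_0", "prob_1")])
print(toy_with_probs)
```

Renaming the columns first avoids ending up with columns literally called 0 and 1, which are awkward to reference later.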

We can check whether the predicted probabilities agree with the actual labels:

par(mfrow=c(1,2))
boxplot(probabilities[,"0"] ~ Test_Set$Species,ylab="Prob 0",xlab="Actual label")
boxplot(probabilities[,"1"] ~ Test_Set$Species,ylab="Prob 1",xlab="Actual label")

The plot is crude, but it will reveal problems such as flipped labels. To turn the probabilities into class labels, we find the column with the highest probability in each row:

max.col(probabilities) - 1
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 0
[39] 0 0 0 0 0 0 0 0 0 0 0 0

max.col() goes through each row of probabilities and returns 1 or 2, depending on which column holds the maximum probability; subtracting 1 gives the labels 0 and 1. For the confusion matrix:

caret::confusionMatrix(table(max.col(probabilities) - 1,Test_Set$Species))
Confusion Matrix and Statistics


     0  1
  0 31  2
  1  0 17

               Accuracy : 0.96            
                 95% CI : (0.8629, 0.9951)
    No Information Rate : 0.62            
    P-Value [Acc > NIR] : 2.048e-08 
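
As a sanity check, the headline statistics can be recomputed by hand from the 2x2 table printed above. This sketch retypes those counts into a matrix, with predictions in rows and actual labels in columns to match the argument order of table(); the names cm, accuracy and nir are only for illustration.

```r
# Counts copied from the printed confusion matrix. matrix() fills by column,
# so the first column is (31, 0) and the second is (2, 17).
cm <- matrix(c(31, 0, 2, 17), nrow = 2,
             dimnames = list(Predicted = c("0", "1"), Actual = c("0", "1")))

# Accuracy: correctly classified rows (the diagonal) over all rows.
accuracy <- sum(diag(cm)) / sum(cm)

# No Information Rate: share of the most common actual class.
nir <- max(colSums(cm)) / sum(cm)

print(c(accuracy = accuracy, nir = nir))
```

Both values match the output above: 48 of 50 rows are on the diagonal (0.96), and 31 of 50 rows belong to the majority class (0.62).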

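In your case, pred$predictions is a two-column matrix of probabilities rather than a vector of labels, which is why table() reports mismatched lengths. Convert the probabilities to labels first:

caret::confusionMatrix(table(max.col(probabilities) - 1, Test_Set$BiClass))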
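
The sketch below reproduces the length mismatch behind the table() error with made-up numbers, then derives labels by looking up column names instead of subtracting 1. No ranger fit is involved: actual and prob_matrix are hypothetical stand-ins for Test_Set$BiClass and pred$predictions. ties.method = "first" is chosen here only to keep the result deterministic, since max.col() breaks ties at random by default.

```r
# Made-up stand-ins, not ranger output: four actual labels and a
# two-column probability matrix shaped like the output of a probability forest.
actual <- factor(c(0, 1, 1, 0))
prob_matrix <- matrix(c(0.8, 0.3, 0.1, 0.6,
                        0.2, 0.7, 0.9, 0.4),
                      ncol = 2, dimnames = list(NULL, c("0", "1")))

# The matrix holds nrow * ncol values, so its length differs from the labels.
length(actual)
length(prob_matrix)

# Capture the error message instead of stopping the script.
err <- tryCatch(table(actual, prob_matrix),
                error = function(e) conditionMessage(e))
print(err)

# Pick the most probable class per row and look up its name.
best_col <- max.col(prob_matrix, ties.method = "first")
pred_label <- factor(colnames(prob_matrix)[best_col],
                     levels = levels(actual))

# Predictions in rows, actual labels in columns.
conf_tab <- table(Predicted = pred_label, Actual = actual)
print(conf_tab)
```

Mapping through the column names keeps working if the class levels are not 0 and 1, which the subtract-one shortcut does not.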
