MLlib: Calculating Precision and Recall for multiple threshold values


Question

I set the threshold value of my logistic regression model to 0.5 before using it for scoring. I now want to get the precision, recall, and F1 score for that value. Unfortunately, when I try to do that, the only threshold values I see are 1.0 and 0.0. How do I get metrics for threshold values other than 0 and 1?

For example, here is the output:

Threshold is: 1.0, Precision is: 0.85

Threshold is: 0.0, Precision is: 0.312641

I don't get the precision for threshold 0.5. Here is the relevant code.

import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.mllib.regression.LabeledPoint

// I am setting the threshold value of my logistic regression model here.
model.setThreshold(0.5)

// Compute the score and generate an RDD with prediction and label values.
val predictionAndLabels = data.map {
  case LabeledPoint(label, features) => (model.predict(features), label)
}

// I now want to compute the precision, recall, and other metrics.
// Since I have set the model threshold to 0.5, I want to get PR at that value.
val metrics = new BinaryClassificationMetrics(predictionAndLabels)
val precision = metrics.precisionByThreshold()

precision.foreach {
  case (t, p) =>
    println(s"Threshold is: $t, Precision is: $p")

    if (t == 0.5) {
      println(s"Desired: Threshold is: $t, Precision is: $p")
    }
}

Recommended answer

The precisionByThreshold() method tries different thresholds and gives the corresponding precision values. Since you already thresholded your data, the only scores it sees are 0s and 1s.
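One way to keep the raw scores, sketched here under the assumption that `model` is an MLlib `LogisticRegressionModel` and `data` is an `RDD[LabeledPoint]`, is to call `clearThreshold()` before scoring so that `predict` returns the raw score instead of 0 or 1. The names `PrecisionAtThreshold`, `precisionAt`, and `target` are hypothetical and only serve the illustration.

```scala
import org.apache.spark.mllib.classification.LogisticRegressionModel
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Hypothetical helper, not part of MLlib.
object PrecisionAtThreshold {
  // Returns the (threshold, precision) pair for the smallest reported
  // threshold that is >= target, or None if no such threshold exists.
  def precisionAt(
      model: LogisticRegressionModel,
      data: RDD[LabeledPoint],
      target: Double): Option[(Double, Double)] = {
    // After clearThreshold(), predict returns raw scores, not 0/1 labels.
    // Note that this mutates the model passed in.
    model.clearThreshold()

    val scoreAndLabels = data.map {
      case LabeledPoint(label, features) => (model.predict(features), label)
    }

    val metrics = new BinaryClassificationMetrics(scoreAndLabels)
    val byThreshold = metrics.precisionByThreshold().collect()

    // The exact value 0.5 is usually not among the reported thresholds,
    // so take the next larger one.
    byThreshold.filter(_._1 >= target).sortBy(_._1).headOption
  }
}
```

Calling `PrecisionAtThreshold.precisionAt(model, data, 0.5)` would then give the precision that applies at 0.5, for the reason worked through in the rest of this answer.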

Let's say your predictions after thresholding are [0 0 0 1 1 1] and the real labels are [f f f f t t].

Thresholding with 0 gives you [t t t t t t], which is 4 false positives and 2 true positives, hence a precision of 2 / (2 + 4) = 1/3.

Thresholding with 1 gives you [f f f t t t], which is 1 false positive and 2 true positives, hence a precision of 2 / (2 + 1) = 2/3.

You can see that a threshold of .5 would now also give you [f f f t t t], the same as thresholding with 1, so the precision for threshold 1 is the one you are looking for.
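The arithmetic above can be checked with a few lines of plain Scala. This is only a sketch: `ThresholdedPrecision` and its members are hypothetical names, and it assumes, as the example does, that a score greater than or equal to the threshold counts as a positive prediction.

```scala
// Hypothetical illustration in plain Scala, no Spark required.
object ThresholdedPrecision {
  // Already-thresholded predictions and the real labels from the example.
  val scores: Seq[Double] = Seq(0.0, 0.0, 0.0, 1.0, 1.0, 1.0)
  val labels: Seq[Boolean] = Seq(false, false, false, false, true, true)

  // Precision = TP / (TP + FP), where score >= threshold means "predicted positive".
  // Yields NaN when nothing is predicted positive.
  def precision(threshold: Double): Double = {
    val predictedPositive = scores.zip(labels).filter { case (s, _) => s >= threshold }
    val truePositives = predictedPositive.count { case (_, isPositive) => isPositive }
    truePositives.toDouble / predictedPositive.size
  }

  def main(args: Array[String]): Unit = {
    Seq(0.0, 0.5, 1.0).foreach { t =>
      println(s"Threshold is: $t, Precision is: ${precision(t)}")
    }
  }
}
```

Thresholds 0.5 and 1.0 select the same three predictions here, which is why they share one precision value.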

This is a bit confusing because you already thresholded your predictions. Suppose you had not, and your raw scores were [.3 .4 .4 .6 .8 .9] (to stay consistent with the [0 0 0 1 1 1] I have been using).

Then precisionByThreshold() would give you precision values for thresholds .3, .4, .6, .8 and .9, the distinct scores, because these are the thresholds that give different predictions. To get the value for threshold .5 you would still take the value for the next larger threshold (.6), because it gives the same predictions and hence the same precision.
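The lookup for a threshold that is not itself a score can be sketched in plain Scala as follows. `PrecisionByScore`, `byThreshold`, and `precisionAt` are hypothetical names, and the code assumes that a score greater than or equal to the threshold counts as a positive prediction; it mimics the idea rather than Spark's implementation.

```scala
// Hypothetical illustration in plain Scala, no Spark required.
object PrecisionByScore {
  // Raw (unthresholded) scores and the real labels from the example.
  val scores: Seq[Double] = Seq(0.3, 0.4, 0.4, 0.6, 0.8, 0.9)
  val labels: Seq[Boolean] = Seq(false, false, false, false, true, true)

  // Precision = TP / (TP + FP), where score >= threshold means "predicted positive".
  def precision(threshold: Double): Double = {
    val predictedPositive = scores.zip(labels).filter { case (s, _) => s >= threshold }
    val truePositives = predictedPositive.count { case (_, isPositive) => isPositive }
    truePositives.toDouble / predictedPositive.size
  }

  // One (threshold, precision) pair per distinct score, in ascending order.
  val byThreshold: Seq[(Double, Double)] =
    scores.distinct.sorted.map(t => (t, precision(t)))

  // The smallest listed threshold >= target gives the same predictions as target.
  def precisionAt(target: Double): Option[(Double, Double)] =
    byThreshold.find { case (t, _) => t >= target }

  def main(args: Array[String]): Unit = {
    byThreshold.foreach { case (t, p) =>
      println(s"Threshold is: $t, Precision is: $p")
    }
    println(s"Lookup for 0.5: ${precisionAt(0.5)}")
  }
}
```

Here `precisionAt(0.5)` resolves to the entry for threshold 0.6, with precision 2/3, which matches computing `precision(0.5)` directly.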
