MLlib: Calculating Precision and Recall for multiple threshold values


Problem Description


I set the threshold value of my logistic regression model to 0.5 before using it for scoring. I now want to get the precision, recall, and F1 score at that value. Unfortunately, when I try, the only threshold values I see are 1.0 and 0.0. How do I get metrics for threshold values other than 0 and 1?

For example here is the o/p:

Threshold is: 1.0, Precision is: 0.85

Threshold is: 0.0, Precision is: 0.312641

I don't get the precision for threshold 0.5. Here is the relevant code.

// I am setting the threshold value of my Logistic regression model here.

model.setThreshold(0.5)

// Compute the score and generate an RDD with prediction and label values.  
val predictionAndLabels = data.map { 
  case LabeledPoint(label, features) => (model.predict(features), label)
}

// I now want to compute the precision and recall and other metrics. Since I have set the model threshold to 0.5, I want to get PR at that value.

val metrics = new BinaryClassificationMetrics(predictionAndLabels)
val precision = metrics.precisionByThreshold()

precision.foreach {
  case (t, p) =>
    println(s"Threshold is: $t, Precision is: $p")

    if (t == 0.5) {
      println(s"Desired: Threshold is: $t, Precision is: $p")
    }
}

Solution

The precisionByThreshold() method actually tries different thresholds and gives the corresponding precision values. Since you have already thresholded your data, the only thresholds available are 0 and 1.

Let's say you have: [0 0 0 1 1 1] after thresholding and the real labels are [f f f f t t].

Then thresholding with 0 you have [t t t t t t], which gives you 4 false positives and 2 true positives, hence a precision of 2 / (2 + 4) = 1/3.

Now thresholding with 1 you have [f f f t t t], which gives you 1 false positive and 2 true positives, hence a precision of 2 / (2 + 1) = 2/3.

You can see that thresholding with .5 would now give you [f f f t t t], the same as thresholding with 1, so the precision you are looking for is the one reported for threshold 1.
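The arithmetic above can be checked with a small plain-Python sketch (not Spark; the helper `precision_at` is made up for illustration):

```python
def precision_at(threshold, scores, labels):
    """Precision when every score >= threshold is predicted positive."""
    predicted_pos = [lab for s, lab in zip(scores, labels) if s >= threshold]
    tp = sum(predicted_pos)          # true positives: predicted positive and actually true
    return tp / len(predicted_pos)   # precision = TP / (TP + FP)

scores = [0, 0, 0, 1, 1, 1]          # predictions already thresholded to 0/1
labels = [False, False, False, False, True, True]

print(precision_at(0,   scores, labels))  # 1/3: 2 TP, 4 FP
print(precision_at(0.5, scores, labels))  # 2/3: same predictions as threshold 1
print(precision_at(1,   scores, labels))  # 2/3: 2 TP, 1 FP
```

With already-thresholded scores, any threshold in (0, 1] selects the same positives, which is why only two distinct precision values can ever appear.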

This is a bit confusing because you have already thresholded your predictions. Suppose you do not threshold them, and instead have raw scores [.3 .4 .4 .6 .8 .9] (consistent with the [0 0 0 1 1 1] I have been using).

Then precisionByThreshold() would give you precision values for thresholds 0, .3, .4, .6, .8 and .9, because these are all the thresholds that give different results and thus different precisions. To get the value for threshold .5 you would take the value for the next larger threshold (.6), because, again, it gives the same predictions and hence the same precision.
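Assuming the same toy data, here is a rough plain-Python sketch of this behaviour (each distinct raw score acts as a candidate threshold; this mimics, not reproduces, Spark's implementation):

```python
def precision_by_threshold(scores, labels):
    """Map each distinct score to the precision obtained by thresholding at it."""
    result = {}
    for t in sorted(set(scores)):
        predicted_pos = [lab for s, lab in zip(scores, labels) if s >= t]
        result[t] = sum(predicted_pos) / len(predicted_pos)
    return result

scores = [0.3, 0.4, 0.4, 0.6, 0.8, 0.9]   # raw model scores, not thresholded
labels = [False, False, False, False, True, True]

for t, p in precision_by_threshold(scores, labels).items():
    print(f"Threshold is: {t}, Precision is: {p:.4f}")
# Only thresholds 0.3, 0.4, 0.6, 0.8, 0.9 appear; for 0.5 you would read off
# the value at the next larger threshold, 0.6 (precision 2/3).
```

This is why keeping the raw scores (rather than pre-thresholded 0/1 predictions) is what makes per-threshold metrics meaningful.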
