Spark: regression model threshold and precision

Question

I have a logistic regression model where I explicitly set the threshold to 0.5.

model.setThreshold(0.5)

I train the model and then I want to get basic stats -- precision, recall etc.

This is what I do when I evaluate the model:

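import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

// predictionAndLabels holds (model output, true label) pairs
val metrics = new BinaryClassificationMetrics(predictionAndLabels)
val precision = metrics.precisionByThreshold

precision.foreach { case (t, p) =>
  println(s"Threshold is: $t, Precision is: $p")
}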

The results contain only 0.0 and 1.0 as threshold values; 0.5 is completely ignored.

Here is the output of the loop above:

Threshold is: 1.0, Precision is: 0.8571428571428571

Threshold is: 0.0, Precision is: 0.3005181347150259

When I call metrics.thresholds() it also returns only two values, 0.0 and 1.0.

How do I get the precision and recall values with threshold as 0.5?

Recommended answer

You need to clear the model threshold before you make predictions. With the threshold cleared, predictions return a raw score instead of the classified label. Otherwise the only values the metrics ever see are your labels, 0.0 and 1.0, so those are the only two thresholds you get.

model.clearThreshold()

A tuple in predictionAndLabels should look like (0.6753421, 1.0) and not (1.0, 1.0).
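To see why the shape of the tuples matters, here is a minimal sketch in plain Scala, with no Spark dependency. It is not Spark's implementation. It assumes that candidate thresholds are the distinct values in the first tuple element and that a record counts as positive when its score is greater than or equal to the threshold. The names ThresholdSketch, thresholds and precisionRecallAt are made up for this illustration.

```scala
object ThresholdSketch {
  // Each pair is (value passed as the prediction, true label).
  type Pair = (Double, Double)

  // Candidate thresholds: distinct values of the first element, highest first.
  def thresholds(pairs: Seq[Pair]): Seq[Double] =
    pairs.map(_._1).distinct.sorted.reverse

  // Precision and recall when a record is predicted positive if score >= threshold.
  def precisionRecallAt(pairs: Seq[Pair], threshold: Double): (Double, Double) = {
    val predictedPositive = pairs.filter(_._1 >= threshold)
    val truePositives = predictedPositive.count(_._2 == 1.0)
    val actualPositives = pairs.count(_._2 == 1.0)
    // Convention chosen for this sketch: precision is 1.0 when nothing is predicted positive.
    val precision =
      if (predictedPositive.isEmpty) 1.0
      else truePositives.toDouble / predictedPositive.size
    val recall =
      if (actualPositives == 0) 0.0
      else truePositives.toDouble / actualPositives
    (precision, recall)
  }
}
```

With hard labels such as Seq((1.0, 1.0), (0.0, 0.0)), thresholds returns only 1.0 and 0.0, which matches the output in the question. With raw scores, every distinct score becomes a threshold, and precisionRecallAt(pairs, 0.5) gives the numbers at 0.5 even when no record scores exactly 0.5.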

Take a look at the example at https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/mllib/BinaryClassificationMetricsExample.scala

You probably still want to set numBins to control the number of points if the input is large.
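In Spark, numBins is passed as the second constructor argument, as in new BinaryClassificationMetrics(predictionAndLabels, numBins). The idea behind it can be illustrated with a rough plain-Scala sketch that groups the distinct scores into about numBins bins and keeps one threshold per bin. This is an assumption-laden illustration, not how Spark implements down-sampling; BinningSketch and downsample are hypothetical names, and keeping the highest score of each bin is an arbitrary choice made here.

```scala
object BinningSketch {
  // Reduce the distinct scores to roughly numBins thresholds, highest first.
  def downsample(scores: Seq[Double], numBins: Int): Seq[Double] = {
    require(numBins > 0, "numBins must be positive")
    val sorted = scores.distinct.sorted.reverse
    // Equal-sized bins of consecutive scores; at least one score per bin.
    val binSize = math.max(1, sorted.size / numBins)
    // Keep the first (highest) score of each bin as its representative.
    sorted.grouped(binSize).map(_.head).toList
  }
}
```

Because the bin size is rounded down, the result can contain slightly more than numBins points when the number of distinct scores is not a multiple of numBins.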
