Spark: regression model threshold and precision
Question
I have a logistic regression model, where I explicitly set the threshold to 0.5.
model.setThreshold(0.5)
I train the model and then want to get basic statistics: precision, recall, etc.
This is what I do when I evaluate the model:
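import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

// predictionAndLabels: RDD[(Double, Double)] of (prediction, true label)
val metrics = new BinaryClassificationMetrics(predictionAndLabels)
val precision = metrics.precisionByThreshold
precision.foreach { case (t, p) =>
  println(s"Threshold is: $t, Precision is: $p")
}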
The results contain only 0.0 and 1.0 as threshold values; 0.5 is completely ignored.
This is the output of the loop above:
Threshold is: 1.0, Precision is: 0.8571428571428571
Threshold is: 0.0, Precision is: 0.3005181347150259
When I call metrics.thresholds() it also returns only two values, 0.0 and 1.0.
How do I get the precision and recall values at a threshold of 0.5?
Recommended answer
You need to clear the model threshold before you make predictions. With the threshold cleared, predictions return a raw score instead of the classified label. Otherwise the only values the metrics can use as thresholds are your two labels, 0.0 and 1.0.
model.clearThreshold()
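To see why thresholded predictions leave only two thresholds, here is a small plain-Scala sketch that needs no Spark. It is not Spark's implementation: `ThresholdDemo`, `precisionByThreshold`, and the sample data are made up for illustration, and the helper simply counts an example as positive when its score is greater than or equal to the threshold.

```scala
object ThresholdDemo {
  // Precision at every distinct score value, highest threshold first.
  // An example counts as predicted-positive when its score >= threshold.
  def precisionByThreshold(scoreAndLabels: Seq[(Double, Double)]): Seq[(Double, Double)] = {
    val thresholds = scoreAndLabels.map(_._1).distinct.sorted.reverse
    thresholds.map { t =>
      val predictedPositive = scoreAndLabels.filter(_._1 >= t)
      val truePositive = predictedPositive.count(_._2 == 1.0)
      (t, truePositive.toDouble / predictedPositive.size)
    }
  }

  // Hypothetical raw scores paired with the true labels.
  val scores: Seq[(Double, Double)] =
    Seq((0.9, 1.0), (0.7, 0.0), (0.6, 1.0), (0.3, 0.0), (0.1, 0.0))

  // The same data after a 0.5 threshold has turned each score into a label.
  val labels: Seq[(Double, Double)] =
    scores.map { case (s, y) => (if (s > 0.5) 1.0 else 0.0, y) }

  def main(args: Array[String]): Unit = {
    // Thresholded input: the only candidate thresholds are 1.0 and 0.0.
    precisionByThreshold(labels).foreach(println)
    // Raw scores: one candidate threshold per distinct score.
    precisionByThreshold(scores).foreach(println)
  }
}
```

In this toy data the entry for threshold 1.0 on the thresholded input is the precision of the 0.5 cut-off itself, because the examples labelled 1.0 are exactly those whose score exceeded 0.5. The same reading plausibly applies to the 1.0 row in the question's output, though that is an inference, not something the original post states.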
A tuple from predictionAndLabels should then look like (0.6753421,1.0) and not (1.0,1.0).
If the input is large, you will probably still want to set numBins (a constructor argument of BinaryClassificationMetrics) to limit the number of threshold points.
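Putting the pieces together, a minimal sketch of reading precision and recall at a 0.5 cut-off might look like the following. It assumes the model is an `org.apache.spark.mllib.classification.LogisticRegressionModel` and the test set is an `RDD[LabeledPoint]`, which the question does not state explicitly. `ThresholdMetrics` and `precisionRecallAt` are hypothetical names. Since 0.5 is unlikely to appear among the reported thresholds, the sketch picks the smallest reported threshold that is at least the cut-off.

```scala
import org.apache.spark.mllib.classification.LogisticRegressionModel
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

object ThresholdMetrics {
  // Returns (threshold used, precision, recall), or None when no reported
  // threshold is >= cut (that is, nothing would be predicted positive).
  // Note: this clears the threshold on the model that is passed in.
  def precisionRecallAt(
      model: LogisticRegressionModel,
      test: RDD[LabeledPoint],
      cut: Double,
      numBins: Int): Option[(Double, Double, Double)] = {
    // With no threshold set, predict() returns the raw score, not a 0/1 label.
    model.clearThreshold()

    val scoreAndLabels: RDD[(Double, Double)] =
      test.map(p => (model.predict(p.features), p.label))

    // numBins = 0 keeps every distinct score as a threshold; a positive
    // value down-samples the curve, which makes the match approximate.
    val metrics = new BinaryClassificationMetrics(scoreAndLabels, numBins)

    // Both curves are keyed by the same thresholds, so they can be joined.
    val curve: RDD[(Double, (Double, Double))] =
      metrics.precisionByThreshold().join(metrics.recallByThreshold())

    // The smallest reported threshold >= cut; with numBins = 0 it selects
    // the same examples as predicting positive when score >= cut.
    curve
      .filter { case (t, _) => t >= cut }
      .takeOrdered(1)(Ordering.by[(Double, (Double, Double)), Double](_._1))
      .headOption
      .map { case (t, (p, r)) => (t, p, r) }
  }
}
```

A call such as `ThresholdMetrics.precisionRecallAt(model, testData, 0.5, 0)` would give exact values for the cut-off. A positive `numBins` such as 100 trades some accuracy for a much smaller curve on large inputs.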