Spark: regression model threshold and precision
Problem description
I have a logistic regression model, where I explicitly set the threshold to 0.5:
model.setThreshold(0.5)
I train the model and then want to get basic stats: precision, recall, etc.
This is what I do when I evaluate the model:
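import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

// predictionAndLabels: RDD[(Double, Double)] of (prediction, label) pairs
val metrics = new BinaryClassificationMetrics(predictionAndLabels)
val precision = metrics.precisionByThreshold
// collect() so the output is printed on the driver, not on the executors
precision.collect().foreach { case (t, p) =>
  println(s"Threshold is: $t, Precision is: $p")
}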
I get results with only 0.0 and 1.0 as threshold values, and 0.5 is completely ignored.
Here is the output of the above loop:
Threshold is: 1.0, Precision is: 0.8571428571428571
Threshold is: 0.0, Precision is: 0.3005181347150259
When I call metrics.thresholds() it also returns only two values, 0.0 and 1.0.
How do I get the precision and recall values for a threshold of 0.5?
Recommended answer
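You need to clear the model threshold before you make predictions. Clearing the threshold makes predict return a raw score (for a binary logistic regression model, the predicted probability of the positive class) instead of the classified label. Otherwise the only distinct values in your predictions are your labels 0.0 and 1.0, so those are the only two thresholds BinaryClassificationMetrics can report.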
model.clearThreshold()
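To see why only two thresholds appear, it helps to know that BinaryClassificationMetrics treats each distinct score in its input as a candidate threshold. The following is a minimal plain-Scala sketch of that idea (it runs without Spark; the object and method names are hypothetical and this is not Spark's actual source):

```scala
// Hypothetical sketch: mimics how BinaryClassificationMetrics derives
// its candidate thresholds from the distinct scores (not Spark's code).
object ThresholdSketch {
  // Candidate thresholds are the distinct scores, highest first.
  def thresholds(scoreAndLabels: Seq[(Double, Double)]): Seq[Double] =
    scoreAndLabels.map(_._1).distinct.sortWith(_ > _)

  def main(args: Array[String]): Unit = {
    // Hard labels from a model with a threshold set: scores are only 0.0 or 1.0.
    val hard = Seq((1.0, 1.0), (0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
    // Raw probabilities after clearThreshold(): many distinct scores.
    val raw = Seq((0.91, 1.0), (0.12, 0.0), (0.64, 0.0), (0.38, 1.0))
    println(thresholds(hard)) // List(1.0, 0.0)
    println(thresholds(raw))  // List(0.91, 0.64, 0.38, 0.12)
  }
}
```

With hard labels there is nothing between 1.0 and 0.0 to use as a threshold, which matches the two-line output in the question.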
After clearing the threshold, your predictionAndLabels tuples should look like (0.6753421, 1.0) instead of (1.0, 1.0).
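The difference between the two kinds of tuples can be illustrated with a simplified sketch of binary logistic-regression prediction. It assumes the raw score is the sigmoid of the linear margin and that a set threshold t maps a score to 1.0 when it exceeds t; the names are hypothetical, and this is an illustration rather than Spark's implementation:

```scala
// Hypothetical, simplified sketch of binary logistic-regression prediction
// with and without a threshold; not Spark's actual source.
object PredictSketch {
  // Probability of the positive class: sigmoid of the linear margin.
  def score(weights: Array[Double], intercept: Double, features: Array[Double]): Double = {
    val margin = weights.zip(features).map { case (w, x) => w * x }.sum + intercept
    1.0 / (1.0 + math.exp(-margin))
  }

  // Some(t): return a 0/1 label (analogous to setThreshold(t));
  // None: return the raw score (analogous to clearThreshold()).
  def predict(threshold: Option[Double])(s: Double): Double = threshold match {
    case Some(t) => if (s > t) 1.0 else 0.0
    case None    => s
  }

  def main(args: Array[String]): Unit = {
    val s = score(Array(0.8, -0.5), 0.1, Array(1.0, 0.4))
    println(predict(Some(0.5))(s)) // 1.0
    println(predict(None)(s))      // raw probability, about 0.668
  }
}
```

Only the second form keeps enough information for the metrics to sweep over thresholds.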
Take a look at https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/mllib/BinaryClassificationMetricsExample.scala
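You probably still want to set numBins (the optional second argument of the BinaryClassificationMetrics constructor) to down-sample the computed curves to a limited number of points if the input is large.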
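Even after clearing the threshold, 0.5 itself may not appear among the reported thresholds, because they are the distinct scores in the data. If you need the numbers at exactly 0.5, one option is to count them directly. Below is a minimal plain-Scala sketch (no Spark; the object, case class, and method names are hypothetical) that computes precision and recall at one fixed threshold from (score, label) pairs, counting score >= threshold as a positive prediction:

```scala
// Hypothetical helper: precision and recall at one fixed threshold,
// treating score >= threshold as a positive prediction.
object FixedThresholdMetrics {
  final case class PR(precision: Double, recall: Double)

  def at(scoreAndLabels: Seq[(Double, Double)], threshold: Double): PR = {
    val predictedPos = scoreAndLabels.filter { case (s, _) => s >= threshold }
    val tp = predictedPos.count { case (_, label) => label == 1.0 }
    val actualPos = scoreAndLabels.count { case (_, label) => label == 1.0 }
    // 0.0 for an empty denominator is an arbitrary convention chosen for this sketch.
    val precision = if (predictedPos.isEmpty) 0.0 else tp.toDouble / predictedPos.size
    val recall = if (actualPos == 0) 0.0 else tp.toDouble / actualPos
    PR(precision, recall)
  }

  def main(args: Array[String]): Unit = {
    // (score, label) pairs, as produced after clearThreshold().
    val data = Seq((0.9, 1.0), (0.7, 0.0), (0.6, 1.0), (0.4, 1.0), (0.2, 0.0))
    println(at(data, 0.5)) // PR(0.6666666666666666,0.6666666666666666)
  }
}
```

On a real RDD the same counts could be obtained with filter and count instead of collecting the data to the driver; the local Seq version is only meant to show the arithmetic.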