How to interpret the probability column in Spark logistic regression predictions?

Question

I'm getting predictions from spark.ml.classification.LogisticRegressionModel.predict. A number of rows have a prediction of 1.0 and a probability of 0.04. model.getThreshold returns 0.5, so I'd assume the model classifies everything above a 0.5 probability as 1.0.

How am I supposed to interpret a result with a 1.0 prediction and a probability of 0.04?

Recommended answer

The probability column produced by LogisticRegression contains a vector whose length equals the number of classes; the value at each index is the probability of the corresponding class. Here is a small two-class example for illustration:

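import org.apache.spark.ml.feature.VectorAssembler
import spark.implicits._ // needed for toDF() outside spark-shell; assumes a SparkSession named `spark`

case class Person(label: Double, age: Double, height: Double, weight: Double)
val df = List(Person(0.0, 15, 175, 67), 
      Person(0.0, 30, 190, 100), 
      Person(1.0, 40, 155, 57), 
      Person(1.0, 50, 160, 56), 
      Person(0.0, 15, 170, 56), 
      Person(1.0, 80, 180, 88)).toDF()

// Combine the numeric columns into a single feature vector column.
val assembler = new VectorAssembler()
  .setInputCols(Array("age", "height", "weight"))
  .setOutputCol("features")
// select belongs on the transformed DataFrame, not on the assembler.
val df2 = assembler.transform(df).select("label", "features")
df2.show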

+-----+------------------+
|label|          features|
+-----+------------------+
|  0.0| [15.0,175.0,67.0]|
|  0.0|[30.0,190.0,100.0]|
|  1.0| [40.0,155.0,57.0]|
|  1.0| [50.0,160.0,56.0]|
|  0.0| [15.0,170.0,56.0]|
|  1.0| [80.0,180.0,88.0]|
+-----+------------------+

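import org.apache.spark.ml.classification.LogisticRegression

val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.3).setElasticNetParam(0.8)
// randomSplit returns the splits in weight order: 70% for training, 30% for testing.
val Array(training, testing) = df2.randomSplit(Array(0.7, 0.3))

val model = lr.fit(training)
val predictions = model.transform(testing)
// truncate = false so the full probability vectors are shown.
predictions.select("probability", "prediction").show(false)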


+----------------------------------------+----------+
|probability                             |prediction|
+----------------------------------------+----------+
|[0.7487950501224138,0.2512049498775863] |0.0       |
|[0.6458452667523259,0.35415473324767416]|0.0       |
|[0.3888393314864866,0.6111606685135134] |1.0       |
+----------------------------------------+----------+

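These are the probabilities together with the final prediction made by the algorithm. The class with the highest probability is the one predicted. In the binary case the vector is [P(label 0), P(label 1)], and with the default threshold of 0.5 the prediction is 1.0 exactly when the second element exceeds 0.5.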
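
To tie this back to the question, here is a minimal plain-Scala sketch (no Spark required) of how a binary prediction follows from a probability vector. The helper `predictBinary` is hypothetical and only mimics the rule of comparing the class-1 probability against the threshold; it is not part of the Spark API. The row `[0.04, 0.96]` is an assumed example, illustrating what a row from the question might look like if the 0.04 being read is the first element of the vector (the probability of class 0).

```scala
// Plain-Scala sketch of the binary decision rule; not Spark code.
object ProbabilityColumnSketch {
  // probability(0) = P(label 0), probability(1) = P(label 1).
  // Hypothetical helper: predicts 1.0 when P(label 1) exceeds the threshold.
  def predictBinary(probability: Array[Double], threshold: Double = 0.5): Double =
    if (probability(1) > threshold) 1.0 else 0.0

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      Array(0.7487950501224138, 0.2512049498775863), // from the output above
      Array(0.3888393314864866, 0.6111606685135134), // from the output above
      Array(0.04, 0.96)                              // assumed row, see lead-in
    )
    // Print each probability vector next to the prediction derived from it.
    rows.foreach { p =>
      println(s"${p.mkString("[", ", ", "]")} -> ${predictBinary(p)}")
    }
  }
}
```

Under that assumption, a prediction of 1.0 next to 0.04 is consistent: 0.04 would be the probability of class 0, and the remaining 0.96 the probability of class 1. In Spark 3.0 and later, `org.apache.spark.ml.functions.vector_to_array` can turn the probability column into an array column, which makes it easy to inspect each element separately.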
