What do columns 'rawPrediction' and 'probability' of DataFrame mean in Spark MLlib?

Published: 2021/11/14 22:16:52. Tags: apache-spark-sql, logistic-regression, apache-spark-ml

Question

After training a LogisticRegressionModel, I transformed the test DataFrame with it and got a prediction DataFrame. When I call prediction.show(), the output column names are: [label | features | rawPrediction | probability | prediction]. I know what label and features mean, but how should I understand rawPrediction, probability, and prediction?

Recommended answer

Note: please also see the answer below by desertnaut https://stackoverflow.com/a/52947815/1056563

RawPrediction is typically the direct probability/confidence calculation. From Spark docs:

Raw prediction for each possible label. The meaning of a "raw" prediction may vary between algorithms, but it intuitively gives a measure of confidence in each possible label (where larger = more confident).

The Prediction is the result of finding the statistical mode of the rawPrediction, via argmax:

  protected def raw2prediction(rawPrediction: Vector): Double =
    rawPrediction.argmax
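
To make the argmax step concrete, here is a minimal plain-Scala sketch. It uses Array[Double] in place of Spark's Vector so it runs without Spark on the classpath; the object name ArgmaxSketch and the sample scores are made up for illustration and are not part of Spark.

```scala
object ArgmaxSketch {
  // Index of the largest raw score; that index is the predicted class label.
  // Assumes a non-empty array.
  def raw2prediction(rawPrediction: Array[Double]): Double =
    rawPrediction.indexOf(rawPrediction.max).toDouble

  def main(args: Array[String]): Unit = {
    // Hypothetical raw scores for classes 0, 1 and 2.
    val raw = Array(-1.2, 0.4, 2.3)
    println(raw2prediction(raw)) // prints 2.0
  }
}
```

The prediction column is therefore just the position of the largest entry in rawPrediction, stored as a Double.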

Probability is the conditional probability for each class. Here is the scaladoc:

Estimate the probability of each class given the raw prediction,
doing the computation in-place. These predictions are also called class conditional probabilities.

The actual calculation depends on which Classifier you are using.

DecisionTree

Normalize a vector of raw predictions to be a multinomial probability vector, in place.

It simply sums by class across the instances and then divides by the total instance count.

 class_k probability = Count_k/Count_Total
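
A small sketch of that normalization, assuming the raw prediction holds per-class instance counts (as the formula above implies). The object name TreeProbabilitySketch and the counts are hypothetical, and plain arrays stand in for Spark's Vector.

```scala
object TreeProbabilitySketch {
  // Divide each class count by the total count.
  // If the total is zero, the counts are returned unchanged to avoid dividing by zero.
  def normalize(counts: Array[Double]): Array[Double] = {
    val total = counts.sum
    if (total == 0.0) counts else counts.map(_ / total)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical counts: 30 instances of class 0 and 10 of class 1.
    val counts = Array(30.0, 10.0)
    println(normalize(counts).mkString(", ")) // prints 0.75, 0.25
  }
}
```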

LogisticRegression

It uses the logistic formula:

 class_k probability = 1/(1 + exp(-rawPrediction_k))
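
The logistic formula can be tried out with a short plain-Scala sketch. It assumes the binary case, where the raw prediction is taken to be the pair [-margin, margin]; that layout, the object name LogisticSketch, and the sample margin are assumptions for illustration rather than a copy of Spark's code.

```scala
object LogisticSketch {
  // Standard logistic (sigmoid) function.
  def sigmoid(x: Double): Double = 1.0 / (1.0 + math.exp(-x))

  // Apply the logistic function to every raw score.
  def raw2probability(raw: Array[Double]): Array[Double] = raw.map(sigmoid)

  def main(args: Array[String]): Unit = {
    // Assumed binary raw prediction [-margin, margin] with margin = 2.0.
    val raw = Array(-2.0, 2.0)
    val prob = raw2probability(raw)
    println(prob.mkString(", "))
    println(prob.sum) // close to 1.0 because sigmoid(-m) + sigmoid(m) = 1
  }
}
```

With raw scores that are negatives of each other, the two probabilities add up to 1, which is why applying the formula per class works in the binary case.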

NaiveBayes

 class_k probability = exp(rawPrediction_k - max(rawPrediction)) / sum_j exp(rawPrediction_j - max(rawPrediction))
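
As a hedged sketch of how log-scale raw scores are usually turned into probabilities: subtract the maximum before exponentiating (for numerical stability), then divide by the sum so the entries add up to 1. This reflects my understanding of the common approach, not a copy of Spark's source; the object name NaiveBayesSketch and the sample scores are hypothetical.

```scala
object NaiveBayesSketch {
  // Shift by the maximum, exponentiate, then normalize so the result sums to 1.
  // Assumes a non-empty array of log-scale scores.
  def raw2probability(raw: Array[Double]): Array[Double] = {
    val maxLog = raw.max
    val shifted = raw.map(r => math.exp(r - maxLog))
    val total = shifted.sum
    shifted.map(_ / total)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical log-scale scores for three classes.
    val raw = Array(-10.0, -12.0, -11.0)
    println(raw2probability(raw).mkString(", "))
  }
}
```

The class with the largest raw score also gets the largest probability, so the argmax of probability matches the argmax of rawPrediction.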

RandomForest

 class_k probability = Count_k/Count_Total
