What do the 'rawPrediction' and 'probability' columns of a DataFrame mean in Spark MLlib?

Views: 1524  Published: 2020/5/4 3:17:36  Tags: apache-spark-sql, logistic-regression, apache-spark-ml

Question

After I trained a LogisticRegressionModel, I transformed the test data DF with it and got the prediction DF. When I call prediction.show(), the output column names are: [label | features | rawPrediction | probability | prediction]. I know what label and features mean, but how should I understand rawPrediction, probability and prediction?

Answer

rawPrediction is typically the direct probability/confidence calculation: one score per class, not necessarily normalized. From the Spark docs:

Raw prediction for each possible label. The meaning of a "raw" prediction may vary between algorithms, but it intuitively gives a measure of confidence in each possible label (where larger = more confident).

prediction is the label whose raw score is largest, i.e. the argmax of rawPrediction:

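  // Default rule in Spark's ClassificationModel: predict the index of the largest raw score.
  // Probabilistic models with `thresholds` set derive the prediction from probability instead.
  protected def raw2prediction(rawPrediction: Vector): Double =
    rawPrediction.argmax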
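The same argmax rule can be sketched in plain Scala, assuming the raw scores sit in an `Array[Double]` instead of Spark's `Vector`. `RawToPrediction` is a hypothetical name used only for illustration; this is not Spark's code.

```scala
object RawToPrediction {
  // Returns the index of the largest raw score as a Double label
  // (Spark stores labels as Doubles). Ties resolve to the lowest index
  // because the scan only moves on a strictly larger value.
  def raw2prediction(rawPrediction: Array[Double]): Double = {
    require(rawPrediction.nonEmpty, "rawPrediction must not be empty")
    var best = 0
    for (i <- 1 until rawPrediction.length) {
      if (rawPrediction(i) > rawPrediction(best)) best = i
    }
    best.toDouble
  }
}
```

For example, `RawToPrediction.raw2prediction(Array(-1.2, 1.2))` returns `1.0`: class 1 has the larger raw score.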

probability is the conditional probability of each class. Here is the scaladoc:

Estimate the probability of each class given the raw prediction,
doing the computation in-place. These predictions are also called class conditional probabilities.

The actual calculation depends on which Classifier you are using.

DecisionTree

Normalize a vector of raw predictions to be a multinomial probability vector, in place.

Here rawPrediction holds the (weighted) count of training instances of each class in the leaf the sample falls into; probability divides each class count by the total count in that leaf.

 class_k probability = Count_k/Count_Total
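A plain-Scala sketch of that normalization, assuming the leaf's class counts arrive as an `Array[Double]` (doubles because instances can be weighted). `TreeProbability` and `countsToProbability` are hypothetical names, not Spark API.

```scala
object TreeProbability {
  // Turns per-class leaf counts into probabilities:
  // class_k probability = Count_k / Count_Total
  def countsToProbability(counts: Array[Double]): Array[Double] = {
    val total = counts.sum
    require(total > 0, "leaf must contain at least one (weighted) instance")
    counts.map(_ / total)
  }
}
```

A leaf holding 3 instances of class 0 and 1 of class 1 yields `Array(0.75, 0.25)`.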

LogisticRegression

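It uses the logistic (sigmoid) function. For binary classification, rawPrediction is [-margin, margin], so:

 class_k probability = 1/(1 + exp(-rawPrediction_k))

For multinomial logistic regression, rawPrediction holds one margin per class and the probabilities are a softmax:

 class_k probability = exp(rawPrediction_k) / sum_j exp(rawPrediction_j)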
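A minimal plain-Scala sketch of both cases, assuming that in the binary case rawPrediction is [-margin, margin] and in the multinomial case it holds one margin per class. The object and method names are hypothetical; this illustrates the formulas, not Spark's implementation.

```scala
object LogisticProbability {
  // Logistic (sigmoid) function.
  def sigmoid(x: Double): Double = 1.0 / (1.0 + math.exp(-x))

  // Binary case: rawPrediction = [-margin, margin]. Applying the sigmoid to
  // each entry gives [1 - sigmoid(margin), sigmoid(margin)], which sums to 1.
  def binaryProbability(margin: Double): Array[Double] = {
    val p1 = sigmoid(margin)
    Array(1.0 - p1, p1)
  }

  // Multinomial case: softmax over per-class margins. Subtracting the max
  // first avoids overflow in exp without changing the result.
  def softmax(raw: Array[Double]): Array[Double] = {
    val m = raw.max
    val exps = raw.map(r => math.exp(r - m))
    val total = exps.sum
    exps.map(_ / total)
  }
}
```

Note that applying `softmax` to the binary vector [-margin, margin] would give sigmoid(2 * margin), not sigmoid(margin), which is why the binary case uses the sigmoid directly.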

Naive Bayes

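rawPrediction holds an unnormalized log-probability for each class (log prior + log likelihood); probability is a softmax over it, shifted by the maximum for numerical stability:

 class_k probability = exp(rawPrediction_k - max(rawPrediction)) / sum_j exp(rawPrediction_j - max(rawPrediction))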
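A plain-Scala sketch of the max-shifted softmax, assuming rawPrediction contains per-class log-probabilities as described for Naive Bayes. `NaiveBayesProbability` is a hypothetical name. The shift matters because log-probabilities of long feature vectors are often very negative: `math.exp(-1000.0)` underflows to `0.0`, so exponentiating directly would give 0/0.

```scala
object NaiveBayesProbability {
  // raw(k) is assumed to be an unnormalized log-probability for class k.
  // Subtracting the max keeps the largest term at exp(0) = 1, so at least
  // one term never underflows; dividing by the sum makes the result sum to 1.
  def raw2probability(raw: Array[Double]): Array[Double] = {
    val maxLog = raw.max
    val shifted = raw.map(r => math.exp(r - maxLog))
    val total = shifted.sum
    shifted.map(_ / total)
  }
}
```

With `Array(-1000.0, -1000.0)` the sketch returns `Array(0.5, 0.5)`, whereas the unshifted computation would produce NaN.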

Random Forest

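rawPrediction sums, over all trees, the class distribution of the leaf each tree sends the sample to (that leaf's Count_k/Count_Total); probability normalizes this sum:

 class_k probability = rawPrediction_k / sum_j rawPrediction_j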
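A plain-Scala sketch of the forest computation, assuming each tree's leaf counts are given as an `Array[Double]` of length `numClasses`. This mirrors my understanding of Spark's random forest classifier (summing per-tree leaf distributions rather than counting hard votes); `ForestProbability` and its methods are hypothetical names, so treat it as a sketch rather than Spark's code.

```scala
object ForestProbability {
  // Each tree contributes its leaf's class distribution (counts / total);
  // rawPrediction is the element-wise sum of those distributions.
  def rawPrediction(leafCountsPerTree: Seq[Array[Double]], numClasses: Int): Array[Double] = {
    val votes = Array.fill(numClasses)(0.0)
    for (counts <- leafCountsPerTree) {
      val total = counts.sum
      if (total != 0.0) {
        for (k <- 0 until numClasses) votes(k) += counts(k) / total
      }
    }
    votes
  }

  // probability rescales the summed distributions so they add up to 1.
  def probability(raw: Array[Double]): Array[Double] = {
    val total = raw.sum
    raw.map(_ / total)
  }
}
```

With two trees whose leaves hold counts [3, 1] and [0, 2], rawPrediction is [0.75, 1.25] and probability is [0.375, 0.625]. A hard majority vote would tie 1 to 1 here; summing distributions breaks the tie in favor of class 1.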
