Spark MultilayerPerceptronClassifier Class Probabilities
Question
I am an experienced Python programmer trying to transition some Python code to Spark for a classification task. This is my first time working in Spark/Scala.
In Python, both Keras/TensorFlow and scikit-learn neural networks do a great job on multi-class classification, and I can easily return the top 3 most probable classes along with their probabilities, which are key to this project.
I have been generally successful in moving the code to Spark (Scala) and I am able to generate the correct predictions, but I have not found a way to return probabilities for the top predicted classes from the MultilayerPerceptronClassifier in MLlib.
The closest solution I found was in this post: "How to get classification probabilities from MultilayerPerceptronClassifier?" However, I cannot get the solution in that post to work, either because it is missing a key piece of code or because I am too new to Scala (probably the latter) to make the needed adjustments.
Has anyone solved this problem?
These are the current versions in my environment:
Spark version: 2.1.1
Scala version: 2.11.8
Thanks for your help,
RKB
Recommended answer
Take a careful look at the results of MultilayerPerceptronClassificationModel.transform (with model and test as defined in the example pipeline in the official documentation):
val result = model.transform(test)
result.printSchema
root
|-- label: double (nullable = true)
|-- features: vector (nullable = true)
|-- rawPrediction: vector (nullable = true)
|-- probability: vector (nullable = true)
|-- prediction: double (nullable = false)
You will see that the results contain a probability column. It is stored as an org.apache.spark.ml.linalg.Vector column:
result.select($"probability").show(3, false)
+---------------------------------------------------+
|probability |
+---------------------------------------------------+
|[2.630203838780848E-29,1.7323171642231641E-19,1.0] |
|[1.0,1.448487547623119E-121,4.530084532282489E-44] |
|[1.0,5.157808976162274E-122,2.5702890543589884E-44]|
+---------------------------------------------------+
only showing top 3 rows
The column can be accessed using standard methods.
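As a rough illustration of what accessing the probabilities could look like for the "top 3 classes" use case in the question, here is a minimal sketch in plain Scala. The object TopKProbabilities and its topK method are hypothetical names, not part of Spark; the sketch assumes the probability vector has already been converted to an Array[Double] (for example with Vector.toArray) and that the array index corresponds to the class label.

```scala
// Hypothetical helper, not part of Spark: picks the k most probable classes.
object TopKProbabilities {

  // Returns (classIndex, probability) pairs, highest probability first.
  // Assumes the array position is the class index, as in the probability column.
  def topK(probabilities: Array[Double], k: Int): Seq[(Int, Double)] =
    probabilities.zipWithIndex
      .sortBy { case (p, _) => -p }      // descending by probability
      .take(k)
      .map { case (p, i) => (i, p) }     // reorder to (classIndex, probability)
      .toList

  def main(args: Array[String]): Unit = {
    // First probability row from the output shown above.
    val row = Array(2.630203838780848E-29, 1.7323171642231641E-19, 1.0)
    topK(row, 2).foreach { case (i, p) => println(s"class $i: $p") }
  }
}
```

On a Spark version that exposes the probability column, one plausible way to apply this per row is to wrap it in a UDF that takes an org.apache.spark.ml.linalg.Vector, along the lines of udf((v: Vector) => TopKProbabilities.topK(v.toArray, 3)), and apply it to $"probability". Treat that as an untested suggestion rather than the answer's own method.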
This feature has been available since Spark 2.3 (SPARK-12664, "Expose probability, rawPrediction in MultilayerPerceptronClassificationModel"), so it is not present in the Spark 2.1.1 mentioned in the question.