Spark MultilayerPerceptronClassifier Class Probabilities

Question

I am an experienced Python programmer trying to transition some Python code to Spark for a classification task. This is my first time working in Spark/Scala.

In Python, both Keras/TensorFlow and scikit-learn neural networks do a great job on multi-class classification, and I can easily return the top 3 most probable classes along with their probabilities, which are key to this project.

I have been generally successful in moving the code to Spark (Scala), and I can generate the correct predictions, but I have not found a way to return the probabilities for the top predicted classes from the MultilayerPerceptronClassifier in MLlib.

The closest solution I found was in this post: "How to get classification probabilities from MultilayerPerceptronClassifier?" However, I can't get the solution in that post to work, either because it is missing a key piece of code or because I'm too new to Scala (probably the latter) to make the needed adjustments.

Has anyone solved this problem?

These are the current versions in my environment: Spark 2.1.1, Scala 2.11.8.

Thanks for your help,

RKB

Recommended answer

If you look carefully at the result of MultilayerPerceptronClassificationModel.transform (with model and test as defined in the example pipeline in the official documentation):

val result = model.transform(test)

result.printSchema

root
 |-- label: double (nullable = true)
 |-- features: vector (nullable = true)
 |-- rawPrediction: vector (nullable = true)
 |-- probability: vector (nullable = true)
 |-- prediction: double (nullable = false)

you will see that it contains a probability column.

It is stored as an org.apache.spark.ml.linalg.Vector column:

result.select($"probability").show(3, false)

+---------------------------------------------------+
|probability                                        |
+---------------------------------------------------+
|[2.630203838780848E-29,1.7323171642231641E-19,1.0] |
|[1.0,1.448487547623119E-121,4.530084532282489E-44] |
|[1.0,5.157808976162274E-122,2.5702890543589884E-44]|
+---------------------------------------------------+
only showing top 3 rows

and can be accessed using standard methods.
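As an illustration of one such standard method, the sketch below uses a UDF to turn each probability vector into its top 3 (class index, probability) pairs, which is what the question asks for. It is a minimal sketch under some assumptions: the names TopClasses, topK and top3 are made up for this example, and the small hand-built DataFrame only stands in for the output of model.transform(test) so that the snippet runs without training a model. It also assumes a Spark version whose output actually includes the probability column.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical helper object; the names are illustrative, not part of Spark.
object TopClasses {

  // Pair each probability with its class index, sort by probability
  // in descending order, and keep the first k pairs.
  def topK(probs: Array[Double], k: Int): Seq[(Int, Double)] =
    probs.zipWithIndex
      .map { case (p, i) => (i, p) }
      .sortBy { case (_, p) => -p }
      .take(k)
      .toList

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("top-classes-sketch")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for `model.transform(test)`: made-up probability vectors.
    val result = Seq(
      (0, Vectors.dense(0.05, 0.15, 0.80)),
      (1, Vectors.dense(0.60, 0.30, 0.10))
    ).toDF("id", "probability")

    // The UDF receives the ML Vector and returns an array of
    // (classIndex, probability) structs.
    val top3 = udf((v: Vector) => topK(v.toArray, 3))

    result.withColumn("top3", top3(col("probability"))).show(false)

    spark.stop()
  }
}
```

With a real model, the same UDF would be applied to the probability column of model.transform(test). Keeping topK as a plain function on Array[Double] makes the ranking logic easy to check without a Spark session.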

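This feature has been available since Spark 2.3 (SPARK-12664, "Expose probability, rawPrediction in MultilayerPerceptronClassificationModel"), so the Spark 2.1.1 environment described in the question would need to be upgraded to get this column.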
