How to get the probability per instance in classification models in spark.mllib

Question

I'm using spark.mllib.classification.{LogisticRegressionModel, LogisticRegressionWithSGD} and spark.mllib.tree.RandomForest for classification. With these packages I produce classification models, but the models only predict a specific class per instance. In Weka, we can get the exact probability of each instance belonging to each class. How can we do the same with these packages?

In LogisticRegressionModel we can set the threshold, so I've created a function that checks the result for each point at different thresholds. But this cannot be done for RandomForest (see "How to set cutoff while training the data in Random Forest in Spark", http://stackoverflow.com/questions/30569201/how-to-set-cutoff-while-training-the-data-in-random-forest-in-spark).
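The threshold sweep described above can be sketched in plain Python. This is only an illustration under stated assumptions: the weights, intercept, and data point are made up, and the helper names are hypothetical. In spark.mllib the coefficients would come from the trained model's `weights` and `intercept`, and calling `clearThreshold()` on a binary LogisticRegressionModel should make `predict` return this raw score rather than a 0/1 label.

```python
import math


def logistic_score(weights, intercept, features):
    """Return sigmoid(w . x + b), the raw score of a binary logistic model."""
    margin = sum(w * x for w, x in zip(weights, features)) + intercept
    # Numerically stable sigmoid: avoid exp() of a large positive number.
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    e = math.exp(margin)
    return e / (1.0 + e)


def predict_at_thresholds(weights, intercept, features, thresholds):
    """Map each threshold to the 0/1 label emitted at that cutoff."""
    score = logistic_score(weights, intercept, features)
    return {t: (1 if score > t else 0) for t in thresholds}


# Hypothetical coefficients and one hypothetical instance (illustration only).
weights = [0.8, -0.4]
intercept = 0.1
point = [1.0, 2.0]

score = logistic_score(weights, intercept, point)
labels = predict_at_thresholds(weights, intercept, point, [0.3, 0.5, 0.7])
```

The score itself is the model's estimate of the probability of class 1, so for binary logistic regression the sweep is not strictly needed once the raw score is available.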

Recommended answer

Unfortunately, with MLlib you can't get the per-instance probabilities from classification models in versions up to and including 1.4.1.
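For RandomForest, one rough workaround (not part of the original answer) is to collect each tree's predicted label for an instance and use the fraction of votes per class as a score. The sketch below assumes the per-tree predictions have already been gathered into a list. In the Scala API the individual trees are exposed as `model.trees`, but how you collect their predictions is left out here, and the function name is hypothetical. A vote fraction is not a calibrated probability.

```python
from collections import Counter


def vote_fractions(tree_predictions, num_classes):
    """Return, for each class index, the fraction of trees that voted for it."""
    if not tree_predictions:
        raise ValueError("need at least one tree prediction")
    counts = Counter(int(p) for p in tree_predictions)
    n = len(tree_predictions)
    return [counts.get(c, 0) / n for c in range(num_classes)]


# Hypothetical labels predicted by five trees for a single instance.
tree_predictions = [1.0, 0.0, 1.0, 1.0, 2.0]
fractions = vote_fractions(tree_predictions, 3)
```

Applying a cutoff to these fractions gives a threshold-like control similar to the one available for logistic regression.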

There are JIRA issues (SPARK-4362 and SPARK-6885) concerning this exact topic, which are IN PROGRESS as I write this answer. Nevertheless, the issue seems to have been on hold since November 2014.

There is currently no way to get the posterior probability of a prediction with a Naive Bayes model during prediction. This should be made available along with the label.

And here is a note from @sean-owen on the mailing list, on a similar topic regarding the Naive Bayes classification algorithm:

This was recently discussed on this mailing list. You can't get the probabilities out directly now, but you can hack a bit to get the internal data structures of NaiveBayesModel and compute it from there.
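A minimal sketch of what that computation might look like, assuming a multinomial Naive Bayes model whose internals are available as `pi` (log class priors) and `theta` (log conditional probabilities per class and feature), which is how spark.mllib's NaiveBayesModel names them. The function name and the tiny two-class model are hypothetical and only there to make the arithmetic easy to check.

```python
import math


def naive_bayes_posteriors(pi, theta, features):
    """Posterior P(class | x) for a multinomial Naive Bayes model.

    pi[c] is the log prior of class c; theta[c][j] is the log conditional
    probability of feature j given class c.
    """
    # Unnormalized log posterior per class: log prior + sum_j x_j * theta[c][j].
    log_joint = [
        p + sum(t * x for t, x in zip(row, features))
        for p, row in zip(pi, theta)
    ]
    # Log-sum-exp trick: subtract the maximum before exponentiating.
    m = max(log_joint)
    exps = [math.exp(v - m) for v in log_joint]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical two-class, two-feature model (illustration only).
pi = [math.log(0.5), math.log(0.5)]
theta = [
    [math.log(0.75), math.log(0.25)],
    [math.log(0.25), math.log(0.75)],
]
posteriors = naive_bayes_posteriors(pi, theta, [2.0, 0.0])
```

The entries of the result line up with the model's `labels` array, so the predicted label is the one at the index of the largest posterior.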

MAJOR EDIT: This issue has been resolved in Spark 1.5.0. Please refer to the JIRA issue for more details.