Predict Class Probabilities in Spark RandomForestClassifier


Problem description

I built random forest models using ml.classification.RandomForestClassifier. I am trying to extract the predicted probabilities from the models, but I only see the predicted classes, not the probabilities. According to this issue link, the issue is resolved, and it leads to this GitHub pull request (https://github.com/apache/spark/pull/7432) and this. However, it seems to be resolved only in version 1.5. I'm using AWS EMR, which provides Spark 1.4.1, and I still have no idea how to get the predicted probabilities. If anyone knows how to do it, please share your thoughts or solutions. Thanks!

Recommended answer

I have already answered a similar question before: http://stackoverflow.com/questions/31231514/how-to-get-the-probability-per-instance-in-classifications-models-in-spark-mllib

Unfortunately, with MLlib you can't get per-instance probabilities from classification models up to and including version 1.4.1.

There are JIRA issues (SPARK-4362 and SPARK-6885) concerning this exact topic, which are IN PROGRESS as I write this answer. Nevertheless, the issue seems to have been on hold since November 2014.

There is currently no way to get the posterior probability of a prediction with the Naive Bayes model during prediction. This should be made available along with the label.

And here is a note from @sean-owen on the mailing list on a similar topic regarding the Naive Bayes classification algorithm:

This was recently discussed on this mailing list. You can't get the probabilities out directly now, but you can hack a bit to get the internal data structures of NaiveBayesModel and compute it from there.
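
As a rough illustration of what "computing it from there" could look like, here is a minimal sketch in plain Python. It assumes a multinomial Naive Bayes model whose internal structures are the log class priors (`pi`) and the per-class log conditional probabilities (`theta`), which is how MLlib's NaiveBayesModel exposes them. The function name `naive_bayes_posteriors` and the numbers in the toy model are hypothetical and chosen only for the example; this is not code from the mailing list thread.

```python
import math


def naive_bayes_posteriors(pi, theta, features):
    """Return normalized class posteriors for one feature vector.

    pi       -- list of log class priors, one per class
    theta    -- list of lists: theta[c][j] is log P(feature j | class c)
    features -- list of feature values (e.g. term counts)
    """
    # Unnormalized log posterior per class: log prior + sum_j theta[c][j] * x[j]
    log_joint = [
        log_prior + sum(t * x for t, x in zip(log_cond, features))
        for log_prior, log_cond in zip(pi, theta)
    ]
    # Log-sum-exp trick: subtract the maximum before exponentiating
    # so that very negative log values do not underflow to zero.
    max_log = max(log_joint)
    shifted = [math.exp(v - max_log) for v in log_joint]
    total = sum(shifted)
    return [v / total for v in shifted]


# Hypothetical two-class, two-feature model (values invented for illustration).
pi = [math.log(0.5), math.log(0.5)]
theta = [
    [math.log(0.8), math.log(0.2)],
    [math.log(0.2), math.log(0.8)],
]

posteriors = naive_bayes_posteriors(pi, theta, [1.0, 0.0])
print(posteriors)
```

With a real model, `pi` and `theta` would be read from the trained NaiveBayesModel instead of being written by hand, and the resulting list would line up with the model's `labels` array.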

This issue has been resolved with Spark 1.5.0. Please refer to the JIRA issue for more details.

Concerning AWS, there is not much you can do about that for now. One solution might be to fork the emr-bootstrap-actions for Spark and configure it for your needs; you would then be able to install Spark on AWS using the bootstrap step.

Nevertheless, this might seem a little complicated.

There are some things you might need to consider:


  • Update the spark/config.file to install your Spark 1.5 build. Something like:

+3  1.5.0   python  s3://support.elasticmapreduce/spark/install-spark-script.py s3://path.to.your.bucket.spark.installation/spark/1.5.0/spark-1.5.0.tgz


  • The file listed above must be a proper build of Spark located in a specified S3 bucket that you own, for the time being.

    To build your Spark, I advise you to read about it in the examples section on building-spark-for-emr (https://github.com/awslabs/emr-bootstrap-actions/blob/master/spark/examples/building-spark-for-emr.md) and also in the official documentation. That should be about it! (I hope I haven't forgotten anything.)

    EDIT: Amazon EMR release 4.1.0 offers an upgraded version of Apache Spark (1.5.0). You can check here for more details.
