Is it possible to obtain class probabilities using GradientBoostedTrees with Spark MLlib?


Question

I am currently working with Spark MLlib.

I have created a text classifier using the gradient boosting algorithm with the GradientBoostedTrees class.

Currently I obtain predictions that tell me the class of new elements, but I would like to obtain the class probabilities (the output value before the hard decision).

In other MLlib algorithms, such as logistic regression, you can clear the classifier's threshold to obtain class probabilities, but I cannot find a way to do the same with GradientBoostedTrees.

Recommended answer

It seems that Spark MLlib does not provide a way to obtain the class probabilities.

You can only obtain the final classification decision.

That is a pity, because the information would be very useful (classifying a sample as positive with 99.99% probability is not the same as doing so with 51%), and it is not difficult to obtain once the model has been trained.
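To illustrate why the score is recoverable from a trained model, here is a minimal sketch in plain Python; it is not Spark code. It assumes the ensemble's raw score is the weighted sum of the individual tree outputs (in the Scala API, the trained model exposes `trees` and `treeWeights`), and that the model was trained with log loss on labels in {-1, +1}, so that the score maps to a probability through `1 / (1 + exp(-2 * margin))`. The stump functions, weights, and feature values below are hypothetical stand-ins for trained trees, and the mapping should be checked against the loss used by your Spark version.

```python
import math
from typing import Callable, Sequence

# A "tree" here is any function mapping a feature vector to a real-valued score.
Tree = Callable[[Sequence[float]], float]


def gbt_margin(trees: Sequence[Tree], tree_weights: Sequence[float],
               features: Sequence[float]) -> float:
    """Weighted sum of the tree outputs: the raw score before the hard decision."""
    return sum(w * t(features) for t, w in zip(trees, tree_weights))


def margin_to_probability(margin: float) -> float:
    """Map the raw score to P(class = 1).

    Assumption: log loss on labels in {-1, +1}, which gives
    p = 1 / (1 + exp(-2 * margin)). Written in a numerically stable form.
    """
    z = 2.0 * margin
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(trees: Sequence[Tree], tree_weights: Sequence[float],
                  features: Sequence[float]) -> float:
    """Probability of the positive class for one feature vector."""
    return margin_to_probability(gbt_margin(trees, tree_weights, features))


# Hypothetical stand-ins for trained regression trees.
def stump_a(x: Sequence[float]) -> float:
    return 0.8 if x[0] > 0.5 else -0.8


def stump_b(x: Sequence[float]) -> float:
    return 0.3 if x[1] > 1.0 else -0.3


trees = [stump_a, stump_b]
tree_weights = [1.0, 0.1]

p = predict_proba(trees, tree_weights, [0.9, 2.0])
hard_decision = 1 if p > 0.5 else 0  # the only value the classifier reports
print(p, hard_decision)
```

The hard decision keeps only which side of 0.5 the score falls on, which is why a 51% and a 99.99% prediction look identical in the classifier's output. The values produced this way are scores derived from the loss function and are not guaranteed to be well-calibrated probabilities.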

An alternative is to use a different library, such as xgboost: https://github.com/dmlc/xgboost
