Random Forest Classifier: To which class do the probabilities correspond?

Problem description

I am using pyspark.ml.classification.

I run the model on a binary class dataset and display the probabilities.

I get the following probabilities:

+-----+----------+---------------------------------------+
|label|prediction|probability                            |
+-----+----------+---------------------------------------+
|0.0  |0.0       |[0.9005918461098429,0.0994081538901571]|
|1.0  |1.0       |[0.6051335859900139,0.3948664140099861]|
+-----+----------+---------------------------------------+

I have a list of 2 elements which obviously correspond to the probabilities of the predicted class.

My question: does probability[0] always correspond to the value of prediction? The Spark documentation does not make this clear.

Recommended answer

I am interpreting your question as asking: does the first element of the array in the 'probability' column always correspond to the "predicted class", by which you mean the label that the Random Forest Classifier predicted the observation should have?

If I have that correct, the answer is Yes.

The items in the arrays in both probability rows can be read as the model telling you:

[

When there are more than two labels to predict, the model is telling you:

['My confidence that the label I predict = specific label 1', 'My confidence that the label I predict = specific label 2', ...'My confidence that the label I predict = specific label N']
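
To make that reading concrete, here is a minimal plain-Python sketch (no Spark needed) of how one row of the probability column can be interpreted, assuming what the list above describes: element i is the model's confidence for the label with index i. The helpers probabilities_by_label and predicted_index are hypothetical, not PySpark APIs. Without thresholds, the predicted class is taken as the index of the largest entry; the optional thresholds argument follows the rule given in the documentation of Spark's thresholds param (the class with the largest p/t is predicted), and this sketch only accepts strictly positive thresholds.

```python
from typing import Dict, Optional, Sequence


def probabilities_by_label(probability: Sequence[float],
                           labels: Sequence[str]) -> Dict[str, float]:
    """Pair element i of a probability vector with the label whose index is i."""
    if len(probability) != len(labels):
        raise ValueError("need exactly one label per probability entry")
    return dict(zip(labels, probability))


def predicted_index(probability: Sequence[float],
                    thresholds: Optional[Sequence[float]] = None) -> int:
    """Return the index of the class that would be predicted.

    Without thresholds: index of the largest probability (ties go to the
    lower index). With thresholds: index of the largest p / t. Only strictly
    positive thresholds are supported in this sketch.
    """
    if thresholds is None:
        scores = list(probability)
    else:
        if len(thresholds) != len(probability) or any(t <= 0 for t in thresholds):
            raise ValueError("thresholds must be positive, one per class")
        scores = [p / t for p, t in zip(probability, thresholds)]
    # max() returns the first maximal index, so ties favor the lower index
    return max(range(len(scores)), key=lambda i: scores[i])


# First row of the question's output (label 0.0, prediction 0.0)
row = [0.9005918461098429, 0.0994081538901571]
print(probabilities_by_label(row, ["0.0", "1.0"]))
print(predicted_index(row))
```

When the label column already holds 0.0, 1.0, ..., index i is simply label i, so that first row reads as roughly 90% confidence in label 0.0 and 10% in label 1.0.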

This is indexed by the N labels you are trying to predict (which means you have to be careful about the way the labels are structured).
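
The caveat about label structure matters most when string labels are encoded with StringIndexer: by default (stringOrderType="frequencyDesc") index 0 goes to the most frequent label, so probability[0] is the estimate for that label rather than for whichever label you think of as "first". The plain-Python sketch below mimics that ordering; frequency_desc_labels is a hypothetical helper, and the alphabetical tie-break reflects what recent Spark documentation describes, so treat it as an assumption for older versions. On a real pipeline, the fitted StringIndexerModel's labels (or labelsArray in newer releases) give the actual order.

```python
from collections import Counter
from typing import Iterable, List


def frequency_desc_labels(values: Iterable[str]) -> List[str]:
    """Order distinct labels most-frequent first, like StringIndexer's
    default frequencyDesc ordering; ties are broken alphabetically
    (an assumption about tie handling)."""
    counts = Counter(values)
    return sorted(counts, key=lambda v: (-counts[v], v))


# Hypothetical raw labels: "ham" is more frequent, so it gets index 0
raw = ["ham", "spam", "ham", "ham", "spam"]
labels = frequency_desc_labels(raw)
index_of = {label: i for i, label in enumerate(labels)}
print(labels)            # ['ham', 'spam']
print(index_of["spam"])  # 1, so probability[1] is the estimate for "spam"
```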

Perhaps it would help to take a look at this answer. You could do something like:

model = pipeline.fit(training_data)
predictions = model.transform(test_data)
# truncate=False prints the full probability vectors instead of cutting them off
predictions.select("label", "prediction", "probability").show(10, truncate=False)

(Using the relevant pipeline and data from your examples.)

This will show you the probabilities for each class.
