Scala: how to know which probability corresponds to which class?


Question

I created a random forest classifier to predict something. The label is either "yes" (=1.0) or "no" (=0.0).

I applied my model to a test set. Here is my code and the first rows of the result:

import org.apache.spark.ml.tuning.CrossValidatorModel
import org.apache.spark.sql.types._
import org.apache.spark.sql._
import org.apache.spark.sql.functions._

// Load the previously fitted, cross-validated model and the test table
val modelrf = CrossValidatorModel.load("modelSupervise/newModel")
val test = spark.sql("""select * from dc.newTest""")

// Score the test set; this adds rawPrediction, probability and prediction
val predictions = modelrf.transform(test)

predictions.select("id", "label", "rawPrediction", "probability", "prediction").show(20, false)


+--------+--------------+----------------------------------------+-----------------------------------------+----------+
|id      |label         |rawPrediction                           |probability                              |prediction|
+--------+--------------+----------------------------------------+-----------------------------------------+----------+
|1       |0             |[18.954508743604,1.0454912563959982]    |[0.9477254371802001,0.05227456281979992] |0.0       |
|2       |0             |[19.396893651115214,0.6031063488847838] |[0.9698446825557608,0.030155317444239195]|0.0       |
|3       |0             |[19.562942473138747,0.4370575268612524] |[0.9781471236569373,0.02185287634306262] |0.0       |
|4       |0             |[19.072030495384865,0.9279695046151306] |[0.9536015247692434,0.04639847523075654] |0.0       |
|5       |0             |[19.43338228765314,0.5666177123468583]  |[0.9716691143826571,0.02833088561734292] |0.0       |
|6       |0             |[19.696154641398266,0.3038453586017339] |[0.9848077320699133,0.015192267930086694]|0.0       |
|7       |0             |[19.561887703818552,0.4381122961814507] |[0.9780943851909274,0.02190561480907253] |0.0       |
|8       |0             |[19.670868420870097,0.32913157912990343]|[0.9835434210435048,0.01645657895649517] |0.0       |
|9       |0             |[19.31258444658832,0.6874155534116762]  |[0.9656292223294163,0.034370777670583816]|0.0       |
|10      |1             |[19.324118365007614,0.6758816349923846] |[0.9662059182503807,0.03379408174961923] |0.0       |
|11      |0             |[19.671923190190295,0.32807680980970505]|[0.9835961595095147,0.016403840490485253]|0.0       |
|12      |0             |[5.549867107480572,14.450132892519427]  |[0.2774933553740286,0.7225066446259714]  |1.0       |
|13      |0             |[8.302734500577003,11.697265499422995]  |[0.41513672502885013,0.5848632749711498] |1.0       |
|14      |0             |[3.719926021010336,16.280073978989666]  |[0.1859963010505168,0.8140036989494831]  |1.0       |
|15      |1             |[4.9810130629790486,15.018986937020955] |[0.2490506531489524,0.7509493468510476]  |1.0       |
|16      |1             |[7.575144612227263,12.424855387772734]  |[0.37875723061136324,0.6212427693886368] |1.0       |
|17      |0             |[9.763210063340546,10.236789936659454]  |[0.4881605031670273,0.5118394968329727]  |1.0       |
|18      |0             |[9.475787091640768,10.524212908359234]  |[0.4737893545820384,0.5262106454179617]  |1.0       |
|19      |1             |[4.236097613170449,15.763902386829551]  |[0.21180488065852243,0.7881951193414776] |1.0       |
|20      |0             |[8.748700591583557,11.251299408416445]  |[0.43743502957917785,0.5625649704208222] |1.0       |
|21      |0             |[8.908800090849974,11.091199909150026]  |[0.4454400045424987,0.5545599954575013]  |1.0       |
|22      |1             |[9.726530070446398,10.273469929553602]  |[0.4863265035223199,0.5136734964776801]  |1.0       |
|23      |1             |[8.908800090849974,11.091199909150026]  |[0.4454400045424987,0.5545599954575013]  |1.0       |
+--------+--------------+----------------------------------------+-----------------------------------------+----------+

Here is what I understand:

For id=1, 18.95 trees predict the value "0.0" and 1.045 trees predict the value "1.0". I thought that Scala orders the values of the "rawPrediction" vector by class value: the first element refers to class "0" and the second to class "1".

But if that is true, and if we had "yes" and "no" instead of 0 and 1, what order would Scala use? Alphabetical order?

I did some research and found this question: Random Forest Classifier: To which class corresponds the probabilities

The question is the same, but for the "probability" vector. Which element of the vector is the probability of predicting "0", and which element is the probability of predicting "1"?

I don't understand the answer...

How can I know, for each row, the probability that the model predicts "yes" (or 1)? Does Scala order the probabilities numerically or alphabetically, depending on the type of the label?

Thank you in advance!!

Recommended answer

Here is the answer! In my question I load a model, but the answer lies in what happens before that step.

To fit the model I used a labelIndexer on my target. This label indexer turns the target into an index, ordered by descending frequency.

Example: if my target contains 20% "aa" and 80% "bb", the label indexer creates a "label" column with the value 0 for "bb" and 1 for "aa" (because "bb" is more frequent than "aa").
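The frequency-based ordering can be checked directly on a fitted indexer. The sketch below assumes the labelIndexer is Spark ML's StringIndexer (the answer does not name the class) and uses a made-up five-row DataFrame with a hypothetical column named target. It is meant to be pasted into spark-shell or a notebook with Spark on the classpath.

```scala
import org.apache.spark.ml.feature.StringIndexer
import org.apache.spark.sql.SparkSession

// Local session for illustration only; in spark-shell this returns the existing session.
val spark = SparkSession.builder().master("local[1]").appName("label-order").getOrCreate()
import spark.implicits._

// Toy data (hypothetical): 80% "bb", 20% "aa"
val df = Seq("bb", "bb", "aa", "bb", "bb").toDF("target")

// By default StringIndexer assigns indices by descending frequency
val indexerModel = new StringIndexer()
  .setInputCol("target")
  .setOutputCol("label")
  .fit(df)

// The position of a value in `labels` is the numeric index it was given
val labelOrder: Array[String] = indexerModel.labels
println(labelOrder.mkString(", "))

// Show the original values next to their indices
indexerModel.transform(df).show()
```

With this toy data, "bb" should come first in labelOrder and therefore be indexed as 0. On Spark 3.x, labels is marked deprecated and labelsArray(0) is the equivalent for a single input column.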

When we fit a random forest on the indexed label, the elements of the probability vector follow that index order, i.e. the order of frequency.

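In binary classification:

  • first proba = probability of the class that is most frequent in the training set
  • second proba = probability of the class that is less frequent in the training set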
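Once the label order is known, the probability of a given class can be read from the "probability" vector by position. This is a minimal sketch, not the asker's actual pipeline: labelOrder is a hypothetical array standing in for the labels of the fitted indexer (it assumes "no" is the more frequent class), predictions is hand-built toy data reusing two probability vectors from the table in the question, and p_yes is an invented column name.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

// Local session for illustration only; in spark-shell this returns the existing session.
val spark = SparkSession.builder().master("local[1]").appName("class-proba").getOrCreate()
import spark.implicits._

// Hypothetical label order, as it would be read from the fitted indexer:
// index 0 = most frequent value, index 1 = less frequent value
val labelOrder = Array("no", "yes")
val yesIndex = labelOrder.indexOf("yes")

// Toy stand-in for the model output (ids 1 and 12 from the table in the question)
val predictions = Seq(
  (1, Vectors.dense(0.9477254371802001, 0.05227456281979992)),
  (12, Vectors.dense(0.2774933553740286, 0.7225066446259714))
).toDF("id", "probability")

// UDF that returns the vector element at the position of "yes"
val probOfYes = udf((v: Vector) => v(yesIndex))

val withYes = predictions.withColumn("p_yes", probOfYes(col("probability")))
withYes.show(false)
```

Looking the index up by label value, rather than hard-coding position 1, keeps the extraction correct if the class frequencies (and therefore the index order) differ in another training set.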
