Scala: how to know which probability corresponds to which class?


Question

I created a random forest classifier to predict something. The label is either "yes" (=1.0) or "no" (=0.0).

I apply my model to a test set. Here is my code and my result for 20 lines:

import org.apache.spark.ml.tuning.CrossValidatorModel
import org.apache.spark.sql.types._
import org.apache.spark.sql._
import org.apache.spark.sql.functions._

// Load the fitted cross-validated model and the test table
val modelrf = CrossValidatorModel.load("modelSupervise/newModel")
val test = spark.sql("""select * from dc.newTest""")

// Score the test set
val predictions = modelrf.transform(test)

predictions.select("id", "label", "rawPrediction", "probability", "prediction").show(20, false)


+--------+--------------+----------------------------------------+-----------------------------------------+----------+
|id      |label         |rawPrediction                           |probability                              |prediction|
+--------+--------------+----------------------------------------+-----------------------------------------+----------+
|1       |0             |[18.954508743604,1.0454912563959982]    |[0.9477254371802001,0.05227456281979992] |0.0       |
|2       |0             |[19.396893651115214,0.6031063488847838] |[0.9698446825557608,0.030155317444239195]|0.0       |
|3       |0             |[19.562942473138747,0.4370575268612524] |[0.9781471236569373,0.02185287634306262] |0.0       |
|4       |0             |[19.072030495384865,0.9279695046151306] |[0.9536015247692434,0.04639847523075654] |0.0       |
|5       |0             |[19.43338228765314,0.5666177123468583]  |[0.9716691143826571,0.02833088561734292] |0.0       |
|6       |0             |[19.696154641398266,0.3038453586017339] |[0.9848077320699133,0.015192267930086694]|0.0       |
|7       |0             |[19.561887703818552,0.4381122961814507] |[0.9780943851909274,0.02190561480907253] |0.0       |
|8       |0             |[19.670868420870097,0.32913157912990343]|[0.9835434210435048,0.01645657895649517] |0.0       |
|9       |0             |[19.31258444658832,0.6874155534116762]  |[0.9656292223294163,0.034370777670583816]|0.0       |
|10      |1             |[19.324118365007614,0.6758816349923846] |[0.9662059182503807,0.03379408174961923] |0.0       |
|11      |0             |[19.671923190190295,0.32807680980970505]|[0.9835961595095147,0.016403840490485253]|0.0       |
|12      |0             |[5.549867107480572,14.450132892519427]  |[0.2774933553740286,0.7225066446259714]  |1.0       |
|13      |0             |[8.302734500577003,11.697265499422995]  |[0.41513672502885013,0.5848632749711498] |1.0       |
|14      |0             |[3.719926021010336,16.280073978989666]  |[0.1859963010505168,0.8140036989494831]  |1.0       |
|15      |1             |[4.9810130629790486,15.018986937020955] |[0.2490506531489524,0.7509493468510476]  |1.0       |
|16      |1             |[7.575144612227263,12.424855387772734]  |[0.37875723061136324,0.6212427693886368] |1.0       |
|17      |0             |[9.763210063340546,10.236789936659454]  |[0.4881605031670273,0.5118394968329727]  |1.0       |
|18      |0             |[9.475787091640768,10.524212908359234]  |[0.4737893545820384,0.5262106454179617]  |1.0       |
|19      |1             |[4.236097613170449,15.763902386829551]  |[0.21180488065852243,0.7881951193414776] |1.0       |
|20      |0             |[8.748700591583557,11.251299408416445]  |[0.43743502957917785,0.5625649704208222] |1.0       |
|21      |0             |[8.908800090849974,11.091199909150026]  |[0.4454400045424987,0.5545599954575013]  |1.0       |
|22      |1             |[9.726530070446398,10.273469929553602]  |[0.4863265035223199,0.5136734964776801]  |1.0       |
|23      |1             |[8.908800090849974,11.091199909150026]  |[0.4454400045424987,0.5545599954575013]  |1.0       |
+--------+--------------+----------------------------------------+-----------------------------------------+----------+

Here is what I understood at first:

For id=1, 18.95 trees predict the value "0.0" and 1.045 trees predict the value "1.0". I thought that Scala orders the values of the "rawPrediction" vector by the value of the class: the first element refers to class "0" and the second to class "1".
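A quick arithmetic check on the row for id=1 in the table above: each "rawPrediction" pair sums to about 20, and dividing each element by that sum appears to reproduce the "probability" column. The sketch below is plain Scala with values copied from that row. It only illustrates the relationship visible in the output and says nothing about which class each position belongs to.

```scala
// Values copied from the id=1 row of the output above.
val raw = Array(18.954508743604, 1.0454912563959982)

// Normalise the raw scores so that they sum to 1.
val total = raw.sum
val normalized = raw.map(_ / total)

// Compare with the probability column of the same row.
println(normalized.mkString("[", ", ", "]"))
```

So the two vectors share the same element order, and answering the question for "probability" also answers it for "rawPrediction".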

But if that were true, and if we had "yes" or "no" instead of 0 or 1, what order would Scala use? Alphabetical order?

I did some research and found this question: Random Forest Classifier :To which class corresponds the probabilities

The question is the same, but for the "probability" vector. Which element of the vector corresponds to the probability of predicting "0", and which element corresponds to the probability of predicting "1"?

I don't understand the answer...

How can I know, for each row, the probability that the model predicts "yes" (or 1)? Does Scala order the probabilities numerically or alphabetically, depending on the type of the label?

Thank you in advance!

Recommended answer

Here is the answer! In my question I load a model.

But the answer lies in what happens before that step.

To fit the model I used a labelIndexer on my target. This label indexer transforms the target into an index, ordered by descending frequency.

For example: if my target contains 20% "aa" and 80% "bb", the label indexer creates a "label" column that takes the value 0 for "bb" and 1 for "aa" (because "bb" is more frequent than "aa").
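The "aa"/"bb" example can be reproduced with a small sketch, assuming the labelIndexer is Spark ML's StringIndexer with its default ordering. The column names "target" and "label", the app name, and the toy data are illustrative assumptions, not the original pipeline. It is written in spark-shell style, like the code in the question.

```scala
import org.apache.spark.ml.feature.StringIndexer
import org.apache.spark.sql.SparkSession

// Reuses the active session in spark-shell, or starts a local one otherwise.
val spark = SparkSession.builder().master("local[*]").appName("label-order-sketch").getOrCreate()
import spark.implicits._

// Toy target column (assumed data): 80% "bb" and 20% "aa".
val df = (Seq.fill(8)("bb") ++ Seq.fill(2)("aa")).toDF("target")

// With default settings, StringIndexer orders labels by descending frequency.
val indexerModel = new StringIndexer()
  .setInputCol("target")
  .setOutputCol("label")
  .fit(df)

// labels(i) is the original value that was encoded as index i.
// (Deprecated in favour of labelsArray since Spark 3.0, but still available.)
val labels: Array[String] = indexerModel.labels
labels.zipWithIndex.foreach { case (name, idx) => println(s"index $idx -> $name") }
```

With this toy data, "bb" should come out at index 0 and "aa" at index 1. Reading the labels from the fitted indexer avoids guessing the order from class frequencies.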

When we fit a random forest, the probabilities follow that same frequency order: element i of the "probability" vector refers to the class that was indexed as i.

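In binary classification:

  • first proba = probability that the row belongs to the most frequent class in the training set
  • second proba = probability that the row belongs to the less frequent class in the training set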
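To read one of these probabilities per row, the vector element can be pulled into its own column. This is a minimal sketch: the helper probabilityAt and the column name probaIndex1 are hypothetical, and the two rows are copied from the question's output (ids 1 and 12). Which class index 1 stands for still depends on how the label was indexed at training time.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

// Reuses the active session in spark-shell, or starts a local one otherwise.
val spark = SparkSession.builder().master("local[*]").appName("proba-sketch").getOrCreate()

// Two rows copied from the question's output (ids 1 and 12).
val predictions = spark.createDataFrame(Seq(
  (1, Vectors.dense(0.9477254371802001, 0.05227456281979992)),
  (12, Vectors.dense(0.2774933553740286, 0.7225066446259714))
)).toDF("id", "probability")

// Hypothetical helper: builds a UDF that picks the vector element at a label index.
def probabilityAt(index: Int) = udf((v: Vector) => v(index))

// Index 1 is the less frequent training class under frequency-descending indexing.
val withProba = predictions.withColumn("probaIndex1", probabilityAt(1)(col("probability")))
withProba.show(false)
```

Passing 0 instead of 1 gives the probability of the most frequent training class.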
