Extracting Class Probabilities from SparkR ML Classification Functions


Problem Description

I'm wondering if it's possible (using the built-in features of SparkR or any other workaround) to extract the class probabilities from some of the classification algorithms included in SparkR. The particular ones of interest are:

spark.gbt()
spark.mlp()
spark.randomForest()

Currently, when I use the predict function on these models I am able to extract the predictions, but not the actual probabilities or "confidence."
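
To make the issue concrete, here is a minimal sketch of the pattern in question (trainDF, testDF, and the label column are hypothetical placeholders, not part of the original question):

# Hypothetical illustration of the issue: the predicted class is easy to get,
# but the probability column does not come back as plain numbers
model <- spark.randomForest(trainDF, label ~ ., type = "classification")
preds <- predict(model, testDF)

# The predicted class labels collect fine...
head(SparkR::select(preds, "prediction"))

# ...but the probability column holds Spark ML DenseVector objects, which
# do not deserialize into plain R numerics when collected
collected <- SparkR::collect(preds)
class(collected$probability[[1]])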

I've seen several other questions similar to this topic, but none that are specific to SparkR, and many have not been answered with regard to Spark's most recent updates.

Recommended Answer

I ran into the same problem, and following this answer I now use SparkR:::callJMethod to transform the probability DenseVector (which R cannot deserialize) into an Array (which R reads as a list). It's not very elegant or fast, but it does the job:

# Convert a Spark ML DenseVector (a Java object that R cannot deserialize)
# into an array, which R reads as a list, by calling toArray() on the JVM side
denseVectorToArray <- function(dv) {
  SparkR:::callJMethod(dv, "toArray")
}

e.g., start your Spark session:

library(SparkR)
sparkR.session(master = "local")

Generate toy data:

data <- data.frame(clicked = base::sample(c(0,1),100,replace=TRUE),
                  someString = base::sample(c("this", "that"),
                                           100, replace=TRUE), 
                  stringsAsFactors=FALSE)

trainidxs <- base::sample(nrow(data), nrow(data)*0.7)
traindf <- as.DataFrame(data[trainidxs,])
testdf <- as.DataFrame(data[-trainidxs,])

Train a random forest and run predictions:

rf <- spark.randomForest(traindf, 
                        clicked~., 
                        type = "classification", 
                        maxDepth = 2, 
                        maxBins = 2,
                        numTrees = 100)

predictions <- predict(rf, testdf)
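
Before collecting, it can help to inspect the schema of the predictions SparkDataFrame (a quick diagnostic, not part of the original answer): the probability column appears as a Spark ML vector type rather than a plain numeric column, which is why the callJMethod conversion above is needed.

# Inspect the columns SparkR attached to the predictions
SparkR::printSchema(predictions)
SparkR::columns(predictions)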

Collect your predictions:

collected = SparkR::collect(predictions)    

Now extract the probabilities:

collected$probabilities <- lapply(collected$probability, function(x) denseVectorToArray(x))
str(collected$probabilities)

Of course, the function wrapper around SparkR:::callJMethod is a bit of an overkill. You can also use it directly, e.g. with dplyr:

library(dplyr)

withprobs <- collected %>%
  rowwise() %>%
  mutate("probabilities" = list(SparkR:::callJMethod(probability, "toArray"))) %>%
  mutate("prob0" = probabilities[[1]], "prob1" = probabilities[[2]])
