Extracting Class Probabilities from SparkR ML Classification Functions
Question
I'm wondering if it's possible (using the built-in features of SparkR or any other workaround) to extract the class probabilities from some of the classification algorithms included in SparkR. The particular ones of interest are:
spark.gbt()
spark.mlp()
spark.randomForest()
Currently, when I use the predict function on these models I am able to extract the predictions, but not the actual probabilities or "confidence."
I've seen several other questions similar to this topic, but none specific to SparkR, and many have not been answered with regard to Spark's most recent updates.
Answer
I ran into the same problem, and following this answer I now use SparkR:::callJMethod to transform the probability DenseVector (which R cannot deserialize) into an Array (which R reads as a list). It's not very elegant or fast, but it does the job:
denseVectorToArray <- function(dv) {
  SparkR:::callJMethod(dv, "toArray")
}
e.g.: start your spark session
library(SparkR)
sparkR.session(master = "local")
generate toy data:
data <- data.frame(clicked = base::sample(c(0, 1), 100, replace = TRUE),
                   someString = base::sample(c("this", "that"),
                                             100, replace = TRUE),
                   stringsAsFactors = FALSE)
trainidxs <- base::sample(nrow(data), nrow(data)*0.7)
traindf <- as.DataFrame(data[trainidxs,])
testdf <- as.DataFrame(data[-trainidxs,])
train a random forest and run predictions:
rf <- spark.randomForest(traindf,
                         clicked ~ .,
                         type = "classification",
                         maxDepth = 2,
                         maxBins = 2,
                         numTrees = 100)
predictions <- predict(rf, testdf)
collect your predictions:
collected = SparkR::collect(predictions)
now extract the probabilities:
collected$probabilities <- lapply(collected$probability, denseVectorToArray)
str(collected$probabilities)
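If you'd rather end up with plain numeric columns than a list column, here is a small base-R sketch. It assumes a binary classifier, so each converted array has exactly two entries; the prob0/prob1 column names are just illustrative:

```r
# assumes collected$probabilities is a list of 2-element numeric vectors
# (binary classification), produced by the lapply() conversion above
probmat <- do.call(rbind, lapply(collected$probabilities, unlist))
collected$prob0 <- probmat[, 1]  # probability of class 0
collected$prob1 <- probmat[, 2]  # probability of class 1
```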
of course, the function wrapper around SparkR:::callJMethod is a bit of overkill. You can also use it directly, e.g. with dplyr:
library(dplyr)

withprobs = collected %>%
  rowwise() %>%
  mutate("probabilities" = list(SparkR:::callJMethod(probability, "toArray"))) %>%
  mutate("prob0" = probabilities[[1]], "prob1" = probabilities[[2]])