How to get Precision/Recall using CrossValidator for training NaiveBayes Model using Spark

Views: 44  Published: 2021/11/14 21:05:49  Tags: apache-spark apache-spark-mllib apache-spark-ml apache-spark-1.5

Problem description

Suppose I have a Pipeline like this:

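import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.NaiveBayes
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer}
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}
val tokenizer = new Tokenizer().setInputCol("tweet").setOutputCol("words")
val hashingTF = new HashingTF().setNumFeatures(1000).setInputCol("words").setOutputCol("features")
val idf = new IDF().setInputCol("features").setOutputCol("idffeatures")
// Read the IDF output; by default NaiveBayes reads "features", which would bypass the IDF stage
val nb = new NaiveBayes().setFeaturesCol("idffeatures")
val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, idf, nb))
// 3 x 3 = 9 parameter combinations; the grid overrides the numFeatures value set above
val paramGrid = new ParamGridBuilder().addGrid(hashingTF.numFeatures, Array(10, 100, 1000)).addGrid(nb.smoothing, Array(0.01, 0.1, 1.0)).build()
val cv = new CrossValidator().setEstimator(pipeline).setEvaluator(new BinaryClassificationEvaluator()).setEstimatorParamMaps(paramGrid).setNumFolds(10)
val cvModel = cv.fit(df)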

As you can see, I defined a CrossValidator with a BinaryClassificationEvaluator. I have seen many examples that compute metrics such as Precision/Recall during testing, but those metrics are obtained on a separate dataset held out for testing (see, for example, this documentation).
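For comparison, here is a minimal sketch of how the evaluator choice decides which metric CrossValidator reports. It assumes a DataFrame with a Double "label" column and uses a hypothetical helper, crossValidatedMetric, that is not part of Spark. The metric names "weightedPrecision" and "weightedRecall" are, as far as I know, accepted by MulticlassClassificationEvaluator in Spark 1.5 and later, but the accepted names differ between versions, so check the documentation for yours.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.tuning.CrossValidator
import org.apache.spark.sql.DataFrame

// Hypothetical helper (not a Spark API): runs cross-validation with the given
// metric and pairs every parameter combination with its cross-validated score.
def crossValidatedMetric(
    pipeline: Pipeline,
    paramGrid: Array[ParamMap],
    df: DataFrame,
    metricName: String,
    numFolds: Int = 10): Array[(ParamMap, Double)] = {
  // Assumes the default column names "prediction" and "label"
  val evaluator = new MulticlassClassificationEvaluator().setMetricName(metricName)
  val cvModel = new CrossValidator()
    .setEstimator(pipeline)
    .setEvaluator(evaluator)
    .setEstimatorParamMaps(paramGrid)
    .setNumFolds(numFolds)
    .fit(df)
  // avgMetrics(i) is the metric for paramGrid(i), averaged over the held-out folds
  cvModel.getEstimatorParamMaps.zip(cvModel.avgMetrics)
}
```

Calling crossValidatedMetric(pipeline, paramGrid, df, "weightedPrecision") and then again with "weightedRecall" would give one averaged value per parameter combination. The cost is one full cross-validation run per metric, since a CrossValidator takes a single evaluator.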

From my understanding, CrossValidator creates folds and uses one fold for testing in each round, then chooses the best model. My question is: is it possible to get Precision/Recall metrics during the training process?
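To make the fold mechanics concrete, here is a rough sketch that repeats the splitting by hand and computes precision and recall on each held-out fold. It is an illustration under assumptions, not how CrossValidator must be used: perFoldPrecisionRecall is a hypothetical helper, the pipeline is fitted with its current parameters rather than a parameter grid, and the "label" and "prediction" columns are assumed to be of type Double.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.mllib.evaluation.MulticlassMetrics
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.sql.DataFrame

// Hypothetical helper (not a Spark API): returns one (precision, recall) pair
// per fold for the class identified by positiveLabel.
def perFoldPrecisionRecall(
    pipeline: Pipeline,
    df: DataFrame,
    numFolds: Int,
    positiveLabel: Double): Array[(Double, Double)] = {
  val schema = df.schema
  val sqlContext = df.sqlContext
  // kFold yields (training, validation) RDD pairs; the seed 0 is an arbitrary choice
  MLUtils.kFold(df.rdd, numFolds, 0).map { case (trainRdd, testRdd) =>
    val train = sqlContext.createDataFrame(trainRdd, schema).cache()
    val test = sqlContext.createDataFrame(testRdd, schema)
    val model = pipeline.fit(train)
    train.unpersist()
    // Score only the held-out fold, so the metrics are not computed on training data
    val predictionAndLabels = model.transform(test)
      .select("prediction", "label")
      .rdd
      .map(row => (row.getDouble(0), row.getDouble(1)))
    val metrics = new MulticlassMetrics(predictionAndLabels)
    (metrics.precision(positiveLabel), metrics.recall(positiveLabel))
  }
}
```

For example, perFoldPrecisionRecall(pipeline, df, 10, 1.0) would return ten pairs, which can then be averaged. Doing the split manually trades the convenience of CrossValidator for access to any metric that MulticlassMetrics exposes.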
