Spark CrossValidatorModel access other models than the bestModel?

Problem description

I am using Spark 1.6.1:

Currently I am using a CrossValidator to train my ML Pipeline with various parameters. After the training process I can use the bestModel property of the CrossValidatorModel to get the Model that performed best during the Cross Validation. Are the other models of the cross validation automatically discarded or can I select a model that performed worse than the bestModel?

I am asking because I am using the F1 score metric for the cross validation, but I am also interested in the weightedRecall of all of the models, not just of the model that performed best during the cross validation.

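import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.tuning.CrossValidator

val folds = 6
val cv = new CrossValidator()
  .setEstimator(pipeline)
  .setEvaluator(new MulticlassClassificationEvaluator)
  .setEstimatorParamMaps(paramGrid)
  .setNumFolds(folds)

// trainDf is an assumed name; the fit call was missing from the snippet as posted
val cvModel = cv.fit(trainDf)

val avgF1Scores = cvModel.avgMetrics

val predictedDf = cvModel.bestModel.transform(testDf)

// Here I would like to predict as well with the other models of the cross validation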

Solution

Spark >= 2.4.0 ( >= 2.3.0 in Scala)

SPARK-21088 ("CrossValidator, TrainValidationSplit should collect all models when fitting") adds support for collecting submodels.

cv = CrossValidator(..., collectSubModels=True)

model = cv.fit(...)
model.subModels
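
To get a second metric such as weightedRecall for every collected model, one option is to loop over subModels and score each one. The sketch below is only an illustration: it assumes subModels is indexed as [fold][param map index], and the helper name evaluate_sub_models, along with cv, train_df and test_df in the usage comment, are hypothetical names that do not come from the question. The helper relies only on transform and evaluate, so it does not import pyspark itself.

```python
def evaluate_sub_models(sub_models, evaluator, dataset):
    """Score every collected sub-model on `dataset`.

    sub_models: nested list as exposed by CrossValidatorModel.subModels,
                assumed to be indexed as sub_models[fold_index][param_map_index].
    evaluator:  any object with evaluate(predictions) -> float.
    Returns a nested list of floats with the same shape as sub_models.
    """
    return [
        [evaluator.evaluate(m.transform(dataset)) for m in fold_models]
        for fold_models in sub_models
    ]


# Hypothetical usage with PySpark >= 2.4 (cv, train_df and test_df are assumed):
#   from pyspark.ml.evaluation import MulticlassClassificationEvaluator
#   cv_model = cv.fit(train_df)  # cv built with collectSubModels=True
#   recall = MulticlassClassificationEvaluator(metricName="weightedRecall")
#   scores = evaluate_sub_models(cv_model.subModels, recall, test_df)
```

Each sub-model was trained on only part of the training data (all folds but one), so these scores are not directly comparable to those of bestModel, which is refit on the full training set.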

Spark < 2.4

If you want to access all intermediate models, you'll have to create a custom cross validator from scratch. o.a.s.ml.tuning.CrossValidator discards the other models; only the best one and the metrics are copied to the CrossValidatorModel.
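
A minimal sketch of what such a custom loop could look like is shown here. It is not the code of the built-in CrossValidator. The function name fit_keeping_all_models is hypothetical, and the sketch assumes the caller has already split the data into (train, validation) pairs. It uses only fit(dataset, params), transform and evaluate, which the pyspark.ml estimators, models and evaluators provide.

```python
def fit_keeping_all_models(estimator, param_maps, evaluator, folds):
    """Fit one model per (fold, param map) pair and keep all of them.

    estimator:  object with fit(dataset, params) -> model
    param_maps: list of param maps, e.g. from ParamGridBuilder().build()
    evaluator:  object with evaluate(predictions) -> float
    folds:      list of (train, validation) dataset pairs, prepared by the caller
    Returns a list of (fold_index, param_index, model, metric) tuples.
    """
    results = []
    for fold_index, (train, validation) in enumerate(folds):
        for param_index, param_map in enumerate(param_maps):
            # Fit with this param map applied on top of the estimator's settings
            model = estimator.fit(train, param_map)
            metric = evaluator.evaluate(model.transform(validation))
            results.append((fold_index, param_index, model, metric))
    return results
```

Keeping every fitted model in memory can be expensive for large grids, which is presumably why the built-in class does not do so by default.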

See also Pyspark - Get all parameters of models created with ParamGridBuilder
