Spark CrossValidatorModel access other models than the bestModel?

Question

I am using Spark 1.6.1:

Currently I am using a CrossValidator to train my ML Pipeline with various parameters. After the training process I can use the bestModel property of the CrossValidatorModel to get the model that performed best during the cross validation. Are the other models of the cross validation automatically discarded, or can I select a model that performed worse than the bestModel?

I am asking because I am using the F1 score metric for the cross validation, but I am also interested in the weightedRecall of all of the models, not just of the model that performed best during the cross validation.

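import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.tuning.CrossValidator

val folds = 6
val cv = new CrossValidator()
  .setEstimator(pipeline)
  .setEvaluator(new MulticlassClassificationEvaluator) // default metric is F1
  .setEstimatorParamMaps(paramGrid)
  .setNumFolds(folds)

// trainDf is an assumed name; the original snippet omits the fit call
val cvModel = cv.fit(trainDf)

// Average F1 score for each parameter map, averaged over the folds
val avgF1Scores = cvModel.avgMetrics

val predictedDf = cvModel.bestModel.transform(testDf)

// Here I would like to predict as well with the other models of the cross validation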

Recommended answer

Spark >= 2.4.0

SPARK-21088 (CrossValidator, TrainValidationSplit should collect all models when fitting) adds support for collecting submodels.

cv = CrossValidator(..., collectSubModels=True)

model = cv.fit(...)
model.subModels
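To get the weightedRecall of every collected model rather than only the best one, each submodel can be evaluated in turn. The following is a minimal sketch, not part of the original answer. The helper name `metric_per_submodel` is hypothetical, and it only assumes that each model has a `transform` method and that the evaluator has an `evaluate` method, as Spark ML models and evaluators do.

```python
def metric_per_submodel(sub_models, dataset, evaluator):
    """Evaluate every collected sub-model on `dataset`.

    `sub_models` is indexed as sub_models[fold][param_index], which is how
    CrossValidatorModel.subModels is laid out when collectSubModels=True.
    Returns a list of (fold, param_index, metric) tuples.
    """
    results = []
    for fold, fold_models in enumerate(sub_models):
        for param_index, sub_model in enumerate(fold_models):
            # Score this sub-model's predictions with the chosen metric.
            predictions = sub_model.transform(dataset)
            results.append((fold, param_index, evaluator.evaluate(predictions)))
    return results
```

With PySpark this might be called as `metric_per_submodel(model.subModels, testDf, MulticlassClassificationEvaluator(metricName="weightedRecall"))`, where `testDf` is the test DataFrame from the question.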

Spark < 2.4

If you want to access all intermediate models, you'll have to create a custom cross validator from scratch. org.apache.spark.ml.tuning.CrossValidator discards the other models; only the best one and the metrics are copied to the CrossValidatorModel.
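One way such a custom cross validator could look is a plain loop over folds and parameter maps that keeps every fitted model. This is a rough sketch under stated assumptions, not Spark's own implementation: the function name `cross_validate_keep_models` is hypothetical, the caller is assumed to have already prepared `folds` as (train, validation) pairs, and the estimator is assumed to accept a parameter map as the second argument of `fit`, as Spark ML estimators do.

```python
def cross_validate_keep_models(estimator, param_maps, folds, evaluator):
    """Manual k-fold loop that keeps every fitted model.

    `folds` is a list of (train, validation) dataset pairs prepared by the
    caller (an assumption of this sketch).
    Returns (models, avg_metrics): models[fold][j] was fitted with
    param_maps[j], and avg_metrics[j] is its metric averaged over the folds.
    """
    models = []
    totals = [0.0] * len(param_maps)
    for train, validation in folds:
        fold_models = []
        for j, param_map in enumerate(param_maps):
            # Fit one model per parameter map and keep it instead of discarding it.
            model = estimator.fit(train, param_map)
            totals[j] += evaluator.evaluate(model.transform(validation))
            fold_models.append(model)
        models.append(fold_models)
    avg_metrics = [total / len(folds) for total in totals]
    return models, avg_metrics
```

Keeping every model holds folds × parameter maps fitted models in memory at once, so for large grids it may be preferable to store only the metrics of interest (for example F1 and weightedRecall) for each model.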

See also: Pyspark - Get all parameters of models created with ParamGridBuilder
