How to print best model params in Apache Spark Pipeline?
Question
I'm using the pipeline API of Apache Spark to validate parameters. I'm building a TrainValidationSplitModel like this:
Pipeline pipeline = ...
ParamMap[] paramGrid = ...
TrainValidationSplit trainValidationSplit = new TrainValidationSplit()
    .setEstimator(pipeline)
    .setEvaluator(new MulticlassClassificationEvaluator())
    .setEstimatorParamMaps(paramGrid)
    .setTrainRatio(0.8);
TrainValidationSplitModel model = trainValidationSplit.fit(training);
My question is: how can I extract and print the params of the best trained model?
Recommended answer
I finally figured it out. Spark logs these metrics at INFO level after training. I had the log level for Spark set to ERROR, so I never saw this output:
2015-10-21 12:57:33,828 [INFO org.apache.spark.ml.tuning.TrainValidationSplit]
Train validation split metrics: WrappedArray(0.7141940371838821, 0.7358721053749735)
2015-10-21 12:57:33,831 [INFO org.apache.spark.ml.tuning.TrainValidationSplit]
Best set of parameters:
{
hashingTF_79cf758f5ab1-numFeatures: 2000000,
nb_67d55ce4e1fc-smoothing: 1.0
}
2015-10-21 12:57:33,831 [INFO org.apache.spark.ml.tuning.TrainValidationSplit]
Best train validation split metric: 0.7358721053749735.
Now I've set the log level to INFO for the TrainValidationSplit class in my log4j.properties file:
log4j.logger.org.apache.spark.ml.tuning.TrainValidationSplit=INFO
log4j.additivity.org.apache.spark.ml.tuning.TrainValidationSplit=false
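If you would rather pick out the best candidate in code than read it from the log, the selection step can be sketched as below. This is a minimal, Spark-free illustration, not Spark's own implementation: the class name BestParamsSketch, the helper bestIndex and the string labels standing in for ParamMap entries are all hypothetical. In a real job the metrics array would presumably come from model.validationMetrics() and the candidates from trainValidationSplit.getEstimatorParamMaps(), in the same order, with the direction taken from the evaluator's isLargerBetter(); check those against your Spark version.

```java
import java.util.Arrays;

// Hypothetical, Spark-free sketch: pick the candidate whose validation metric is best.
final class BestParamsSketch {

    /** Returns the index of the best metric; larger wins when largerIsBetter is true. */
    static int bestIndex(double[] metrics, boolean largerIsBetter) {
        if (metrics.length == 0) {
            throw new IllegalArgumentException("metrics must not be empty");
        }
        int best = 0;
        for (int i = 1; i < metrics.length; i++) {
            boolean better = largerIsBetter
                    ? metrics[i] > metrics[best]
                    : metrics[i] < metrics[best];
            if (better) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // The two metrics from the log above, in grid order.
        double[] metrics = {0.7141940371838821, 0.7358721053749735};
        // Placeholder labels (assumption): real code would hold ParamMap objects here.
        String[] paramGrid = {"paramGrid[0]", "paramGrid[1]"};

        int best = bestIndex(metrics, true);
        System.out.println("Validation metrics: " + Arrays.toString(metrics));
        System.out.println("Best set of parameters: " + paramGrid[best]);
        System.out.println("Best validation metric: " + metrics[best]);
    }
}
```

With the two metrics from the log, the second candidate is selected, which matches the "Best train validation split metric" line Spark printed.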