How to print best model params in pyspark pipeline

Views: 23 · Published: 2021/11/14 21:09:59 · Tags: python, apache-spark, pyspark, apache-spark-mllib

Problem description

This question is similar to this one. I would like to print the best model params after running a TrainValidationSplit in pyspark. I cannot find the piece of text the other user relies on to answer that question, because I'm working in Jupyter and the log disappears from the terminal...

Part of the code is:

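from pyspark.ml import Pipeline
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.feature import PCA
from pyspark.ml.regression import DecisionTreeRegressor
from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit

# PCA feeds its output column into the decision tree regressor.
pca = PCA(inputCol='features')
dt = DecisionTreeRegressor(featuresCol=pca.getOutputCol(), labelCol="energy")
pipe = Pipeline(stages=[pca, dt])

# Grid over the number of principal components (k) and the tree depth.
paramgrid = (ParamGridBuilder()
             .addGrid(pca.k, range(1, 50, 2))
             .addGrid(dt.maxDepth, range(1, 10, 1))
             .build())

tvs = TrainValidationSplit(estimator=pipe, evaluator=RegressionEvaluator(
    labelCol="energy", predictionCol="prediction", metricName="mae"),
    estimatorParamMaps=paramgrid, trainRatio=0.66)

model = tvs.fit(wind_tr_va)  # wind_tr_va: the asker's DataFrame (not shown)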

Thanks in advance.

Recommended answer

It indeed follows the same reasoning described in the answer to "How to get the maxDepth from a Spark RandomForestRegressionModel", given by @user6910411.

You'll need to patch the TrainValidationSplitModel, PCAModel and DecisionTreeRegressionModel as follows:

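from pyspark.ml.feature import PCAModel
from pyspark.ml.regression import DecisionTreeRegressionModel
from pyspark.ml.tuning import TrainValidationSplitModel

# Each patch forwards to the wrapped Java object held in _java_obj.
TrainValidationSplitModel.bestModel = (
    lambda self: self._java_obj.bestModel
)

PCAModel.getK = (
    lambda self: self._java_obj.getK()
)

DecisionTreeRegressionModel.getMaxDepth = (
    lambda self: self._java_obj.getMaxDepth()
)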

Now you can use it to get the best model and extract k and maxDepth:

bestModel = model.bestModel

bestModelK = bestModel.stages[0].getK()
bestModelMaxDepth = bestModel.stages[1].getMaxDepth()
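
As an alternative that avoids patching, the winning grid entry can also be identified by pairing each parameter map with its validation metric. The sketch below is plain Python with stand-in data: `best_param_map` is a hypothetical helper, and the dictionaries and MAE values are made up for illustration. In PySpark the inputs would presumably come from `paramgrid` and, on versions that expose it, `model.validationMetrics`. Because the evaluator above uses `mae`, a smaller metric is treated as better.

```python
from typing import Any, Dict, Sequence


def best_param_map(
    param_maps: Sequence[Dict[str, Any]],
    metrics: Sequence[float],
    larger_is_better: bool = False,
) -> Dict[str, Any]:
    """Return the parameter map whose validation metric is best."""
    if len(param_maps) != len(metrics):
        raise ValueError("param_maps and metrics must have the same length")
    if not param_maps:
        raise ValueError("at least one parameter map is required")
    pick = max if larger_is_better else min
    best_index = pick(range(len(metrics)), key=lambda i: metrics[i])
    return param_maps[best_index]


# Made-up stand-ins for the grid and for the per-entry MAE values.
example_grid = [
    {"k": 1, "maxDepth": 1},
    {"k": 3, "maxDepth": 5},
    {"k": 5, "maxDepth": 9},
]
example_mae = [0.42, 0.17, 0.23]

best = best_param_map(example_grid, example_mae, larger_is_better=False)
for name, value in sorted(best.items()):
    print(f"{name} = {value}")
```

With real PySpark parameter maps the keys are `Param` objects rather than strings, so printing `param.name` for each key would be the natural adaptation.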

PS: You can patch other models in the same way to expose specific parameters.
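
The patching pattern can be wrapped in a small helper. The following is a minimal sketch in plain Python, not PySpark itself: `expose_getter`, `FakeJavaModel` and `FakeModelWrapper` are hypothetical names used only for illustration. The one assumption carried over from the answer is that the Python wrapper keeps the underlying object in `_java_obj`.

```python
def expose_getter(wrapper_cls, method_name):
    """Attach a method to wrapper_cls that forwards to self._java_obj."""
    def getter(self):
        # Look up the same-named method on the wrapped object and call it.
        return getattr(self._java_obj, method_name)()
    getter.__name__ = method_name
    setattr(wrapper_cls, method_name, getter)


class FakeJavaModel:
    """Stand-in for the wrapped JVM object (hypothetical)."""
    def getMaxBins(self):
        return 32


class FakeModelWrapper:
    """Stand-in for a Python model wrapper that lacks the getter."""
    def __init__(self, java_obj):
        self._java_obj = java_obj


expose_getter(FakeModelWrapper, "getMaxBins")
wrapped = FakeModelWrapper(FakeJavaModel())
print(wrapped.getMaxBins())  # prints 32
```

Applied to a real model class, the helper would only work for getters that actually exist on the underlying Java object.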
