How to print best model params in pyspark pipeline
Question
This question is similar to this one. I would like to print the best model params after running a TrainValidationSplit in pyspark. I cannot find the log text the other user relies on in their answer, because I'm working in Jupyter and the log disappears from the terminal...
Part of the code is:
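from pyspark.ml import Pipeline
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.feature import PCA
from pyspark.ml.regression import DecisionTreeRegressor
from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit

# PCA feeds its output column into the decision tree regressor.
pca = PCA(inputCol='features')
dt = DecisionTreeRegressor(featuresCol=pca.getOutputCol(), labelCol="energy")
pipe = Pipeline(stages=[pca, dt])
# Grid over the number of principal components and the tree depth.
paramgrid = ParamGridBuilder().addGrid(pca.k, range(1, 50, 2)).addGrid(dt.maxDepth, range(1, 10, 1)).build()
evaluator = RegressionEvaluator(labelCol="energy", predictionCol="prediction", metricName="mae")
tvs = TrainValidationSplit(estimator=pipe, evaluator=evaluator, estimatorParamMaps=paramgrid, trainRatio=0.66)
model = tvs.fit(wind_tr_va)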
Thanks.
Recommended answer
It does indeed follow the same reasoning described in the answer to "How to get the maxDepth from a Spark RandomForestRegressionModel", given by @user6910411.
You'll need to patch TrainValidationSplitModel, PCAModel and DecisionTreeRegressionModel as follows:
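from pyspark.ml.feature import PCAModel
from pyspark.ml.regression import DecisionTreeRegressionModel
from pyspark.ml.tuning import TrainValidationSplitModel

# Each patch forwards the call to the wrapped Java model object.
TrainValidationSplitModel.bestModel = (
    lambda self: self._java_obj.bestModel
)
PCAModel.getK = (
    lambda self: self._java_obj.getK()
)
DecisionTreeRegressionModel.getMaxDepth = (
    lambda self: self._java_obj.getMaxDepth()
)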
Now you can use it to get the best model and extract k and maxDepth:
bestModel = model.bestModel
bestModelK = bestModel.stages[0].getK()
bestModelMaxDepth = bestModel.stages[1].getMaxDepth()
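As a possible alternative that avoids patching, the winning grid entry can be looked up from the validation metrics, which are computed in the same order as the param maps. The sketch below is only an illustration: the helper name best_param_map is hypothetical, the sample values are made up, and it assumes your PySpark version exposes model.validationMetrics on the fitted TrainValidationSplitModel.

```python
def best_param_map(param_maps, metrics, larger_is_better):
    """Return (param_map, metric) for the best-scoring grid entry.

    param_maps and metrics must be parallel sequences of equal length.
    """
    if len(param_maps) != len(metrics):
        raise ValueError("param_maps and metrics must have the same length")
    if not metrics:
        raise ValueError("no metrics to compare")
    # Pick the index of the largest or smallest metric, depending on the evaluator.
    pick = max if larger_is_better else min
    best_index = pick(range(len(metrics)), key=lambda i: metrics[i])
    return param_maps[best_index], metrics[best_index]


# Made-up stand-ins for the real param maps and validation metrics.
sample_maps = [
    {"k": 1, "maxDepth": 1},
    {"k": 3, "maxDepth": 5},
    {"k": 5, "maxDepth": 9},
]
sample_mae = [4.2, 2.7, 3.1]

# MAE is an error measure, so smaller is better.
best_map, best_mae = best_param_map(sample_maps, sample_mae, larger_is_better=False)
print(best_map, best_mae)
```

With the objects from the question, the call might look like best_param_map(paramgrid, model.validationMetrics, larger_is_better=False). The keys of each real param map are Param objects, so {p.name: v for p, v in best_map.items()} gives a dict that prints readably.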
PS: You can patch other models the same way to get specific parameters.
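The patching pattern above can be generalised into a small helper, sketched here under assumptions. The name add_java_getter is hypothetical, and the helper assumes the model wrapper keeps its Java counterpart in _java_obj, as the patches in this answer do. The Fake classes are stand-ins so the sketch runs without Spark. Newer PySpark releases may already define some of these getters on the model classes, in which case no patch is needed.

```python
def add_java_getter(model_cls, getter_name):
    """Attach a method to model_cls that forwards to the same-named
    method on the instance's wrapped Java object (self._java_obj)."""
    def getter(self):
        return getattr(self._java_obj, getter_name)()
    getter.__name__ = getter_name
    setattr(model_cls, getter_name, getter)


# Stand-ins so the sketch runs without Spark; with PySpark you would pass
# a real model class such as PCAModel instead of FakeModel.
class FakeJavaModel:
    def getK(self):
        return 7


class FakeModel:
    def __init__(self):
        self._java_obj = FakeJavaModel()


add_java_getter(FakeModel, "getK")
k = FakeModel().getK()
print(k)
```

Against real classes, the equivalent of the patches above would be add_java_getter(PCAModel, "getK") and add_java_getter(DecisionTreeRegressionModel, "getMaxDepth").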