How to get best params after tuning by pyspark.ml.tuning.TrainValidationSplit?
Question
I'm trying to tune the hyper-parameters of a Spark (PySpark) ALS model with TrainValidationSplit.
It works well, but I want to know which combination of hyper-parameters is the best. How do I get the best params after evaluation?
from pyspark.ml.recommendation import ALS
from pyspark.ml.tuning import TrainValidationSplit, ParamGridBuilder
from pyspark.ml.evaluation import RegressionEvaluator
df = sqlCtx.createDataFrame(
    [(0, 0, 4.0), (0, 1, 2.0), (1, 1, 3.0), (1, 2, 4.0), (2, 1, 1.0), (2, 2, 5.0)],
    ["user", "item", "rating"],
)
df_test = sqlCtx.createDataFrame(
    [(0, 0), (0, 1), (1, 1), (1, 2), (2, 1), (2, 2)],
    ["user", "item"],
)
als = ALS()
param_grid = ParamGridBuilder().addGrid(
    als.rank,
    [10, 15],
).addGrid(
    als.maxIter,
    [10, 15],
).build()
evaluator = RegressionEvaluator(
    metricName="rmse",
    labelCol="rating",
)
tvs = TrainValidationSplit(
    estimator=als,
    estimatorParamMaps=param_grid,
    evaluator=evaluator,
)
model = tvs.fit(df)
Question: how do I get the best rank and maxIter?
Recommended answer
You can access the best model using the bestModel property of the TrainValidationSplitModel:
best_model = model.bestModel
The rank can be accessed directly using the rank property of the ALSModel:
best_model.rank
10
Getting the maximum number of iterations requires a bit more trickery, because it has to be read from the parent estimator through the underlying Java object:
(best_model
    ._java_obj       # Get Java object
    .parent()        # Get parent (ALS estimator)
    .getMaxIter())   # Get maxIter
10
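Another way to recover every tuned parameter at once, without reaching into the Java object, is to pair each entry of the parameter grid with its validation metric and pick the best one. The sketch below is a minimal illustration, not part of the original answer. The helper name best_param_map is hypothetical. It assumes that your Spark version exposes validationMetrics on the fitted TrainValidationSplitModel and that the metrics are listed in the same order as param_grid.

```python
def best_param_map(param_maps, metrics, larger_is_better=False):
    """Return (params, metric) for the best-scoring entry of a parameter grid.

    param_maps: list of dicts, e.g. the list built by ParamGridBuilder
    metrics:    one validation metric per param map, in the same order (assumed)
    """
    if len(param_maps) != len(metrics):
        raise ValueError("param_maps and metrics must have the same length")
    # RMSE is an error, so smaller is better; other metrics may be the opposite.
    pick = max if larger_is_better else min
    index = pick(range(len(metrics)), key=lambda i: metrics[i])
    # Keys are pyspark Param objects (they expose .name); plain strings also work.
    params = {getattr(p, "name", p): v for p, v in param_maps[index].items()}
    return params, metrics[index]


# Hypothetical usage with the objects from the question (needs a Spark session):
# params, rmse = best_param_map(
#     param_grid, model.validationMetrics, evaluator.isLargerBetter()
# )
# print(params, rmse)
```

Passing evaluator.isLargerBetter() avoids hard-coding the direction of the metric, so the same helper should work if the evaluator is switched from rmse to, say, r2.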