How to get best params after tuning by pyspark.ml.tuning.TrainValidationSplit?

Question

I'm trying to tune the hyperparameters of a Spark (PySpark) ALS model with TrainValidationSplit.

It works well, but I want to know which combination of hyperparameters is the best. How do I get the best parameters after evaluation?

from pyspark.ml.recommendation import ALS
from pyspark.ml.tuning import TrainValidationSplit, ParamGridBuilder
from pyspark.ml.evaluation import RegressionEvaluator

df = sqlCtx.createDataFrame(
    [(0, 0, 4.0), (0, 1, 2.0), (1, 1, 3.0), (1, 2, 4.0), (2, 1, 1.0), (2, 2, 5.0)],
    ["user", "item", "rating"],
)

df_test = sqlCtx.createDataFrame(
    [(0, 0), (0, 1), (1, 1), (1, 2), (2, 1), (2, 2)],
    ["user", "item"],
)

als = ALS()

param_grid = ParamGridBuilder().addGrid(
    als.rank,
    [10, 15],
).addGrid(
    als.maxIter,
    [10, 15],
).build()

evaluator = RegressionEvaluator(
    metricName="rmse",
    labelCol="rating",
)
tvs = TrainValidationSplit(
    estimator=als,
    estimatorParamMaps=param_grid,
    evaluator=evaluator,
)


model = tvs.fit(df)

Question: How do I get the best rank and maxIter?

Recommended answer

You can access the best model through the bestModel property of the TrainValidationSplitModel:

best_model = model.bestModel

The rank can be read directly from the rank property of the ALSModel:

best_model.rank

10

Getting the maximum number of iterations requires a bit more trickery, going through the underlying Java object to the parent ALS estimator:

(best_model
    ._java_obj     # Get Java object
    .parent()      # Get parent (ALS estimator)
    .getMaxIter()) # Get maxIter

10
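As an alternative that avoids the private _java_obj attribute, the winning grid entry can also be recovered by pairing each parameter map with its validation metric. The sketch below is a minimal, Spark-free helper; the name best_param_map is hypothetical and not part of PySpark. It assumes a PySpark version in which TrainValidationSplitModel exposes validationMetrics and getEstimatorParamMaps(), so check this against your installed version.

```python
import math


def best_param_map(param_maps, metrics, larger_is_better=False):
    """Return (param_map, metric) for the best-scoring grid entry.

    Hypothetical helper, not part of PySpark. NaN metrics are skipped,
    since they cannot be ranked against the other scores.
    """
    if len(param_maps) != len(metrics):
        raise ValueError("param_maps and metrics must have the same length")
    # Pair each parameter map with its score, dropping NaN scores.
    scored = [(pm, m) for pm, m in zip(param_maps, metrics) if not math.isnan(m)]
    if not scored:
        raise ValueError("no usable (non-NaN) metrics")
    choose = max if larger_is_better else min
    return choose(scored, key=lambda pair: pair[1])


# Assumed usage with the fitted model from the question (requires Spark):
# best_map, best_rmse = best_param_map(
#     model.getEstimatorParamMaps(),
#     model.validationMetrics,
#     larger_is_better=evaluator.isLargerBetter(),
# )
# print({param.name: value for param, value in best_map.items()})
```

For RMSE, evaluator.isLargerBetter() returns False, so the entry with the smallest metric is chosen. Unlike bestModel, this approach also shows the score of every other combination in the grid.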
