How to set parameters for a custom PySpark Transformer once it's a stage in a fitted ML Pipeline?

Question

I have written a custom Estimator and Transformer by following the example here.

However, in that example all the parameters needed by _transform() were conveniently passed into the Model/Transformer by the estimator's _fit() method. But my transformer has several parameters that control the way the transform is applied. These parameters are specific to the transformer so it would feel odd to pass them into the estimator in advance along with the estimator-specific parameters used for fitting the model.

I can work around this by adding extra Params to the transformer. This works fine when I use my estimator and transformer outside of an ML Pipeline. But how can I set these transformer-specific parameters once my estimator object has been added as a stage to a Pipeline? For example, you can call getStages() on a pyspark.ml.pipeline.Pipeline and can therefore get the estimators, but there is no corresponding getStages() method on PipelineModel. I can't see any methods for setting parameters on the PipelineModel stages either.

So how can I set the parameters on my transformer before I call transform() on the fitted pipeline model? I'm on Spark 2.2.0.

Recommended answer

There is no getStages() method on PipelineModel, but the same class does have an undocumented member called stages, which is a list holding the fitted stages.

For example, if you've just fitted a pipeline model with 3 stages and you want to set some parameters on the second stage, you can just do something like:

# stages is a plain Python list indexed from 0,
# so stages[1] is the second stage of the fitted pipeline.
myModel = myPipelineModel.stages[1]
myModel.setMyParam(42)
# Or in one line:
# myPipelineModel.stages[1].setMyParam(42)

# Now we can push our data through the fully configured pipeline model:
resultsDF = myPipelineModel.transform(inputDF)
