Add a new fitted stage to an existing PipelineModel without fitting again


Problem description

I would like to concatenate several trained pipelines into one. This is similar to "Spark add new fitted stage to an existing PipelineModel without fitting again", but the solution given there, shown below, is for PySpark.

> pipe_model_new = PipelineModel(stages = [pipe_model , pipe_model2])
> final_df = pipe_model_new.transform(df1)

In Apache Spark 2.0 the constructor of "PipelineModel" is marked as private, so it cannot be called from outside the class. In the "Pipeline" class, only the "fit" method creates a "PipelineModel", so the following Scala code does not compile:

val pipelineModel =  new PipelineModel("randomUID", trainedStages)
val df_final_full = pipelineModel.transform(df)

Error:(266, 26) constructor PipelineModel in class PipelineModel cannot be accessed in class Preprocessor
    val pipelineModel =  new PipelineModel("randomUID", trainedStages)

Recommended answer

There is nothing* wrong with using Pipeline and invoking the fit method. If a stage is a Transformer, and PipelineModel is one**, fit works like the identity: the stage is passed through unchanged.

You can check the relevant Python code:

if isinstance(stage, Transformer):
    transformers.append(stage)
    dataset = stage.transform(dataset)

The corresponding Scala code:

case t: Transformer =>
  t

This means that the fitting process will only validate the schema and create a new PipelineModel object.
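Applied to the question, this could look roughly like the sketch below. It assumes the trained pipelines are already available as PipelineModel values; the helper name combine is hypothetical and only illustrates the idea, and the DataFrame passed in is simply whatever you would pass to fit.

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel, PipelineStage}
import org.apache.spark.sql.DataFrame

// Hypothetical helper: wrap already-fitted models in a new Pipeline and call fit.
// Every stage is a Transformer, so fit passes the stages through unchanged
// and returns a new PipelineModel that chains them.
def combine(df: DataFrame, models: PipelineModel*): PipelineModel =
  new Pipeline()
    .setStages(models.toArray[PipelineStage])
    .fit(df)
```

With the names from the question, usage would be something like val combined = combine(df, pipeModel, pipeModel2) followed by combined.transform(df), where pipeModel and pipeModel2 stand for your own fitted models.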


* The only possible concern is the presence of non-lazy Transformers; however, with the exception of the deprecated OneHotEncoder, the Spark core API doesn't provide any.

** In Python:

from pyspark.ml import Transformer, PipelineModel

issubclass(PipelineModel, Transformer)

True 

In Scala:

import scala.reflect.runtime.universe.typeOf
import org.apache.spark.ml._

typeOf[PipelineModel] <:< typeOf[Transformer]

Boolean = true
