Spark: add a new fitted stage to an existing PipelineModel without fitting again
Problem description
I have a saved PipelineModel:
pipe_model = pipe.fit(df_train)
pipe_model.write().overwrite().save("/user/pipe_text_2")
Now I want to append a new, already fitted PipelineModel to this pipeline:
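from pyspark.ml import Pipeline, PipelineModel
from pyspark.ml.clustering import KMeans

# Reload the fitted pipeline and use it to prepare the input for KMeans
pipe_model = PipelineModel.load("/user/pipe_text_2")
df2 = pipe_model.transform(df1)
# Fit only the new clustering stage, on the already transformed data
kmeans = KMeans(k=20)
pipe2 = Pipeline(stages=[kmeans])
pipe_model2 = pipe2.fit(df2)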
Is it possible to combine the two without fitting again, so that the result is a new PipelineModel rather than a new Pipeline? Ideally something like the following would work, but it raises an error:
pipe_model_new = pipe_model + pipe_model2
TypeError: unsupported operand type(s) for +: 'PipelineModel' and 'PipelineModel'
I found "Join two Spark mllib pipelines together", but that solution requires fitting the whole pipeline again, which is what I am trying to avoid.
Recommended answer
Since PipelineModels are themselves valid stages for the PipelineModel class, you should be able to use the following, which does not require fitting again:
pipe_model_new = PipelineModel(stages=[pipe_model, pipe_model2])
final_df = pipe_model_new.transform(df1)
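If a flat list of stages is preferred over a PipelineModel nested inside another PipelineModel, the same idea can be sketched as below. This is only a minimal sketch, assuming pipe_model and pipe_model2 are the fitted models from the question. The helper names flatten_stages and combine_fitted are hypothetical and not part of the Spark API. The sketch relies on the stages attribute of PipelineModel, which holds the list of fitted transformers.

```python
def flatten_stages(*models):
    """Return one flat list of fitted stages from models and/or pipeline models.

    Anything exposing a list-like `stages` attribute (as PipelineModel does) is
    expanded recursively; any other object is treated as a single fitted stage.
    """
    flat = []
    for model in models:
        nested = getattr(model, "stages", None)
        if isinstance(nested, (list, tuple)):
            flat.extend(flatten_stages(*nested))
        else:
            flat.append(model)
    return flat


def combine_fitted(*models):
    """Build a PipelineModel from already fitted parts; fit() is never called."""
    # Imported lazily so this helper can be loaded even where pyspark is absent.
    from pyspark.ml import PipelineModel

    return PipelineModel(stages=flatten_stages(*models))
```

Under these assumptions, combine_fitted(pipe_model, pipe_model2).transform(df1) should apply the same transformers in the same order as the nested version above, while keeping all stages at one level.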