How to save and load an MLlib model in Apache Spark?
Question
I trained a classification model in Apache Spark (using pyspark). I stored the model in an object, LogisticRegressionModel. Now I want to make predictions on new data. I would like to store the model and read it back into a new program in order to make the predictions. Any idea how to store the model? I'm thinking of maybe pickle, but I'm a newbie to both Python and Spark, so I'd like to hear what the community thinks.
Recommended answer
You can save your model by using the save method of MLlib models.
# let lrm be a trained LogisticRegressionModel and sc the active SparkContext
lrm.save(sc, "lrm_model.model")
After storing it, you can load it in another application.
from pyspark.mllib.classification import LogisticRegressionModel
sameModel = LogisticRegressionModel.load(sc, "lrm_model.model")
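To make the two-program workflow from the question concrete, here is a minimal sketch that wraps the save and load calls above in two small helpers. The names save_model and load_and_predict, and the model_cls parameter, are hypothetical and chosen only for illustration; the sketch assumes a SparkContext already exists and that the model was trained with the pyspark.mllib API.

```python
def save_model(sc, model, path):
    """Write the model to `path` using its own save(sc, path) method."""
    model.save(sc, path)


def load_and_predict(sc, path, feature_rows, model_cls=None):
    """Reload a saved model and score an iterable of feature vectors.

    `model_cls` is a hypothetical hook: pass another MLlib model class
    if the saved model is not a LogisticRegressionModel.
    """
    if model_cls is None:
        # Imported lazily so this module can be imported without Spark installed.
        from pyspark.mllib.classification import LogisticRegressionModel
        model_cls = LogisticRegressionModel
    model = model_cls.load(sc, path)
    # predict() is called once per feature vector on the driver.
    return [model.predict(row) for row in feature_rows]


# Usage (assumes an existing SparkContext `sc` and a trained model `lrm`):
#   save_model(sc, lrm, "lrm_model.model")                  # training program
#   load_and_predict(sc, "lrm_model.model", [[0.0, 1.0]])   # scoring program
```

The model class is needed at load time because load is a class method, so the scoring program has to know which kind of model was saved.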
As @zero323 stated before, there is another way to achieve this: using the Predictive Model Markup Language (PMML).
PMML is an XML-based file format developed by the Data Mining Group to provide a way for applications to describe and exchange models produced by data mining and machine learning algorithms.