How to serve a Spark MLlib model?

Question
I'm evaluating tools for production ML-based applications, and one of our options is Spark MLlib, but I have some questions about how to serve a model once it's trained.
For example, in Azure ML, once trained, the model is exposed as a web service that can be consumed from any application, and it's a similar case with Amazon ML.
How do you serve/deploy ML models in Apache Spark?
Recommended answer
On the one hand, a machine learning model built with Spark can't be served in the traditional way you serve one in Azure ML or Amazon ML.
Databricks claims to be able to deploy models using its notebooks, but I haven't actually tried that yet.
On the other hand, you can use a model in three ways:
- Train on the fly inside an application, then apply the prediction. This can be done in a Spark application or a notebook.
- Train a model and save it if it implements MLWriter, then load it in an application or a notebook and run it against your data.
- Train a model with Spark and export it to PMML format using jpmml-spark. PMML allows different statistical and data-mining tools to speak the same language, so a predictive solution can be moved easily between tools and applications without custom coding, e.g. from Spark ML to R.
Those are the three possible ways.
Of course, you can think of an architecture with a RESTful service behind which you train and deploy, built for example with spark-jobserver, but it needs some development. It's not an out-of-the-box solution.
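As a rough sketch of the spark-jobserver route: jobs are objects implementing the jobserver's job trait, packaged in a jar, uploaded over its REST API, and then triggered by HTTP calls. The class below assumes the classic SparkJob API (trait names differ across jobserver versions), and the model path and config key are placeholders:

```scala
import com.typesafe.config.Config
import org.apache.spark.SparkContext
import org.apache.spark.ml.PipelineModel
import org.apache.spark.sql.SparkSession
import spark.jobserver.{SparkJob, SparkJobValid, SparkJobValidation}

object PredictJob extends SparkJob {
  // Called before runJob; real code would verify "input.path" is present.
  override def validate(sc: SparkContext, config: Config): SparkJobValidation =
    SparkJobValid

  override def runJob(sc: SparkContext, config: Config): Any = {
    val spark = SparkSession.builder().getOrCreate()
    // Reload a previously saved model and score the requested data.
    val model = PipelineModel.load("/models/lr-pipeline")  // placeholder path
    val df = spark.read.parquet(config.getString("input.path"))
    model.transform(df).select("prediction").count()       // toy result
  }
}
```

A client would then upload the jar and POST a job request against the jobserver's HTTP endpoints, passing `input.path` in the request config; the response carries the value returned by runJob.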
You might also use a project like Oryx 2 to create a full lambda architecture to train, deploy and serve a model.
Unfortunately, describing each of the solutions mentioned above is quite broad and doesn't fit in the scope of SO.