How to serve a Spark MLlib model?
Question
I'm evaluating tools for production ML-based applications, and one of our options is Spark MLlib. However, I have some questions about how to serve a model once it's trained.
For example, in Azure ML a trained model is exposed as a web service that can be consumed from any application, and Amazon ML works similarly.
How do you serve or deploy ML models in Apache Spark?
Recommended answer
On the one hand, a machine learning model built with Spark can't be served the way you would serve one in Azure ML or Amazon ML.
Databricks claims to be able to deploy models using its notebooks, but I haven't actually tried that yet.
On the other hand, you can use a model in three ways:
- Train on the fly inside an application, then apply prediction. This can be done in a Spark application or a notebook.
- Train a model and save it, provided it supports persistence (that is, it exposes an MLWriter through its write method), then load it in an application or a notebook and run it against your data.
- Train a model with Spark and export it to PMML format using jpmml-spark. PMML lets different statistical and data mining tools speak the same language, so a predictive solution can be moved between tools and applications without custom coding, e.g. from Spark ML to R.
Those are the three possible ways.
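The second approach in the list (train, save, then load elsewhere) can be sketched roughly as follows in PySpark. This is only an illustration under assumptions: the column names `x1` and `x2`, the toy training rows, the choice of logistic regression, and the helper names `train_and_save` and `load_and_score` are all made up for the example and do not come from the answer. The Spark imports sit inside the functions so the module can be imported on a machine without Spark installed.

```python
def train_and_save(spark, model_dir):
    """Fit a small pipeline and persist it to model_dir (hypothetical helper)."""
    # Imported lazily so this module loads even where Spark is not installed.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import VectorAssembler

    # Toy training data, invented for illustration only.
    train = spark.createDataFrame(
        [(0.0, 1.1, 0.0), (2.0, 1.0, 1.0), (0.5, 1.3, 0.0), (2.2, 0.9, 1.0)],
        ["x1", "x2", "label"],
    )
    pipeline = Pipeline(stages=[
        VectorAssembler(inputCols=["x1", "x2"], outputCol="features"),
        LogisticRegression(maxIter=10),
    ])
    model = pipeline.fit(train)
    # write() returns an MLWriter; overwrite() replaces any earlier copy.
    model.write().overwrite().save(model_dir)


def load_and_score(spark, model_dir, rows):
    """Load the persisted pipeline and return predictions for (x1, x2) rows."""
    from pyspark.ml import PipelineModel

    model = PipelineModel.load(model_dir)
    df = spark.createDataFrame(rows, ["x1", "x2"])
    scored = model.transform(df).select("prediction").collect()
    return [row["prediction"] for row in scored]
```

A training job would call `train_and_save(spark, path)` once, and the consuming application or notebook would call `load_and_score(spark, path, rows)` with its own `SparkSession`. Note that the consumer still needs a Spark runtime to load and run the model this way.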
Of course, you can design an architecture with a RESTful service in front, built using spark-jobserver for example, to train and deploy models, but that needs some development. It's not an out-of-the-box solution.
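To give a feel for what "a RESTful service in front of a model" means, here is a minimal sketch using only the Python standard library. It is not spark-jobserver and does not use its API. The `/predict` endpoint, the JSON shape, and the `predict` function (a fixed rule standing in for a loaded model) are assumptions invented for the example. In a real setup `predict` would delegate to a model loaded from storage.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Hypothetical stand-in for a loaded model: a fixed linear threshold."""
    score = 0.8 * features[0] - 0.5 * features[1]
    return 1.0 if score > 0.0 else 0.0


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404, "unknown endpoint")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            result = {"prediction": predict(payload["features"])}
        except (ValueError, KeyError, IndexError, TypeError):
            self.send_error(400, "expected JSON with a 'features' list")
            return
        body = json.dumps(result).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet; a real service would log requests


def start_server(port=0):
    """Start the service on localhost in a background thread (port 0 = any free port)."""
    server = HTTPServer(("127.0.0.1", port), PredictHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_predict(port, features):
    """Small client: POST the features and return the decoded JSON reply."""
    request = urllib.request.Request(
        f"http://127.0.0.1:{port}/predict",
        data=json.dumps({"features": features}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # Bypass any configured HTTP proxy, since the target is localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request) as response:
        return json.loads(response.read())
```

Starting the server with `start_server()` and reading the chosen port from `server.server_address[1]` lets any HTTP client, in any language, request predictions, which is the property the question liked about Azure ML and Amazon ML.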
You might also use projects like Oryx 2 to build a full lambda architecture to train, deploy, and serve a model.
Unfortunately, describing each of the solutions mentioned above is quite broad and doesn't fit in the scope of SO.