Options for deploying R models in production

Question

There don't seem to be many options for deploying predictive models in production, which is surprising given the explosion of big data.

I understand that the open standard PMML can be used to export models as an XML specification, which can then be used for in-database scoring/prediction. However, it seems that to make this work you need the PMML plugin from Zementis, which means the solution is not truly open source. Is there an easier, open way to map PMML to SQL for scoring?

Another option would be to use JSON instead of XML to output model predictions. But in that case, where would the R model sit? I'm assuming it would always need to be mapped to SQL, unless the R model could sit on the same server as the data and be run against the incoming data using an R script.

Are there any other options?

Recommended answer

The answer really depends on what your production environment is.

If your "big data" are on Hadoop, you can try Pattern, a relatively new open-source PMML "scoring engine".

Otherwise you have no choice (short of writing custom model-specific code) but to run R on your server. You would use save() to store your fitted models in .RData files, then load() them on the server and call the corresponding predict() method. (That is bound to be slow, but you can always throw more hardware at it.)
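The save/load/predict round trip described above can be sketched as follows. This is a minimal illustration, not a production setup: the lm() model on the built-in mtcars data, the file name fit.RData, and the new_data values are all assumptions chosen for the example.

```r
# Training side: fit a model and persist it to disk.
# mtcars ships with base R; the file name is a hypothetical choice.
fit <- lm(mpg ~ wt + hp, data = mtcars)
model_path <- file.path(tempdir(), "fit.RData")
save(fit, file = model_path)

# Serving side: simulate a fresh R process by dropping the object,
# then restore it from the file and score new rows.
rm(fit)
load(model_path)  # restores an object named `fit` into the workspace

# Hypothetical incoming data with the same predictor columns as the training set.
new_data <- data.frame(wt = c(2.5, 3.2), hp = c(110, 150))
scores <- predict(fit, newdata = new_data)
print(scores)
```

Because load() restores objects under the names they were saved with, the serving code has to know that the model is called fit. If that coupling is a concern, saveRDS() and readRDS() store a single object and let the reader pick the name.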

How you do that really depends on your platform. Usually there is a way to add "custom" functions written in R; the term to look for is UDF (user-defined function). In Hadoop you can add such functions to Pig (e.g. https://github.com/cd-wood/pigaddons), or you can use RHadoop to write simple map-reduce code that loads the model and calls predict in R. If your data are in Hive, you can use Hive TRANSFORM to call an external R script.
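As a rough sketch of the Hive TRANSFORM route: Hive streams each row to the external script's standard input as tab-separated text and reads tab-separated rows back from its standard output. The column layout (id, wt, hp), the function names score_lines and main, and the assumption that the model file restores an object named fit are all hypothetical choices made for this illustration.

```r
# Turn tab-separated input lines ("id<TAB>wt<TAB>hp") into
# tab-separated output lines ("id<TAB>score") using a fitted model.
score_lines <- function(lines, fit) {
  if (length(lines) == 0) return(character(0))
  fields <- strsplit(lines, "\t", fixed = TRUE)
  rows <- data.frame(
    id = vapply(fields, `[`, character(1), 1),
    wt = as.numeric(vapply(fields, `[`, character(1), 2)),
    hp = as.numeric(vapply(fields, `[`, character(1), 3)),
    stringsAsFactors = FALSE
  )
  pred <- predict(fit, newdata = rows)
  paste(rows$id, sprintf("%.4f", pred), sep = "\t")
}

# Entry point for the deployed script: load the model, read every row
# from stdin, and write the scored rows to stdout.
# Reading all input at once keeps the sketch short; a large partition
# would be better processed in chunks.
main <- function(model_path) {
  env <- new.env()
  load(model_path, envir = env)  # assumed to contain an object named `fit`
  con <- file("stdin")
  on.exit(close(con))
  lines <- readLines(con)
  writeLines(score_lines(lines, env$fit))
}

# The deployed script would end with a call such as main("fit.RData").
```

On the Hive side, a query along the lines of SELECT TRANSFORM (id, wt, hp) USING 'Rscript score.R' AS (id, score) FROM cars would invoke the script, after the script and the model file have been shipped to the workers with ADD FILE. The table and file names here are placeholders, and R must be installed on every worker node.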

There are also vendor-specific ways to add functions written in R to various SQL databases. Again, look for UDF in the documentation. For instance, PostgreSQL has PL/R.
