Deploy Apache Spark application from another application in Java, best practice


Problem Description


I am a new user of Spark. I have a web service that allows a user to request the server to perform a complex data analysis by reading from a database and pushing the results back to the database. I have moved those analyses into various Spark applications. Currently I use spark-submit to deploy these applications.

However, I am curious: when my web server (written in Java) receives a user request, what is considered the "best practice" way to initiate the corresponding Spark application? Spark's documentation seems to suggest using "spark-submit", but I would rather not pipe the command out to a terminal to perform this action. I saw an alternative, Spark-JobServer, which provides a RESTful interface to do exactly this, but my Spark applications are written in either Java or R, which does not seem to interface well with Spark-JobServer.

Is there another best practice to kick off a Spark application from a web server (in Java) and wait for a status result indicating whether the job succeeded or failed?

Any ideas of what other people are doing to accomplish this would be very helpful! Thanks!

Solution

I've had a similar requirement. Here's what I did:

  1. To submit apps, I use the hidden Spark REST Submission API: http://arturmkrtchyan.com/apache-spark-hidden-rest-api (see the sketch after this list)

  2. Using this same API you can query the status of a driver, or kill your job later

  3. There's also another hidden UI Json API: http://[master-node]:[master-ui-port]/json/ which exposes all information available on the master UI in JSON format.
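For illustration, below is a minimal Java sketch of step 1: it POSTs a CreateSubmissionRequest to the standalone master's hidden REST submission endpoint (port 6066 is that server's default), following the payload format shown in the linked article. The host name spark-master, the jar path, the main class, the Spark version, and the class name itself are placeholders for this example, not values taken from the question.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SparkRestSubmit {

    public static void main(String[] args) throws Exception {
        // Hypothetical master host; 6066 is the default port of the standalone
        // master's REST submission server.
        String restUrl = "http://spark-master:6066/v1/submissions/create";

        // CreateSubmissionRequest payload; the jar path, main class, and Spark
        // version are placeholders for your own application, and the field
        // layout mirrors the example in the article linked above.
        String payload = "{"
                + "\"action\":\"CreateSubmissionRequest\","
                + "\"appResource\":\"file:/path/to/my-analysis.jar\","
                + "\"mainClass\":\"com.example.MyAnalysisJob\","
                + "\"appArgs\":[\"--input\",\"db-table\"],"
                + "\"clientSparkVersion\":\"1.6.0\","
                + "\"environmentVariables\":{\"SPARK_ENV_LOADED\":\"1\"},"
                + "\"sparkProperties\":{"
                +   "\"spark.app.name\":\"MyAnalysisJob\","
                +   "\"spark.master\":\"spark://spark-master:6066\","
                +   "\"spark.submit.deployMode\":\"cluster\","
                +   "\"spark.jars\":\"file:/path/to/my-analysis.jar\","
                +   "\"spark.driver.supervise\":\"false\""
                + "}}";

        HttpRequest request = HttpRequest.newBuilder(URI.create(restUrl))
                .header("Content-Type", "application/json;charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();

        // The response is JSON containing a "submissionId" (e.g. driver-2016...-0000),
        // which is needed later to query the driver's status or to kill it.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}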

Using "Submission API" I submit a driver and using the "Master UI API" I wait until my Driver and App state are RUNNING
