How to pre-package external libraries when using Spark on a Mesos cluster
Question
According to the Spark on Mesos docs, one needs to set spark.executor.uri
to point to a Spark distribution:
val conf = new SparkConf()
.setMaster("mesos://HOST:5050")
.setAppName("My app")
.set("spark.executor.uri", "<path to spark-1.4.1.tar.gz uploaded above>")
The docs also note that one can build a custom version of the Spark distribution.
My question now is whether it is possible/desirable to pre-package external libraries such as
- spark-streaming-kafka
- elasticsearch-spark
- spark-csv
which would be used by basically all of the job JARs that I will submit via spark-submit, in order to:
- reduce the time sbt assembly needs to package the fat JARs
- reduce the size of the fat JARs which need to be submitted
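Independently of any answer, one common lever for both goals is dependency scoping: anything the cluster already provides (Spark itself, supplied via the spark.executor.uri distribution) can be marked "provided" so sbt-assembly leaves it out of the fat JAR. A minimal build.sbt sketch; the group IDs and version numbers are illustrative assumptions, not prescriptions:

```scala
// build.sbt — illustrative sketch for use with the sbt-assembly plugin.
// Spark core is "provided": the Mesos executors get it from the Spark
// distribution referenced by spark.executor.uri, so it is excluded from
// the assembled fat JAR, shrinking it and speeding up `sbt assembly`.
libraryDependencies ++= Seq(
  "org.apache.spark"  %% "spark-core"            % "1.4.1" % "provided",
  "org.apache.spark"  %% "spark-streaming-kafka" % "1.4.1",
  "org.elasticsearch" %% "elasticsearch-spark"   % "2.1.0",
  "com.databricks"    %% "spark-csv"             % "1.2.0"
)
```

Only the external libraries (Kafka streaming, Elasticsearch, CSV) then end up in the assembly, while the provided Spark classes are resolved at runtime on the executors.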
If so, how can this be achieved? Generally speaking, are there any hints on how the fat-JAR generation in the job submission process can be sped up?
Background is that I want to run some code generation for Spark jobs, submit these right away, and show the results in a browser frontend asynchronously. The frontend part shouldn't be too complicated, but I wonder how the backend part can be achieved.
Answer
After I discovered the Spark JobServer project, I decided that this is the most suitable one for my use case.
It supports dynamic context creation via a REST API, as well as adding JARs to the newly created context manually or programmatically. It is also capable of running low-latency synchronous jobs, which is exactly what I need.
I created a Dockerfile so you can try it out with the most recent (supported) versions of Spark (1.4.1), Spark JobServer (0.6.0) and built-in Mesos support (0.24.1):
- https://github.com/tobilg/docker-spark-jobserver
- https://hub.docker.com/r/tobilg/spark-jobserver/
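To give an idea of the workflow, interacting with Spark JobServer's REST API looks roughly like the following. This is a sketch based on the JobServer docs: the host, JAR path, app/context names and the job class are illustrative assumptions (8090 is JobServer's default port), and the commands assume a running JobServer instance:

```shell
# Upload a (now much smaller) job JAR under the app name "myapp"
curl --data-binary @target/scala-2.10/myapp.jar localhost:8090/jars/myapp

# Create a named context with example resource settings
curl -d "" 'localhost:8090/contexts/my-context?num-cpu-cores=2&memory-per-node=512m'

# Run a job synchronously (sync=true) in that context and get the result back
curl -d "input.string = a b c a b" \
  'localhost:8090/jobs?appName=myapp&classPath=spark.jobserver.WordCountExample&context=my-context&sync=true'
```

The synchronous call blocks until the job finishes and returns the result as JSON, which is what makes the "submit and show results in the frontend" flow straightforward.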
References:
- https://github.com/spark-jobserver/spark-jobserver#features
- https://github.com/spark-jobserver/spark-jobserver#context-configuration