How to pre-package external libraries when using Spark on a Mesos cluster


Question

According to the Spark on Mesos docs, one needs to set spark.executor.uri to point to a Spark distribution:

val conf = new SparkConf()
  .setMaster("mesos://HOST:5050")
  .setAppName("My app")
  .set("spark.executor.uri", "<path to spark-1.4.1.tar.gz uploaded above>")

The docs also note that one can build a custom version of the Spark distribution.

My question now is whether it is possible/desirable to pre-package external libraries such as

  • spark-streaming-kafka
  • elasticsearch-spark
  • spark-csv

which will mostly be used in all the job JARs I submit via spark-submit, in order to

  • reduce the time sbt assembly needs to package the fat jars
  • reduce the size of the fat jars which need to be submitted
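One common way to get both effects (a sketch, not taken from the question) is to bake such libraries into the custom Spark distribution referenced by spark.executor.uri and then mark them as "provided" in sbt, so that sbt assembly excludes them from the fat jar. The artifact versions below are illustrative only:

```scala
// build.sbt -- illustrative sketch; artifact versions are assumptions
name := "my-spark-app"

scalaVersion := "2.10.5"

libraryDependencies ++= Seq(
  // Spark itself is supplied by the cluster, so keep it out of the assembly
  "org.apache.spark" %% "spark-core" % "1.4.1" % "provided",
  // If these are baked into the custom Spark distribution on the executors,
  // they can be marked "provided" as well and disappear from the fat jar
  "org.apache.spark"  %% "spark-streaming-kafka" % "1.4.1" % "provided",
  "org.elasticsearch" %% "elasticsearch-spark"   % "2.1.0" % "provided",
  "com.databricks"    %% "spark-csv"             % "1.2.0" % "provided"
)
```

The trade-off is that every executor must actually ship those libraries at the expected versions; a jar built this way fails at runtime on a stock Spark distribution.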

If so, how can this be achieved? Generally speaking, are there any hints on how the fat jar generation in the job submission process can be sped up?

Background is that I want to run some code generation for Spark jobs, submit these right away, and show the results asynchronously in a browser frontend. The frontend part shouldn't be too complicated, but I wonder how the backend part can be achieved.

Answer

After I discovered the Spark JobServer project, I decided that it is the most suitable one for my use case.

It supports dynamic context creation via a REST API, as well as adding JARs to the newly created context manually/programmatically. It is also capable of running low-latency synchronous jobs, which is exactly what I need.

I created a Dockerfile so you can try it out with the most recent (supported) versions of Spark (1.4.1), Spark JobServer (0.6.0) and built-in Mesos support (0.24.1):

  • https://github.com/tobilg/docker-spark-jobserver
  • https://hub.docker.com/r/tobilg/spark-jobserver/

References:

  • https://github.com/spark-jobserver/spark-jobserver#features
  • https://github.com/spark-jobserver/spark-jobserver#context-configuration

