Spark fat jar to run multiple versions on YARN

Problem description

I have an older version of Spark set up with YARN that I don't want to wipe out, but I still want to use a newer version. I found a couple of posts describing how a fat jar can be used for this.

Many SO posts point to either maven (officially supported) or sbt for building a fat jar, since one isn't directly available for download. There seem to be multiple maven plugins for the job: maven-assembly-plugin, maven-shade-plugin, onejar-maven-plugin, etc.

However, I can't figure out whether I really need a plugin at all and, if so, which one to pick and how exactly to use it. I tried compiling the GitHub source directly using 'build/mvn' and 'build/sbt', but the resulting 'spark-assembly_2.11-2.0.2.jar' file is just 283 bytes.

My goal is to run the pyspark shell using the newer version's fat jar, similar to what is described here.

Solution

The easiest solution (requiring no changes to your Spark on YARN architecture and no conversation with your YARN admins) is to:

  1. Define a library dependency on Spark 2 in your build system, be it sbt or maven.

  2. Assemble your Spark application to create a so-called uber-jar or fat jar with the Spark libraries inside (see the sbt sketch just after this list).
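
As a minimal sketch of both steps in sbt, assuming sbt-assembly 0.14.x; the project name and version numbers below are illustrative, not from the original post:

    // project/plugins.sbt -- pull in the sbt-assembly plugin
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")

    // build.sbt -- declare the newer Spark as a regular compile-scoped dependency,
    // so it ends up *inside* the fat jar. Deliberately NOT "provided" here,
    // since the point is to bypass the older Spark installed on the cluster.
    name := "spark2-fatjar-demo"   // hypothetical project name
    version := "0.1.0"
    scalaVersion := "2.11.8"       // Spark 2.0.x is built against Scala 2.11

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "2.0.2",
      "org.apache.spark" %% "spark-sql"  % "2.0.2"
    )

Running sbt assembly then produces something like target/scala-2.11/spark2-fatjar-demo-assembly-0.1.0.jar, which is the artifact you hand to spark-submit.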

It works and I personally tested it at least once in a project.

The only (?) downside is that the build process takes longer (you have to run sbt assembly rather than sbt package) and the deployable fat jar of your Spark application is...well...much bigger. That also makes deployment take longer, since you have to ship the jar to YARN over the wire with spark-submit.
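
One practical wrinkle with sbt assembly, sketched here since the original answer doesn't show it: bundling Spark and its transitive dependencies usually triggers duplicate-file errors during assembly, so you typically need a merge strategy in build.sbt, for example:

    // build.sbt (continued) -- illustrative merge strategy for sbt-assembly 0.14.x.
    // Discarding META-INF duplicates and taking the first copy of everything else
    // is crude but often enough to get a runnable fat jar.
    assemblyMergeStrategy in assembly := {
      case PathList("META-INF", xs @ _*) => MergeStrategy.discard
      case _                             => MergeStrategy.first
    }

Real builds often need finer-grained rules (for example, concatenating service-registration files rather than discarding them), so treat this as a starting point rather than a recipe.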

All in all, it works but takes longer (which may still be shorter than convincing your admin gods to, say, move beyond what ships in commercial offerings like Cloudera's CDH, Hortonworks' HDP, or the MapR distro).
