Ignore Spark Cluster's Own Jars


Question

I would like to use my own application's Spark jars. More concretely, I have an mllib jar that has not been released yet and contains a bug fix for BisectingKMeans. My idea is to use it in my Spark cluster (locally it works perfectly).

I've tried many things: extraClassPath, userClassPathFirst, the jars option... many options that do not work. My last idea is to use sbt's shade rule to rename all org.apache.spark.* packages to shadespark.*, but when I deploy, the cluster's Spark jars are still used.
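
For reference, the sbt-assembly shading attempt described above would look something like this (a sketch, assuming the sbt-assembly plugin is available in project/plugins.sbt; the shadespark prefix is the one from the question):

// build.sbt
// Rename every bundled org.apache.spark class to the shadespark
// namespace so it cannot collide with the cluster's own Spark jars.
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("org.apache.spark.**" -> "shadespark.@1").inAll
)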

Any ideas?

Answer

You can try to use the Maven shade plugin to relocate the conflicting packages. This creates a separate namespace for the newer version of the mllib jar. Both the old and the new version will then be on the classpath, but since the new version has an alternative name, you can refer to the newer package explicitly.

Have a look at https://maven.apache.org/plugins/maven-shade-plugin/examples/class-relocation.html:

If the uber JAR is reused as a dependency of some other project, directly including classes from the artifact's dependencies in the uber JAR can cause class loading conflicts due to duplicate classes on the class path. To address this issue, one can relocate the classes which get included in the shaded artifact in order to create a private copy of their bytecode:
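
As a minimal sketch of such a relocation for this case, adapted from the linked example (the shadespark prefix follows the question; the plugin version tag is omitted and must be filled in):

<!-- pom.xml: relocate the patched mllib classes into a private namespace -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>org.apache.spark.mllib</pattern>
            <shadedPattern>shadespark.mllib</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>

After shading, the application refers to the relocated copy explicitly, for example by importing shadespark.mllib.clustering.BisectingKMeans instead of org.apache.spark.mllib.clustering.BisectingKMeans, so the version bundled with the cluster no longer shadows the fix.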

I got this idea from the video "Top 5 Mistakes When Writing Spark Applications": https://youtu.be/WyfHUNnMutg?t=23m1s

