Classpath resolution between spark uber jar and spark-submit --jars when similar classes exist in both


Problem description

What is the precedence in class loading when both the uber jar of my spark application and the contents of the --jars option to my spark-submit shell command contain similar dependencies?

I ask this from a third-party library integration standpoint. If I set --jars to use a third-party library at version 2.0, and the uber jar passed to this spark-submit script was assembled using version 2.1, which class is loaded at runtime?

At present, I am thinking of keeping my dependencies on hdfs and adding them to the --jars option of spark-submit, while asking users, via some end-user documentation, to set the scope of this third-party library to 'provided' in their spark application's maven pom file.
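To go with that end-user documentation, a minimal sketch of the corresponding Maven dependency entry (the groupId, artifactId, and version below are placeholders for the actual third-party library):

```xml
<!-- Mark the third-party library as 'provided' so it is available at
     compile time but excluded from the user's uber jar; the copy shipped
     to the cluster via --jars is then the one used at runtime. -->
<dependency>
  <groupId>com.example</groupId>
  <artifactId>third-party-lib</artifactId>
  <version>2.0</version>
  <scope>provided</scope>
</dependency>
```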

Answer

This is somewhat controlled by two parameters:

  • spark.driver.userClassPathFirst
  • spark.executor.userClassPathFirst

If set to true (the default is false), then, from the docs:

(Experimental) Whether to give user-added jars precedence over Spark's own jars when loading classes in the driver. This feature can be used to mitigate conflicts between Spark's dependencies and user dependencies. It is currently an experimental feature. This is used in cluster mode only.
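For example, both flags can be passed as --conf settings on spark-submit. This is only a sketch: the application class, jar names, and hdfs path below are placeholders, not values from the question.

```shell
# Prefer user-added jars over Spark's own copies when resolving classes,
# on both the driver and the executors (cluster deploy mode).
spark-submit \
  --class com.example.MyApp \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true \
  --jars hdfs:///libs/third-party-lib-2.0.jar \
  my-app-assembly.jar
```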

I wrote some of the code that controls this, and there were a few bugs in the early releases, but if you're using a recent Spark release it should work (although it is still an experimental feature).

