Spark packages flag vs jars dir?

Question
In Spark, what's the difference between adding JARs to the classpath via the --packages argument and just adding the JARs directly to the $SPARK_HOME/jars directory?
Answer
TL;DR: --jars is used for local or remote jar files specified by URL and does not resolve dependencies; --packages is used for Maven coordinates and does resolve transitive dependencies. From the docs:
--jars
When using spark-submit, the application jar along with any jars included with the --jars option will be automatically transferred to the cluster. URLs supplied after --jars must be separated by commas. That list is included in the driver and executor classpaths. Directory expansion does not work with --jars.
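A minimal invocation sketch of the --jars behavior described above. The application jar, library paths, and main class here are hypothetical placeholders; note the jar URLs are comma-separated and each listed jar is shipped to the driver and executors as-is, with no dependency resolution.

```shell
# Ship two local helper jars to the cluster alongside the application jar.
# app.jar, the /opt/libs paths, and com.example.Main are example names.
spark-submit \
  --class com.example.Main \
  --master local[2] \
  --jars /opt/libs/mylib.jar,/opt/libs/other.jar \
  app.jar
```

Because directory expansion does not work with --jars, each jar must be listed individually; passing a directory like `/opt/libs/` would not pick up its contents.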
--packages
Users may also include any other dependencies by supplying a comma-delimited list of Maven coordinates with --packages. All transitive dependencies will be handled when using this command. Additional repositories (or resolvers in SBT) can be added in a comma-delimited fashion with the flag --repositories. (Note that credentials for password-protected repositories can be supplied in some cases in the repository URI, such as in https://user:password@host/.... Be careful when supplying credentials this way.) These commands can be used with pyspark, spark-shell, and spark-submit to include Spark Packages.
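A sketch of the --packages path, by contrast. The Kafka connector coordinate below is an example; substitute a version matching your Spark and Scala build. Spark resolves the coordinate and all of its transitive dependencies from Maven Central (or any extra --repositories you list), rather than shipping a single jar verbatim.

```shell
# Resolve a Maven coordinate plus its transitive dependencies at submit time.
# The coordinate/version and app.jar are example values.
spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.0 \
  --repositories https://repo1.maven.org/maven2 \
  app.jar
```

The --repositories flag is shown only for illustration here (Maven Central is already the default resolver); it matters when your artifacts live in an internal repository.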