Passing additional jars to Spark via spark-submit
Problem Description
I'm using Spark with MongoDB, and consequently rely on the mongo-hadoop
drivers. I got things working thanks to input on my original question here.
My Spark job is running; however, I receive warnings that I don't understand. When I run this command
$SPARK_HOME/bin/spark-submit --driver-class-path /usr/local/share/mongo-hadoop/build/libs/mongo-hadoop-1.5.0-SNAPSHOT.jar:/usr/local/share/mongo-hadoop/spark/build/libs/mongo-hadoop-spark-1.5.0-SNAPSHOT.jar --jars /usr/local/share/mongo-hadoop/build/libs/mongo-hadoop-1.5.0-SNAPSHOT.jar:/usr/local/share/mongo-hadoop/spark/build/libs/mongo-hadoop-spark-1.5.0-SNAPSHOT.jar my_application.py
it works, but gives me the following warning message
Warning: Local jar /usr/local/share/mongo-hadoop/build/libs/mongo-hadoop-1.5.0-SNAPSHOT.jar:/usr/local/share/mongo-hadoop/spark/build/libs/mongo-hadoop-spark-1.5.0-SNAPSHOT.jar does not exist, skipping.
When I was trying to get this working, the job wouldn't run at all if I left out those paths when submitting it. Now, however, it does run if I leave them out:
$SPARK_HOME/bin/spark-submit my_application.py
Can someone please explain what is going on here? I have looked through similar questions here referencing the same warning, and searched through the documentation.
By setting the options once, are they stored as environment variables or something? I'm glad it works, but wary that I don't fully understand why it works sometimes and not others.
Recommended Answer
The problem is that CLASSPATH should be colon-separated, while JARS should be comma-separated:
$SPARK_HOME/bin/spark-submit \
  --driver-class-path /usr/local/share/mongo-hadoop/build/libs/mongo-hadoop-1.5.0-SNAPSHOT.jar:/usr/local/share/mongo-hadoop/spark/build/libs/mongo-hadoop-spark-1.5.0-SNAPSHOT.jar \
  --jars /usr/local/share/mongo-hadoop/build/libs/mongo-hadoop-1.5.0-SNAPSHOT.jar,/usr/local/share/mongo-hadoop/spark/build/libs/mongo-hadoop-spark-1.5.0-SNAPSHOT.jar \
  my_application.py
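A minimal sketch of why the colon-joined `--jars` value triggered the "does not exist" warning: spark-submit splits the `--jars` argument on commas, so a colon-joined string is treated as a single jar path, and no file with that whole name exists on disk. The shortened paths below are hypothetical stand-ins for the full mongo-hadoop jar paths.

```python
# Hypothetical shortened paths standing in for the mongo-hadoop jars.
colon_joined = "/usr/local/share/a.jar:/usr/local/share/b.jar"
comma_joined = "/usr/local/share/a.jar,/usr/local/share/b.jar"

# Splitting on commas, the colon-joined value yields ONE entry: the entire
# string, which is not a real file -> "does not exist, skipping".
print(colon_joined.split(","))  # ['/usr/local/share/a.jar:/usr/local/share/b.jar']

# The comma-joined value yields one entry per jar, each checked on its own.
print(comma_joined.split(","))  # ['/usr/local/share/a.jar', '/usr/local/share/b.jar']
```

This also explains why the earlier command still ran: the jars were found via `--driver-class-path` (where the colon separator is correct), while the malformed `--jars` entry was merely skipped with a warning.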