Passing additional jars to Spark via spark-submit


Question

I'm using Spark with MongoDB, and consequently rely on the mongo-hadoop drivers. I got things working thanks to input on my original question here.

My Spark job is running; however, I receive warnings that I don't understand. When I run this command

$SPARK_HOME/bin/spark-submit \
  --driver-class-path /usr/local/share/mongo-hadoop/build/libs/mongo-hadoop-1.5.0-SNAPSHOT.jar:/usr/local/share/mongo-hadoop/spark/build/libs/mongo-hadoop-spark-1.5.0-SNAPSHOT.jar \
  --jars /usr/local/share/mongo-hadoop/build/libs/mongo-hadoop-1.5.0-SNAPSHOT.jar:/usr/local/share/mongo-hadoop/spark/build/libs/mongo-hadoop-spark-1.5.0-SNAPSHOT.jar \
  my_application.py

it works, but gives me the following warning message

Warning: Local jar /usr/local/share/mongo-hadoop/build/libs/mongo-hadoop-1.5.0-SNAPSHOT.jar:/usr/local/share/mongo-hadoop/spark/build/libs/mongo-hadoop-spark-1.5.0-SNAPSHOT.jar does not exist, skipping.

When I was trying to get this working, the job wouldn't run at all if I left out those paths when submitting it. Now, however, it does run even if I leave them out:

$SPARK_HOME/bin/spark-submit  my_application.py

Can someone please explain what is going on here? I have looked through similar questions here referencing the same warning, and searched through the documentation.

By setting the options once, are they stored as environment variables or something? I'm glad it works, but wary that I don't fully understand why it works sometimes and not others.

Answer

The problem is that the classpath (`--driver-class-path`) should be colon-separated, while the jar list (`--jars`) should be comma-separated:

$SPARK_HOME/bin/spark-submit \
  --driver-class-path /usr/local/share/mongo-hadoop/build/libs/mongo-hadoop-1.5.0-SNAPSHOT.jar:/usr/local/share/mongo-hadoop/spark/build/libs/mongo-hadoop-spark-1.5.0-SNAPSHOT.jar \
  --jars /usr/local/share/mongo-hadoop/build/libs/mongo-hadoop-1.5.0-SNAPSHOT.jar,/usr/local/share/mongo-hadoop/spark/build/libs/mongo-hadoop-spark-1.5.0-SNAPSHOT.jar \
  my_application.py
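This also explains the "does not exist, skipping" warning: since `--jars` is split on commas, the entire colon-joined string is treated as a single path, and no file by that combined name exists. A minimal shell sketch (the `/tmp/*.jar` paths are made up for illustration) of how comma splitting sees the two forms:

```shell
#!/bin/sh
# Illustrative sketch: --jars values are split on commas, so a colon-joined
# string becomes ONE (nonexistent) path instead of two jar entries.
split_on_commas() {
  echo "$1" | tr ',' '\n'
}

# Colon-separated: produces a single bogus entry, hence the warning
split_on_commas "/tmp/a.jar:/tmp/b.jar"
# Comma-separated: produces two entries, as intended
split_on_commas "/tmp/a.jar,/tmp/b.jar"
```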
