Spark spark-submit --jars arguments wants comma list, how to declare a directory of jars?


Problem description

In Submitting Applications in the Spark docs (http://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management), as of 1.6.0 and earlier, it's not clear how to specify the --jars argument, since it's apparently neither a colon-separated classpath nor a directory expansion.

The docs say "Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside of your cluster, for instance, an hdfs:// path or a file:// path that is present on all nodes."

Question: What are all the options for submitting a classpath with --jars in the spark-submit script in $SPARK_HOME/bin? Anything undocumented that could be submitted as an improvement for docs?

I ask because when I was testing --jars today, we had to explicitly provide a path to each jar:

/usr/local/spark/bin/spark-submit --class jpsgcs.thold.PipeLinkageData --jars local:/usr/local/spark/jars/groovy-all-2.3.3.jar,local:/usr/local/spark/jars/guava-14.0.1.jar,local:/usr/local/spark/jars/jopt-simple-4.6.jar,local:/usr/local/spark/jars/jpsgcs-core-1.0.8-2.jar,local:/usr/local/spark/jars/jpsgcs-pipe-1.0.6-7.jar /usr/local/spark/jars/thold-0.0.1-1.jar

We are choosing to pre-populate the cluster with all the jars in /usr/local/spark/jars on each worker. It seemed that if no local:/, file:/, or hdfs: prefix was supplied, the default is file:/ and the driver makes the jars available on a webserver it runs. I chose local:, as above.
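For concreteness, here is a minimal sketch of the prefixes just discussed, reusing the jars from the command above (the comments restate my understanding of the behavior, not something independently verified):

# local: - the jar must already sit at this path on every worker; nothing is shipped
/usr/local/spark/bin/spark-submit --class jpsgcs.thold.PipeLinkageData \
  --jars local:/usr/local/spark/jars/guava-14.0.1.jar \
  /usr/local/spark/jars/thold-0.0.1-1.jar

# file: (also the default when no prefix is given) - the driver's file server
# ships the jar out to the workers
/usr/local/spark/bin/spark-submit --class jpsgcs.thold.PipeLinkageData \
  --jars file:/usr/local/spark/jars/guava-14.0.1.jar \
  /usr/local/spark/jars/thold-0.0.1-1.jar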

And it seems that we do not need to put the main jar in the --jars argument. I have not yet tested whether other classes in the final argument (the application-jar arg per the docs, i.e. /usr/local/spark/jars/thold-0.0.1-1.jar) are shipped to workers, or whether I need to put the application jar in the --jars path to get classes other than the one named by --class to be seen.
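If that turns out to be necessary, the untested variant described above would look like this, with the application jar listed in --jars as well as in the final argument (hypothetical; nothing here confirms it is needed):

/usr/local/spark/bin/spark-submit --class jpsgcs.thold.PipeLinkageData \
  --jars local:/usr/local/spark/jars/jpsgcs-core-1.0.8-2.jar,local:/usr/local/spark/jars/thold-0.0.1-1.jar \
  /usr/local/spark/jars/thold-0.0.1-1.jar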

(And granted, with Spark standalone mode using --deploy-mode client, you also have to put a copy of the driver on each worker, but you don't know up front which worker will run the driver.)

Recommended answer

This way it worked easily, instead of specifying each jar with its version separately:

#!/bin/sh

# Build all other dependent jars into OTHER_JARS as a comma-separated
# list, skipping the application jar, which is added separately.
JARS=`find ../lib -name '*.jar'`

OTHER_JARS=""
for eachjarinlib in $JARS ; do
  if [ "$eachjarinlib" != "APPLICATIONJARTOBEADDEDSEPERATELY.JAR" ]; then
    # Only prepend a comma once the list is non-empty, so the final
    # value has no stray trailing comma or empty entry.
    if [ -z "$OTHER_JARS" ]; then
      OTHER_JARS=$eachjarinlib
    else
      OTHER_JARS=$eachjarinlib,$OTHER_JARS
    fi
  fi
done

echo "Final list of jars: $OTHER_JARS"
# Show any CLASSPATH inherited from the environment (not set by this script).
echo $CLASSPATH

spark-submit --verbose --class <yourclass> \
  ... OTHER OPTIONS \
  --jars $OTHER_JARS,APPLICATIONJARTOBEADDEDSEPERATELY.JAR
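A shorter alternative sketch that answers the title question directly: expand the directory glob into a comma list inline. This assumes a POSIX shell, that ../lib holds only the dependency jars, and that no path contains spaces or commas:

# Turn a directory of jars into the comma-separated list --jars expects.
spark-submit --verbose --class <yourclass> \
  --jars $(echo ../lib/*.jar | tr ' ' ',') \
  APPLICATIONJARTOBEADDEDSEPERATELY.JAR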
