spark-submit yarn-cluster with --jars does not work?
Problem description
I am trying to submit a Spark job to the CDH YARN cluster with the following commands.
I have tried several combinations, and none of them works. I now have all the POI jars both in my local /root directory and in HDFS under /user/root/lib, so I have tried the following:
spark-submit --master yarn-cluster --class "ReadExcelSC" ./excel_sc.jar --jars /root/poi-3.12.jars, /root/poi-ooxml-3.12.jar, /root/poi-ooxml-schemas-3.12.jar
spark-submit --master yarn-cluster --class "ReadExcelSC" ./excel_sc.jar --jars file:/root/poi-3.12.jars, file:/root/poi-ooxml-3.12.jar, file:/root/poi-ooxml-schemas-3.12.jar
spark-submit --master yarn-cluster --class "ReadExcelSC" ./excel_sc.jar --jars hdfs://mynamenodeIP:8020/user/root/poi-3.12.jars,hdfs://mynamenodeIP:8020/user/root/poi-ooxml-3.12.jar,hdfs://mynamenodeIP:8020/user/root/poi-ooxml-schemas-3.12.jar
How do I propagate the jars to all cluster nodes? None of the above works, and the job still fails to resolve the class; I keep getting the same error:
java.lang.NoClassDefFoundError: org/apache/poi/ss/usermodel/WorkbookFactory
The same command works with "--master local", without specifying --jars, as I have copied my jars to /opt/cloudera/parcels/CDH/lib/spark/lib.
However, for yarn-cluster mode I need to distribute the external jars to all cluster nodes, and the commands above do not work.
Appreciate your help, thanks.
P.S. I am using CDH 5.4.2 with Spark 1.3.0.
Recommended answer
As per the spark-submit help output:
--jars includes the local jars on the driver and executor classpaths. [it will just set the path]
--files will copy the files your application needs to run into the working directory of every executor node. [it will transport your jars to the working dir]
Note: this is similar to the -file option in Hadoop streaming, which transports the mapper/reducer scripts to the slave nodes.
So try the --files option as well.
$ spark-submit --help
Options:
--jars JARS Comma-separated list of local jars to include on the driver
and executor classpaths.
--files FILES Comma-separated list of files to be placed in the working
directory of each executor.
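One more thing worth checking: in every command quoted in the question, --jars appears after the application jar (./excel_sc.jar). spark-submit stops parsing its own options at the first positional argument, so everything after the application jar is handed to ReadExcelSC as program arguments and never reaches spark-submit at all. A corrected form of the first attempt might look like this (assuming the files are really named poi-*.jar; the spaces after the commas also have to go):

```shell
# All spark-submit options must precede the application jar;
# the application jar and its own arguments come last.
spark-submit \
  --master yarn-cluster \
  --class "ReadExcelSC" \
  --jars /root/poi-3.12.jar,/root/poi-ooxml-3.12.jar,/root/poi-ooxml-schemas-3.12.jar \
  ./excel_sc.jar
```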
Hope this helps.