spark-submit yarn-cluster with --jars does not work?


Problem description

I am trying to submit a Spark job to the CDH YARN cluster via the following commands:

I have tried several combinations, and none of them work. I now have all the POI jars located both in my local /root and in HDFS at /user/root/lib, so I have tried the following:

spark-submit --master yarn-cluster --class "ReadExcelSC" ./excel_sc.jar --jars /root/poi-3.12.jars, /root/poi-ooxml-3.12.jar, /root/poi-ooxml-schemas-3.12.jar

spark-submit --master yarn-cluster --class "ReadExcelSC" ./excel_sc.jar --jars file:/root/poi-3.12.jars, file:/root/poi-ooxml-3.12.jar, file:/root/poi-ooxml-schemas-3.12.jar

spark-submit --master yarn-cluster --class "ReadExcelSC" ./excel_sc.jar --jars hdfs://mynamenodeIP:8020/user/root/poi-3.12.jars,hdfs://mynamenodeIP:8020/user/root/poi-ooxml-3.12.jar,hdfs://mynamenodeIP:8020/user/root/poi-ooxml-schemas-3.12.jar

How do I propagate the jars to all cluster nodes? None of the above works, and the job still somehow does not get to reference the class, as I keep getting the same error:

java.lang.NoClassDefFoundError: org/apache/poi/ss/usermodel/WorkbookFactory

The same command works with "--master local" without specifying --jars, as I have copied my jars to /opt/cloudera/parcels/CDH/lib/spark/lib.

However, for yarn-cluster mode I need to distribute the external jars to all cluster nodes, but the commands above do not work.

Appreciate your help, thanks.

P.S. I am using CDH 5.4.2 with Spark 1.3.0.

Answer

As per the help in spark-submit:


  • --jars includes the local jars on the driver and executor classpaths. [it will just set the path]

  • --files will copy the jars needed for your application to the working directory of all the executor nodes [it will transport your jars to the working dir]

Note: This is similar to the -file option in Hadoop Streaming, which transports the mapper/reducer scripts to the slave nodes.

So try with the --files option as well.

$ spark-submit --help
Options:
  --jars JARS                 Comma-separated list of local jars to include on the driver
                              and executor classpaths.
  --files FILES               Comma-separated list of files to be placed in the working
                              directory of each executor.
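One more thing worth checking: spark-submit treats everything after the application jar as arguments to the application itself, so in the commands above the --jars option never reaches spark-submit at all. A minimal sketch combining both points, reusing the paths from the question (options before ./excel_sc.jar, and no spaces inside the comma-separated list):

```shell
# Sketch only, using the paths from the question.
# All spark-submit options must come before the application jar;
# anything after ./excel_sc.jar is passed to the application itself.
# The comma-separated jar list must not contain spaces.
spark-submit \
  --master yarn-cluster \
  --class "ReadExcelSC" \
  --jars /root/poi-3.12.jar,/root/poi-ooxml-3.12.jar,/root/poi-ooxml-schemas-3.12.jar \
  --files /root/poi-3.12.jar,/root/poi-ooxml-3.12.jar,/root/poi-ooxml-schemas-3.12.jar \
  ./excel_sc.jar
```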

Hope this helps.
