Spark submit failing in YARN cluster mode when specifying --files in an Azure HDInsight cluster


Question

Spark submit in YARN cluster mode is failing, but it succeeds in client mode.

Spark submit command:

spark-submit \
--master yarn --deploy-mode cluster \
--py-files packages.zip,deps2.zip \
--files /home/sshsanjeev/git/pyspark-example-demo/configs/etl_config.json \
jobs/etl_job.py

Error stack:

Traceback (most recent call last):
  File "etl_job.py", line 51, in <module>
    main()
  File "etl_job.py", line 11, in main
    app_name='my_etl_job',spark_config={'spark.sql.shuffle.partitions':2})
  File "/mnt/resource/hadoop/yarn/local/usercache/sshsanjeev/appcache/application_1555349704365_0218/container_1555349704365_0218_01_000001/packages.zip/dependencies/spark_conn.py", line 20, in start_spark
  File "/usr/hdp/current/spark2-client/python/pyspark/context.py", line 891, in addFile
    self._jsc.sc().addFile(path, recursive)
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/usr/hdp/current/spark2-client/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o204.addFile.
: java.io.FileNotFoundException: File file:/mnt/resource/hadoop/yarn/local/usercache/sshsanjeev/appcache/application_1555349704365_0218/container_1555349704365_0218_01_000001/configs/etl_config.json does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:624)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:850)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:614)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1529)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
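For context on the FileNotFoundException above: in YARN cluster mode, files passed via --files are localized into the application container's working directory under their base name only, so a driver that looks the config up by its original relative path (configs/etl_config.json) will not find it there. A minimal, hedged sketch of a fallback lookup a driver could use (the function name resolve_config is hypothetical, not from the original job):

```python
import json
import os


def resolve_config(path):
    """Return a readable path for `path`, falling back to its base name
    in the current working directory, which is where YARN localizes
    files shipped with --files in cluster mode."""
    if os.path.isfile(path):
        # Client mode: the original path is visible on the local machine.
        return path
    basename = os.path.basename(path)
    if os.path.isfile(basename):
        # Cluster mode: --files places the file flat in the container's
        # working directory under its base name.
        return basename
    raise FileNotFoundError(path)


# Hypothetical usage inside the driver:
# with open(resolve_config("configs/etl_config.json")) as f:
#     config = json.load(f)
```

This is a sketch under the assumption that the driver reads the config directly; SparkFiles.get() is the PySpark API for the same resolution on executors.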

I did several online searches and followed this article https://community.cloudera.com/t5/Support-Questions/Spark-job-fails-in-cluster-mode/td-p/58772, but the issue is still not resolved.

Please note that I have tried two approaches, placing the config file both on the local path of the namenode and in an HDFS directory, but I still get the same error. In client mode this runs successfully. Need guidance.

Here is the stack version of my HDP cluster:

HDP-2.6.5.3008, YARN 2.7.3, Spark2 2.3.2

Let me know if further info is required. Any suggestions would be highly appreciated.

Answer

This could be a permissions issue that prevents the directory from being created. If the directory is not created, there is no placeholder for the intermediate results, so the job fails. The directory /mnt/resource/hadoop/yarn/local/usercache/&lt;username&gt;/appcache/&lt;applicationID&gt; is used to store intermediate results, which then go to HDFS or memory depending on whether they are written to a path or stored in temp tables, respectively; the directory is flushed once the job finishes. The user might not have permission on it. Granting the user correct permissions on /mnt/resource/hadoop/yarn/local/usercache on the specific worker node should resolve the issue.
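The permission check described above could be sketched as follows. This is a hedged illustration: the usercache path is taken from the traceback, but the group name (hadoop) and mode (755) are assumptions that should be matched to the cluster's actual configuration.

```shell
# Path from the traceback; the submitting user was sshsanjeev.
USERCACHE=/mnt/resource/hadoop/yarn/local/usercache/sshsanjeev

# On a real worker node one would inspect and, as root, repair ownership
# (commented out here because the path only exists on the cluster):
# ls -ld "$USERCACHE"
# chown -R sshsanjeev:hadoop "$USERCACHE"   # group name is an assumption
# chmod -R 755 "$USERCACHE"

# Self-contained demonstration of the same check on a scratch directory:
demo="$(mktemp -d)/usercache"
mkdir -p "$demo"
chmod 755 "$demo"
test -w "$demo" && echo "usercache is writable"
```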

