Spark Shell - __spark_libs__.zip does not exist

Problem Description


    I'm new to Spark and I'm busy setting up a Spark Cluster with HA enabled.

    When starting a spark shell for testing via: bash spark-shell --master yarn --deploy-mode client

    I receive the following error (see the full error below): file:/tmp/spark-126d2844-5b37-461b-98a4-3f3de5ece91b/__spark_libs__3045590511279655158.zip does not exist

    The application is marked as failed on the yarn web app and no containers are started.

    When starting a shell via: spark-shell --master local it opens without errors.

    I have noticed that files are only being written to the tmp folder on the node where the shell is created.

    Any help will be much appreciated. Let me know if more information is required.

    Environment Variables:

    HADOOP_CONF_DIR=/opt/hadoop-2.7.3/etc/hadoop/

    YARN_CONF_DIR=/opt/hadoop-2.7.3/etc/hadoop/

    SPARK_HOME=/opt/spark-2.0.2-bin-hadoop2.7/

    Full error message:

    16/11/30 21:08:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
    16/11/30 21:08:49 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME. 
    16/11/30 21:09:03 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_e14_1480532715390_0001_02_000003 on host: slave2. Exit status: -1000. Diagnostics: File file:/tmp/spark-126d2844-5b37-461b-98a4-3f3de5ece91b/__spark_libs__3045590511279655158.zip does not exist 
    java.io.FileNotFoundException: File file:/tmp/spark-126d2844-5b37-461b-98a4-3f3de5ece91b/__spark_libs__3045590511279655158.zip
    does not exist
            at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
            at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
            at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
            at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
            at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
            at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
            at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
            at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
            at java.security.AccessController.doPrivileged(Native Method)
            at javax.security.auth.Subject.doAs(Subject.java:422)
            at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
            at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
            at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
            at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
            at java.lang.Thread.run(Thread.java:745)
    
    16/11/30 22:29:28 ERROR cluster.YarnClientSchedulerBackend: Yarn application has already exited with state FINISHED! 16/11/30 22:29:28 ERROR spark.SparkContext: Error initializing SparkContext. java.lang.IllegalStateException: Spark context stopped while waiting for backend
            at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:584)
            at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:162)
            at org.apache.spark.SparkContext.<init>(SparkContext.scala:546)
            at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2258)
            at org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:831)
            at org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:823)
            at scala.Option.getOrElse(Option.scala:121)
            at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:823)
            at org.apache.spark.repl.Main$.createSparkSession(Main.scala:95)
            at $line3.$read$$iw$$iw.<init>(<console>:15)
            at $line3.$read$$iw.<init>(<console>:31)
            at $line3.$read.<init>(<console>:33)
            at $line3.$read$.<init>(<console>:37)
            at $line3.$read$.<clinit>(<console>)
            at $line3.$eval$.$print$lzycompute(<console>:7)
            at $line3.$eval$.$print(<console>:6)
            at $line3.$eval.$print(<console>)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:498)
            at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
            at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
            at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
            at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
            at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
            at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
            at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)
            at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:569)
            at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
            at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:807)
            at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:681)
            at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:395)
            at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply$mcV$sp(SparkILoop.scala:38)
            at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:37)
            at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:37)
            at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:214)
            at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:37)
            at org.apache.spark.repl.SparkILoop.loadFiles(SparkILoop.scala:94)
            at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:920)
            at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
            at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
            at scala.reflect.internal.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:97)
            at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:909)
            at org.apache.spark.repl.Main$.doMain(Main.scala:68)
            at org.apache.spark.repl.Main$.main(Main.scala:51)
            at org.apache.spark.repl.Main.main(Main.scala)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:498)
            at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
            at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
            at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
            at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
            at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    

    yarn-site.xml

    <configuration>
      <property>
        <name>yarn.resourcemanager.connect.retry-interval.ms</name>
        <value>2000</value>
      </property>
      <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
      </property>
      <property>
        <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
        <value>true</value>
      </property>
      <property>
        <name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
        <value>true</value>
      </property>
      <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>yarn-cluster</value>
      </property>
      <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
      </property>
      <property>
        <name>yarn.resourcemanager.ha.id</name>
        <value>rm1</value>
      </property>
      <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
      </property>
      <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
      </property>
      <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
      </property>
      <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>master:2181,slave1:2181,slave2:2181</value>
      </property>
      <property>
        <name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
        <value>5000</value>
      </property>
      <property>
        <name>yarn.resourcemanager.work-preserving-recovery.enabled</name>
        <value>true</value>
      </property>
    
      <property>
        <name>yarn.resourcemanager.address.rm1</name>
        <value>master:23140</value>
      </property>
      <property>
        <name>yarn.resourcemanager.scheduler.address.rm1</name>
        <value>master:23130</value>
      </property>
      <property>
        <name>yarn.resourcemanager.webapp.https.address.rm1</name>
        <value>master:23189</value>
      </property>
      <property>
        <name>yarn.resourcemanager.webapp.address.rm1</name>
        <value>master:23188</value>
      </property>
      <property>
        <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
        <value>master:23125</value>
      </property>
      <property>
        <name>yarn.resourcemanager.admin.address.rm1</name>
        <value>master:23141</value>
      </property>
    
      <property>
        <name>yarn.resourcemanager.address.rm2</name>
        <value>slave1:23140</value>
      </property>
      <property>
        <name>yarn.resourcemanager.scheduler.address.rm2</name>
        <value>slave1:23130</value>
      </property>
      <property>
        <name>yarn.resourcemanager.webapp.https.address.rm2</name>
        <value>slave1:23189</value>
      </property>
      <property>
        <name>yarn.resourcemanager.webapp.address.rm2</name>
        <value>slave1:23188</value>
      </property>
      <property>
        <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
        <value>slave1:23125</value>
      </property>
      <property>
        <name>yarn.resourcemanager.admin.address.rm2</name>
        <value>slave1:23141</value>
      </property>
    
      <property>
        <description>Address where the localizer IPC is.</description>
        <name>yarn.nodemanager.localizer.address</name>
        <value>0.0.0.0:23344</value>
      </property>
      <property>
        <description>NM Webapp address.</description>
        <name>yarn.nodemanager.webapp.address</name>
        <value>0.0.0.0:23999</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>/tmp/pseudo-dist/yarn/local</value>
      </property>
      <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value>/tmp/pseudo-dist/yarn/log</value>
      </property>
      <property>
        <name>mapreduce.shuffle.port</name>
        <value>23080</value>
      </property>
      <property>
        <name>yarn.resourcemanager.work-preserving-recovery.enabled</name>
        <value>true</value>
      </property>
    </configuration>
    

    Solution

    This error was due to the config in the core-site.xml file.

    Please note that to find this file your HADOOP_CONF_DIR env variable must be set.

    In my case I added HADOOP_CONF_DIR=/opt/hadoop-2.7.3/etc/hadoop/ to ./conf/spark-env.sh
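
    For reference, with the paths from the environment variables listed above, the relevant lines in ./conf/spark-env.sh would look something like this (a minimal sketch, not the complete file):

    # $SPARK_HOME/conf/spark-env.sh
    # Point Spark at the Hadoop/YARN client configuration so it can read core-site.xml
    export HADOOP_CONF_DIR=/opt/hadoop-2.7.3/etc/hadoop/
    export YARN_CONF_DIR=/opt/hadoop-2.7.3/etc/hadoop/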

    See: Spark Job running on Yarn Cluster java.io.FileNotFoundException: File does not exits , eventhough the file exits on the master node

    core-site.xml

    <configuration>
        <property>
            <name>fs.default.name</name>
            <value>hdfs://master:9000</value>
        </property> 
    </configuration>
    

    If this endpoint is unreachable, or if Spark detects that the target file system is the same as the local one, the lib files will not be distributed to the other nodes in your cluster, causing the errors above.

    In my situation the node I was on couldn't reach port 9000 on the specified host.
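
    A quick way to check that kind of connectivity from the node running the shell is something along these lines (a sketch; the host and port come from fs.default.name in core-site.xml above):

    # Is the NameNode endpoint from fs.default.name reachable from this node?
    nc -zv master 9000
    # Does the Hadoop client resolve the default filesystem and reach HDFS?
    hdfs dfs -ls /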

    Debugging

    Turn the log level up to info. You can do this by:

    1. Copy ./conf/log4j.properties.template to ./conf/log4j.properties

    2. In the file set log4j.logger.org.apache.spark.repl.Main = INFO

    3. Start your Spark Shell as normal. If your issue is the same as mine, you should see an info message such as: INFO Client: Source and destination file systems are the same. Not copying file:/tmp/spark-c1a6cdcd-d348-4253-8755-5086a8931e75/__spark_libs__1391186608525933727.zip

    This should lead you to the problem, as it is the start of the chain reaction caused by the missing files.
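
    For reference, the steps above come down to something like the following (a sketch, assuming the default layout under $SPARK_HOME; appending the logger line works because the last value of a duplicate key wins when the properties file is loaded):

    cd "$SPARK_HOME"
    # 1. Create log4j.properties from the shipped template
    cp conf/log4j.properties.template conf/log4j.properties
    # 2. Raise the REPL logger to INFO so the YARN client's upload messages are printed
    echo "log4j.logger.org.apache.spark.repl.Main=INFO" >> conf/log4j.properties
    # 3. Start the shell as before and watch for the "Source and destination
    #    file systems are the same" message
    ./bin/spark-shell --master yarn --deploy-mode client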
