spark2 + yarn - NullPointerException while preparing AM container


Question

I'm trying to run:

pyspark --master yarn

  • Spark version: 2.0.0
  • Hadoop version: 2.7.2
  • The Hadoop YARN web interface starts successfully

When I do, this happens:

      16/08/15 10:00:12 DEBUG Client: Using the default MR application classpath: $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*
      16/08/15 10:00:12 INFO Client: Preparing resources for our AM container
      16/08/15 10:00:12 DEBUG Client: 
      16/08/15 10:00:12 DEBUG DFSClient: /user/mispp/.sparkStaging/application_1471254869164_0006: masked=rwxr-xr-x
      16/08/15 10:00:12 DEBUG Client: IPC Client (1933573135) connection to sm/192.168.29.71:8020 from mispp sending #8
      16/08/15 10:00:12 DEBUG Client: IPC Client (1933573135) connection to sm/192.168.29.71:8020 from mispp got value #8
      16/08/15 10:00:12 DEBUG ProtobufRpcEngine: Call: mkdirs took 14ms
      16/08/15 10:00:12 DEBUG Client: IPC Client (1933573135) connection to sm/192.168.29.71:8020 from mispp sending #9
      16/08/15 10:00:12 DEBUG Client: IPC Client (1933573135) connection to sm/192.168.29.71:8020 from mispp got value #9
      16/08/15 10:00:12 DEBUG ProtobufRpcEngine: Call: setPermission took 10ms
      16/08/15 10:00:12 DEBUG Client: IPC Client (1933573135) connection to sm/192.168.29.71:8020 from mispp sending #10
      16/08/15 10:00:12 DEBUG Client: IPC Client (1933573135) connection to sm/192.168.29.71:8020 from mispp got value #10
      16/08/15 10:00:12 DEBUG ProtobufRpcEngine: Call: getFileInfo took 2ms
      16/08/15 10:00:12 INFO Client: Deleting staging directory hdfs://sm/user/mispp/.sparkStaging/application_1471254869164_0006
      16/08/15 10:00:12 DEBUG Client: IPC Client (1933573135) connection to sm/192.168.29.71:8020 from mispp sending #11
      16/08/15 10:00:12 DEBUG Client: IPC Client (1933573135) connection to sm/192.168.29.71:8020 from mispp got value #11
      16/08/15 10:00:12 DEBUG ProtobufRpcEngine: Call: delete took 14ms
      16/08/15 10:00:12 ERROR SparkContext: Error initializing SparkContext.
      java.lang.NullPointerException
              at scala.collection.mutable.ArrayOps$ofRef$.newBuilder$extension(ArrayOps.scala:190)
              at scala.collection.mutable.ArrayOps$ofRef.newBuilder(ArrayOps.scala:186)
              at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:246)
              at scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
              at scala.collection.mutable.ArrayOps$ofRef.filter(ArrayOps.scala:186)
              at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$6.apply(Client.scala:484)
              at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$6.apply(Client.scala:480)
              at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74)
              at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:480)
              at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:834)
              at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:167)
              at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
              at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149)
              at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
              at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
              at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
              at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
              at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
              at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
              at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:240)
              at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
              at py4j.Gateway.invoke(Gateway.java:236)
              at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
              at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
              at py4j.GatewayConnection.run(GatewayConnection.java:211)
              at java.lang.Thread.run(Thread.java:745)
      16/08/15 10:00:12 DEBUG AbstractLifeCycle: stopping org.spark_project.jetty.server.Server@69e507eb
      16/08/15 10:00:12 DEBUG Server: Graceful shutdown org.spark_project.jetty.server.Server@69e507eb by 
      

      yarn-site.xml (the last property is something I found online, so I just tried whether it would work):

      <configuration>
      
      <!-- Site specific YARN configuration properties -->
          <property>
              <name>yarn.nodemanager.aux-services</name>
              <value>mapreduce_shuffle</value>
          </property>
          <property>
              <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
              <value>org.apache.hadoop.mapred.ShuffleHandler</value>
          </property>
          <property>
              <name>yarn.resourcemanager.resource-tracker.address</name>
              <value>sm:8025</value>
          </property>
          <property>
              <name>yarn.resourcemanager.scheduler.address</name>
              <value>sm:8030</value>
          </property>
          <property>
              <name>yarn.resourcemanager.address</name>
              <value>sm:8050</value>
          </property>
          <property>
              <name>yarn.application.classpath</name>
              <value>/home/mispp/hadoop-2.7.2/share/hadoop/yarn</value>
          </property>
      </configuration>
      

      .bashrc:

      export HADOOP_PREFIX=/home/mispp/hadoop-2.7.2
      export PATH=$PATH:$HADOOP_PREFIX/bin
      export HADOOP_HOME=$HADOOP_PREFIX
      export HADOOP_COMMON_HOME=$HADOOP_PREFIX
      export HADOOP_YARN_HOME=$HADOOP_PREFIX
      export HADOOP_HDFS_HOME=$HADOOP_PREFIX
      export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
      export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
      export YARN_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
      

      Any idea why this happens? It's set up in 3 LXD containers (master + two computes), on a server with 16 GB of RAM.

      Answer

      Given the location of the error in the Spark 2.0.0 code:

      https://github.com/apache/spark/blob/v2.0.0/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L480

      I suspect that the error is happening because of a misconfiguration of spark.yarn.jars. I would double-check that the value of this setting is correct for your setup, per the documentation at http://spark.apache.org/docs/2.0.0/running-on-yarn.html#spark-properties.
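      For reference, a common way to configure this property is to upload Spark's jars to HDFS once and point spark.yarn.jars at them, so YARN doesn't have to re-upload them on every submit. The sketch below is an assumption for this cluster (the namenode appears to be hdfs://sm, and the staging user is mispp); the HDFS directory name is hypothetical, and the jars would first need to be copied up with something like `hdfs dfs -mkdir -p /user/mispp/spark-jars && hdfs dfs -put $SPARK_HOME/jars/* /user/mispp/spark-jars/`:

      ```
      # $SPARK_HOME/conf/spark-defaults.conf
      # Assumes $SPARK_HOME/jars/* has already been uploaded to
      # hdfs://sm/user/mispp/spark-jars/ (directory name is hypothetical).
      spark.yarn.jars  hdfs://sm/user/mispp/spark-jars/*.jar
      ```

      If spark.yarn.jars is left unset or points at a bad location, Client.prepareLocalResources has to fall back to resolving the jar list itself, which is the code path where the NPE is being raised here.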
