Can't instantiate SparkSession on EMR 5.0 HUE


Problem description

I'm running an EMR 5.0 cluster and I'm using HUE to create an OOZIE workflow to submit a Spark 2.0 job. I have run the job with spark-submit directly on YARN, and as a step on the same cluster, with no problem. But when I do it through HUE, I get the following error:

java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.internal.SessionState':
    at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:949)
    at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:111)
    at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:110)
    at org.apache.spark.sql.SparkSession.conf$lzycompute(SparkSession.scala:133)
    at org.apache.spark.sql.SparkSession.conf(SparkSession.scala:133)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:838)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:838)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:838)
    at be.infofarm.App$.main(App.scala:22)
    at be.infofarm.App.main(App.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:627)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:946)
    ... 19 more
Caused by: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.internal.SharedState':
    at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:949)
    at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:100)
    at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:100)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:99)
    at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:98)
    at org.apache.spark.sql.internal.SessionState.<init>(SessionState.scala:153)
    ... 24 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:946)
    ... 30 more
Caused by: java.lang.Exception: Could not find resource path for Web UI: org/apache/spark/sql/execution/ui/static
    at org.apache.spark.ui.JettyUtils$.createStaticHandler(JettyUtils.scala:182)
    at org.apache.spark.ui.WebUI.addStaticHandler(WebUI.scala:119)
    at org.apache.spark.sql.execution.ui.SQLTab.<init>(SQLTab.scala:32)
    at org.apache.spark.sql.internal.SharedState$$anonfun$createListenerAndUI$1.apply(SharedState.scala:96)
    at org.apache.spark.sql.internal.SharedState$$anonfun$createListenerAndUI$1.apply(SharedState.scala:96)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.sql.internal.SharedState.createListenerAndUI(SharedState.scala:96)
    at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:44)
    ... 35 more

When I don't use spark.sql or SparkSession in my Spark job (I used SparkContext instead), it runs fine. If anyone has any clue what is going on, I would be very grateful.
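
For context: the question doesn't include App.scala, but judging from the stack trace (SparkSession$Builder.getOrCreate called from be.infofarm.App$.main at App.scala:22), the failing entry point presumably looks roughly like this minimal sketch; the app name and the query are assumptions, not taken from the question:

package be.infofarm

import org.apache.spark.sql.SparkSession

object App {
  def main(args: Array[String]): Unit = {
    // getOrCreate is where the stack trace blows up: instantiating
    // SessionState -> SharedState fails because the SQL web-UI static
    // resources can't be found on the classpath when run under Oozie.
    val spark = SparkSession.builder()
      .appName("emr-oozie-spark2-job") // assumed name
      .getOrCreate()

    // Any spark.sql / DataFrame work would go here. An equivalent job
    // using only SparkContext (no SparkSession) runs fine, per the question.
    spark.sql("SELECT 1").show()

    spark.stop()
  }
}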

Edit 1

My Maven assembly configuration:

<build>
  <sourceDirectory>src/main/scala</sourceDirectory>
  <testSourceDirectory>src/test/scala</testSourceDirectory>
  <plugins>
    <plugin>
      <groupId>net.alchim31.maven</groupId>
      <artifactId>scala-maven-plugin</artifactId>
      <version>3.1.3</version>
      <executions>
        <execution>
          <goals>
            <goal>compile</goal>
            <goal>testCompile</goal>
          </goals>
          <configuration>
            <args>
              <arg>-dependencyfile</arg>
              <arg>${project.build.directory}/.scala_dependencies</arg>
            </args>
          </configuration>
        </execution>
      </executions>
    </plugin>

    <plugin>
      <artifactId>maven-assembly-plugin</artifactId>
      <configuration>
        <archive>
          <manifest>
            <mainClass>be.infofarm.App</mainClass>
          </manifest>
        </archive>
        <descriptorRefs>
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
      </configuration>
      <executions>
        <execution>
          <id>make-assembly</id> <!-- this is used for inheritance merges -->
          <phase>package</phase> <!-- bind to the packaging phase -->
          <goals>
            <goal>single</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
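
For reference: with the jar-with-dependencies descriptor bound to the package phase, mvn package produces a *-jar-with-dependencies.jar under target/ alongside the plain jar; presumably that fat jar is the one the Oozie workflow submits.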

Recommended answer

When you run a jar with spark-submit, all dependent jars are available on the machine's classpath, but when you execute the same jar through Oozie, those jars are not available in Oozie's sharelib. You can check what the sharelib contains by executing the following command:

oozie admin -shareliblist spark

Step 1. Upload the required jars from the local machine to HDFS:

hdfs dfs -put /usr/lib/spark/jars/*.jar /user/oozie/share/lib/lib_timestamp/spark/ 
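Note that lib_timestamp above is a placeholder from the original answer: Oozie keeps its sharelib in timestamped directories named lib_<timestamp>, so list /user/oozie/share/lib/ on HDFS to find the actual directory to upload into.
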

Step 2. Just uploading the jars to HDFS won't add them to the sharelib; you also need to refresh the sharelib by executing:

oozie admin -sharelibupdate

Hope this helps.
