Setting Spark as default execution engine for Hive

Problem description

Hadoop 2.7.3, Spark 2.1.0 and Hive 2.1.1.

I am trying to set Spark as the default execution engine for Hive. I uploaded all the jars in $SPARK_HOME/jars to an HDFS folder and copied the scala-library, spark-core, and spark-network-common jars to $HIVE_HOME/lib (a sketch of the corresponding shell commands follows the configuration below). Then I configured hive-site.xml with the following properties:

  <property>
    <name>hive.execution.engine</name>
    <value>spark</value>
  </property>
  <property>
    <name>spark.master</name>
    <value>spark://master:7077</value>
    <description>Spark Master URL</description>
  </property>
  <property>
    <name>spark.eventLog.enabled</name>
    <value>true</value>
    <description>Spark Event Log</description>
  </property>
  <property>
    <name>spark.eventLog.dir</name>
    <value>hdfs://master:8020/user/spark/eventLogging</value>
    <description>Spark event log folder</description>
  </property>
  <property>
    <name>spark.executor.memory</name>
    <value>512m</value>
    <description>Spark executor memory</description>
  </property>
  <property>
    <name>spark.serializer</name>
    <value>org.apache.spark.serializer.KryoSerializer</value>
    <description>Spark serializer</description>
  </property>
  <property>
    <name>spark.yarn.jars</name>
    <value>hdfs://master:8020/user/spark/spark-jars/*</value>
  </property>
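
For reference, the jar upload and copy steps described above correspond roughly to the shell commands below. This is a minimal sketch: the HDFS destination is taken from the spark.yarn.jars value above and the jar file names from the add jar commands later in this question, so adjust both to your own layout.

# Publish all Spark jars to HDFS so the Spark job launched by Hive can resolve them
hdfs dfs -mkdir -p hdfs://master:8020/user/spark/spark-jars
hdfs dfs -put $SPARK_HOME/jars/* hdfs://master:8020/user/spark/spark-jars/

# Copy the Spark client classes into Hive's classpath (versions as used in this question)
cp $SPARK_HOME/jars/scala-library-2.11.8.jar $HIVE_HOME/lib/
cp $SPARK_HOME/jars/spark-core_2.11-2.1.0.jar $HIVE_HOME/lib/
cp $SPARK_HOME/jars/spark-network-common_2.11-2.1.0.jar $HIVE_HOME/lib/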

In the Hive shell, I did the following:

hive> add jar ${env:HIVE_HOME}/lib/scala-library-2.11.8.jar;
Added [/usr/local/hive/hive-2.1.1/lib/scala-library-2.11.8.jar] to class path
Added resources: [/usr/local/hive/hive-2.1.1/lib/scala-library-2.11.8.jar]
hive> add jar ${env:HIVE_HOME}/lib/spark-core_2.11-2.1.0.jar;
Added [/usr/local/hive/hive-2.1.1/lib/spark-core_2.11-2.1.0.jar] to class path
Added resources: [/usr/local/hive/hive-2.1.1/lib/spark-core_2.11-2.1.0.jar]
hive> add jar ${env:HIVE_HOME}/lib/spark-network-common_2.11-2.1.0.jar;
Added [/usr/local/hive/hive-2.1.1/lib/spark-network-common_2.11-2.1.0.jar] to class path
Added resources: [/usr/local/hive/hive-2.1.1/lib/spark-network-common_2.11-2.1.0.jar]
hive> set hive.execution.engine=spark;

When I try to execute

hive> select count(*) from tableName;

I get the following:

Query ID = hduser_20170130230014_6e23dacc-78e8-4bd6-9fad-1344f6d0569e
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client.)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask

The Hive log shows java.lang.NoClassDefFoundError: org/apache/spark/JavaSparkListener:

ERROR [main] client.SparkClientImpl: Error while waiting for client to connect.
java.util.concurrent.ExecutionException: java.lang.RuntimeException: Cancel client 'cc10915b-da97-4fd7-9960-49c03ea380d7'. Error: Child process exited before connecting back with error log Warning: Ignoring non-spark config property: hive.spark.client.server.connect.timeout=90000
Warning: Ignoring non-spark config property: hive.spark.client.rpc.threads=8
Warning: Ignoring non-spark config property: hive.spark.client.connect.timeout=1000
Warning: Ignoring non-spark config property: hive.spark.client.secret.bits=256
Warning: Ignoring non-spark config property: hive.spark.client.rpc.max.size=52428800
java.lang.NoClassDefFoundError: org/apache/spark/JavaSparkListener
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:229)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:695)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.JavaSparkListener
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 19 more

    at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37)
    at org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:106)
    at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:80)
    at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.createRemoteClient(RemoteHiveSparkClient.java:99)
    at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:95)
    at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:69)
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:62)
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:114)
    at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:136)
    at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:89)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2073)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1744)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1453)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1171)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1161)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.RuntimeException: Cancel client 'cc10915b-da97-4fd7-9960-49c03ea380d7'. Error: Child process exited before connecting back with error log Warning: Ignoring non-spark config property: hive.spark.client.server.connect.timeout=90000
Warning: Ignoring non-spark config property: hive.spark.client.rpc.threads=8
Warning: Ignoring non-spark config property: hive.spark.client.connect.timeout=1000
Warning: Ignoring non-spark config property: hive.spark.client.secret.bits=256
Warning: Ignoring non-spark config property: hive.spark.client.rpc.max.size=52428800
java.lang.NoClassDefFoundError: org/apache/spark/JavaSparkListener
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:229)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:695)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.JavaSparkListener
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 19 more

Please help me integrate Hive 2.1.1 with Spark 2.1.0.

Recommended answer

This is a bug in Spark: the org.apache.spark.JavaSparkListener class was removed in Spark 2.0.0. A fix has been submitted and is under review; if it is approved, it should land in the next Spark release (possibly Spark 2.2.0):

https://issues.apache.org/jira/browse/SPARK-17563
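
If you want to confirm that this is what is happening on your installation, a quick sanity check (a sketch, assuming the jar name used in this question) is to look for the missing class inside the spark-core jar that was copied into $HIVE_HOME/lib; the class exists in Spark 1.x jars but, as noted above, was removed in 2.0.0:

# Search the jar for the class removed in Spark 2.0.0; no output means it is missing
jar tf $HIVE_HOME/lib/spark-core_2.11-2.1.0.jar | grep JavaSparkListener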
