spark 3.x on HDP 3.1 in headless mode with hive - hive tables not found


Problem description

How can I configure Spark 3.x on HDP 3.1 using the headless (https://spark.apache.org/docs/latest/hadoop-provided.html) version of Spark to interact with Hive?

First, I downloaded and unpacked the headless Spark 3.x:

cd ~/development/software/spark-3.0.0-bin-without-hadoop
export HADOOP_CONF_DIR=/etc/hadoop/conf/
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
export SPARK_DIST_CLASSPATH=$(hadoop --config /usr/hdp/current/spark2-client/conf classpath)
 
ls /usr/hdp # note the version and replace 3.1.x.x-xxx below with it

./bin/spark-shell --master yarn --queue myqueue --conf spark.driver.extraJavaOptions='-Dhdp.version=3.1.x.x-xxx' --conf spark.yarn.am.extraJavaOptions='-Dhdp.version=3.1.x.x-xxx' --conf spark.hadoop.metastore.catalog.default=hive --files /usr/hdp/current/hive-client/conf/hive-site.xml

spark.sql("show databases").show
// only showing default namespace, existing hive tables are missing
+---------+
|namespace|
+---------+
|  default|
+---------+

spark.conf.get("spark.sql.catalogImplementation")
res2: String = in-memory # I want to see hive here - how? How to add hive jars onto the classpath?
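
For reference, the Hive catalog can be requested explicitly at launch via spark.sql.catalogImplementation=hive; a minimal sketch, with the caveat that spark-shell silently falls back to the in-memory catalog when Spark's own Hive support classes are not on the classpath, which is exactly the situation with the headless build:

# Sketch only: asks spark-shell for the Hive catalog; without Spark's
# spark-hive classes on the classpath it logs a warning and falls back
# to the in-memory catalog seen above.
./bin/spark-shell --master yarn --queue myqueue \
  --conf spark.sql.catalogImplementation=hive \
  --conf spark.hadoop.metastore.catalog.default=hive \
  --files /usr/hdp/current/hive-client/conf/hive-site.xml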

NOTE

This is an updated version of How can I run spark in headless mode in my custom version on HDP? for Spark 3.x on HDP 3.1, and of custom spark does not find hive databases when running on yarn.

Furthermore: I am aware of the problems with ACID Hive tables in Spark. For now, I simply want to be able to see the existing databases.

We must get the Hive jars onto the class path. Trying as follows:

 export SPARK_DIST_CLASSPATH="/usr/hdp/current/hive-client/lib*:${SPARK_DIST_CLASSPATH}"

And now using spark-sql:

./bin/spark-sql --master yarn --queue myqueue --conf spark.driver.extraJavaOptions='-Dhdp.version=3.1.x.x-xxx' --conf spark.yarn.am.extraJavaOptions='-Dhdp.version=3.1.x.x-xxx' --conf spark.hadoop.metastore.catalog.default=hive --files /usr/hdp/current/hive-client/conf/hive-site.xml

This fails with:

Error: Failed to load class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.
Failed to load main class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.

I.e. the line export SPARK_DIST_CLASSPATH="/usr/hdp/current/hive-client/lib*:${SPARK_DIST_CLASSPATH}" had no effect (same issue if not set).
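
Incidentally, the wildcard there is probably intended to be lib/* rather than lib* (the JVM only expands a classpath wildcard when the asterisk is the whole final path element); a sketch of that variant, with the caveat that the HDP path is carried over from above and that these are the Hive client jars, not Spark's own spark-hive/thriftserver jars:

# Sketch only: lib/* lets the JVM expand every .jar inside the directory,
# whereas lib* is treated as a literal (non-existent) classpath entry.
# Even so, Spark's spark-hive / spark-hive-thriftserver classes are still
# absent from the headless build, so the CLI driver error remains.
export SPARK_DIST_CLASSPATH="/usr/hdp/current/hive-client/lib/*:${SPARK_DIST_CLASSPATH}"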

Answer

As noted above and in custom spark does not find hive databases when running on yarn, the Hive JARs are needed. They are not supplied in the headless version.

I was not able to retrofit these.

Solution: instead of worrying, simply use the Spark build with Hadoop 3.2 (on HDP 3.1).
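
For completeness, a sketch of the equivalent launch with the stock spark-3.0.0-bin-hadoop3.2 download; the directory name, queue and the 3.1.x.x-xxx placeholder are carried over from the question and will differ per cluster:

# Sketch only: the "pre-built for Apache Hadoop 3.2" distribution already ships
# Spark's Hive support jars, so no SPARK_DIST_CLASSPATH adjustments are needed.
cd ~/development/software/spark-3.0.0-bin-hadoop3.2
export HADOOP_CONF_DIR=/etc/hadoop/conf/
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk

./bin/spark-shell --master yarn --queue myqueue \
  --conf spark.driver.extraJavaOptions='-Dhdp.version=3.1.x.x-xxx' \
  --conf spark.yarn.am.extraJavaOptions='-Dhdp.version=3.1.x.x-xxx' \
  --conf spark.hadoop.metastore.catalog.default=hive \
  --files /usr/hdp/current/hive-client/conf/hive-site.xml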

