spark 3.x on HDP 3.1 in headless mode with hive - hive tables not found


Problem description

How can I configure Spark 3.x on HDP 3.1, using the headless (https://spark.apache.org/docs/latest/hadoop-provided.html) version of Spark, to interact with Hive?

First, I have downloaded and unzipped the headless Spark 3.x:

cd ~/development/software/spark-3.0.0-bin-without-hadoop
export HADOOP_CONF_DIR=/etc/hadoop/conf/
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
export SPARK_DIST_CLASSPATH=$(hadoop --config /usr/hdp/current/spark2-client/conf classpath)
 
ls /usr/hdp # note the version and replace 3.1.x.x-xxx below with it

./bin/spark-shell --master yarn --queue myqueue --conf spark.driver.extraJavaOptions='-Dhdp.version=3.1.x.x-xxx' --conf spark.yarn.am.extraJavaOptions='-Dhdp.version=3.1.x.x-xxx' --conf spark.hadoop.metastore.catalog.default=hive --files /usr/hdp/current/hive-client/conf/hive-site.xml

spark.sql("show databases").show
// only showing default namespace, existing hive tables are missing
+---------+
|namespace|
+---------+
|  default|
+---------+

spark.conf.get("spark.sql.catalogImplementation")
res2: String = in-memory # I want to see hive here - how? How to add hive jars onto the classpath?
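
For reference, the fallback can be made explicit: spark-shell only enables Hive support when the Spark Hive classes are actually on the classpath, otherwise it drops to the in-memory catalog. A minimal sketch, reusing the queue and hdp.version placeholders from above (this is an assumption about the fallback behaviour, not something the original post shows):

./bin/spark-shell --master yarn --queue myqueue \
  --conf spark.driver.extraJavaOptions='-Dhdp.version=3.1.x.x-xxx' \
  --conf spark.yarn.am.extraJavaOptions='-Dhdp.version=3.1.x.x-xxx' \
  --conf spark.sql.catalogImplementation=hive \
  --conf spark.hadoop.metastore.catalog.default=hive \
  --files /usr/hdp/current/hive-client/conf/hive-site.xml
# on the headless build this is still expected to fall back to the in-memory catalog,
# because the org.apache.spark.sql.hive classes are not on the classpath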

NOTE

This is an updated version of How can I run spark in headless mode in my custom version on HDP? for Spark 3.x and HDP 3.1, and of custom spark does not find hive databases when running on yarn.

Furthermore: I am aware of the problems with ACID Hive tables in Spark. For now, I simply want to be able to see the existing databases.

We must get the Hive jars onto the classpath. Trying as follows:

 export SPARK_DIST_CLASSPATH="/usr/hdp/current/hive-client/lib*:${SPARK_DIST_CLASSPATH}"

And now using spark-sql:

./bin/spark-sql --master yarn --queue myqueue --conf spark.driver.extraJavaOptions='-Dhdp.version=3.1.x.x-xxx' --conf spark.yarn.am.extraJavaOptions='-Dhdp.version=3.1.x.x-xxx' --conf spark.hadoop.metastore.catalog.default=hive --files /usr/hdp/current/hive-client/conf/hive-site.xml

This fails with:

Error: Failed to load class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.
Failed to load main class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.

I.e. the line export SPARK_DIST_CLASSPATH="/usr/hdp/current/hive-client/lib*:${SPARK_DIST_CLASSPATH}" had no effect (same issue if not set).

Recommended answer

As noted above and in custom spark does not find hive databases when running on yarn, the Hive JARs are needed. They are not supplied in the headless version.
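
A quick way to confirm this (a hedged check; the install path is the one assumed earlier) is to look for the Spark Hive integration jars in the distribution's jars directory:

# the headless ("without-hadoop") package is expected to ship no spark-hive* / hive-* jars,
# whereas a Hadoop-bundled package does
ls ~/development/software/spark-3.0.0-bin-without-hadoop/jars | grep -i hive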

I was not able to retrofit these.

Solution: instead of worrying, simply use the Spark build that comes bundled with Hadoop 3.2 (on HDP 3.1).
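
A minimal sketch of that route (the download URL and archive name are assumptions; adjust hdp.version, queue and paths as before):

# hedged sketch: use the Hadoop 3.2 bundled Spark package instead of the headless one
wget https://archive.apache.org/dist/spark/spark-3.0.0/spark-3.0.0-bin-hadoop3.2.tgz
tar -xzf spark-3.0.0-bin-hadoop3.2.tgz
cd spark-3.0.0-bin-hadoop3.2
export HADOOP_CONF_DIR=/etc/hadoop/conf/
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk

./bin/spark-shell --master yarn --queue myqueue \
  --conf spark.driver.extraJavaOptions='-Dhdp.version=3.1.x.x-xxx' \
  --conf spark.yarn.am.extraJavaOptions='-Dhdp.version=3.1.x.x-xxx' \
  --conf spark.hadoop.metastore.catalog.default=hive \
  --files /usr/hdp/current/hive-client/conf/hive-site.xml

# inside the shell, spark.conf.get("spark.sql.catalogImplementation") should now return "hive"
# and spark.sql("show databases").show should list the existing hive databases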
