Custom Spark does not find Hive databases when running on YARN

Problem description

Following [the linked instructions], however, a:

spark.sql("show databases").show

only returns:

+------------+
|databaseName|
+------------+
|     default|
+------------+

Now trying to pass the original HDP configuration (which is apparently not picked up by my custom version of Spark), for example:

One:

--files /usr/hdp/current/spark2-client/conf/hive-site.xml

Two:

--conf spark.hive.metastore.uris='thrift://master001.my.corp.com:9083,thrift://master002.my.corp.com:9083,thrift://master003.my.corp.com:9083' --conf spark.hive.metastore.sasl.enabled='true' --conf hive.metastore.uris='thrift://master001.my.corp.com:9083,thrift://master002.my.corp.com:9083,thrift://master003.my.corp.com:9083' --conf hive.metastore.sasl.enabled='true'

Three:

--conf spark.yarn.dist.files='/usr/hdp/current/spark2-client/conf/hive-site.xml'

Four:

--conf spark.sql.warehouse.dir='/apps/hive/warehouse'

None of these solve the issue. How can I get Spark to recognize the Hive databases?

Answer

Hive jars need to be on Spark's classpath for Hive support to be enabled. If the Hive jars are not present on the classpath, the catalog implementation used is in-memory.
In spark-shell we can confirm this by executing:

sc.getConf.get("spark.sql.catalogImplementation") 

This will return in-memory.

    def enableHiveSupport(): Builder = synchronized {
      if (hiveClassesArePresent) {
        config(CATALOG_IMPLEMENTATION.key, "hive")
      } else {
        throw new IllegalArgumentException(
          "Unable to instantiate SparkSession with Hive support because " +
            "Hive classes are not found.")
      }
    }

SparkSession.scala

  private[spark] def hiveClassesArePresent: Boolean = {
    try {
      Utils.classForName(HIVE_SESSION_STATE_BUILDER_CLASS_NAME)
      Utils.classForName("org.apache.hadoop.hive.conf.HiveConf")
      true
    } catch {
      case _: ClassNotFoundException | _: NoClassDefFoundError => false
    }
  }

If the classes are not present, Hive support is not enabled. Link to the code where the above checks happen as part of spark-shell initialization.
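
A quick way to see whether the custom build ships the Hive support jars at all is to list the bundled jars. A minimal sketch, assuming SPARK_HOME points at the custom installation:

# List any Hive-related jars bundled with the custom Spark build.
# A build made without the -Phive profile typically prints nothing here,
# which is why the session silently falls back to the in-memory catalog.
ls "$SPARK_HOME/jars" | grep -i hive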

In the code pasted as part of the question, SPARK_DIST_CLASSPATH is populated only with the Hadoop classpath; the paths to the Hive jars are missing.
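
One way to address this, sketched below, is to append the Hive client jars to SPARK_DIST_CLASSPATH in the custom build's conf/spark-env.sh; the path /usr/hdp/current/hive-client/lib is only an assumption of a typical HDP layout and may differ on your cluster:

# conf/spark-env.sh of the custom Spark installation (a sketch; paths are assumptions)
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
# Append the Hive client jars so that hiveClassesArePresent evaluates to true:
export SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:/usr/hdp/current/hive-client/lib/*"

Alternatively, rebuilding the custom Spark with the -Phive and -Phive-thriftserver profiles bundles the Hive support jars into $SPARK_HOME/jars directly. In either case hive-site.xml still needs to be visible to the driver (for example in $SPARK_HOME/conf) so that the external metastore URIs are picked up instead of a local Derby metastore being created.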
