Custom Spark does not find Hive databases when running on YARN
Question
I followed the setup instructions. However:
spark.sql("show databases").show
only returns:
+------------+
|databaseName|
+------------+
| default|
+------------+
Now trying to pass the original HDP configuration (which is apparently not read in by my custom version of Spark), like:
One:
--files /usr/hdp/current/spark2-client/conf/hive-site.xml
Two:
--conf spark.hive.metastore.uris='thrift://master001.my.corp.com:9083,thrift://master002.my.corp.com:9083,thrift://master003.my.corp.com:9083' --conf spark.hive.metastore.sasl.enabled='true' --conf hive.metastore.uris='thrift://master001.my.corp.com:9083,thrift://master002.my.corp.com:9083,thrift://master003.my.corp.com:9083' --conf hive.metastore.sasl.enabled='true'
Three:
--conf spark.yarn.dist.files='/usr/hdp/current/spark2-client/conf/hive-site.xml'
Four:
--conf spark.sql.warehouse.dir='/apps/hive/warehouse'
None of these helps to solve the issue. How can I get Spark to recognize the Hive databases?
Answer
Hive jars need to be on Spark's classpath for Hive support to be enabled. If the Hive jars are not present on the classpath, the catalog implementation used is in-memory.
In spark-shell we can confirm this by executing
sc.getConf.get("spark.sql.catalogImplementation")
This will give in-memory.
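For reference, in a spark-shell running against such a Hive-less build, the check would look like this (the value shown is the one the answer describes, not output captured from the asker's cluster):

scala> sc.getConf.get("spark.sql.catalogImplementation")
res0: String = in-memory

This behavior comes from the following checks in the Spark source: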
def enableHiveSupport(): Builder = synchronized {
  if (hiveClassesArePresent) {
    config(CATALOG_IMPLEMENTATION.key, "hive")
  } else {
    throw new IllegalArgumentException(
      "Unable to instantiate SparkSession with Hive support because " +
      "Hive classes are not found.")
  }
}

private[spark] def hiveClassesArePresent: Boolean = {
  try {
    Utils.classForName(HIVE_SESSION_STATE_BUILDER_CLASS_NAME)
    Utils.classForName("org.apache.hadoop.hive.conf.HiveConf")
    true
  } catch {
    case _: ClassNotFoundException | _: NoClassDefFoundError => false
  }
}
If the classes are not present, Hive support is not enabled; these checks run as part of Spark shell initialization.
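One way to make this failure explicit instead of silently falling back to the in-memory catalog is to request Hive support when building the session. A minimal sketch (the app name is made up for illustration; this is not from the original answer):

import org.apache.spark.sql.SparkSession

// If the Hive jars are missing from the classpath, enableHiveSupport()
// throws IllegalArgumentException ("...Hive classes are not found.")
// rather than quietly using the in-memory catalog.
val spark = SparkSession.builder()
  .appName("hive-support-check") // hypothetical name, for illustration
  .enableHiveSupport()
  .getOrCreate()

// With Hive support active, this prints "hive" rather than "in-memory".
println(spark.sparkContext.getConf.get("spark.sql.catalogImplementation"))
spark.sql("show databases").show()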
In the code pasted as part of the question, SPARK_DIST_CLASSPATH is populated only with the Hadoop classpath, and the paths to the Hive jars are missing.
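A quick way to verify this from spark-shell (a sketch; it probes the same class that hiveClassesArePresent requires):

// Runnable in spark-shell: checks whether the Hive class that
// hiveClassesArePresent looks for is actually on the classpath.
try {
  Class.forName("org.apache.hadoop.hive.conf.HiveConf")
  println("HiveConf found: Hive support can be enabled")
} catch {
  case _: ClassNotFoundException =>
    println("HiveConf missing: the Hive jars are not on the classpath")
}

If the class is missing, the fix is to extend SPARK_DIST_CLASSPATH so that it also includes the Hive jars, not just the Hadoop ones.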