Spark Catalog w/ AWS Glue: database not found
Question
I've created an EMR cluster with the Glue Data Catalog. When I invoke spark-shell, I can successfully list the tables stored in a Glue database via
spark.catalog.setCurrentDatabase("test")
spark.catalog.listTables
However, when I submit a job via spark-submit, I get a fatal error:
ERROR ApplicationMaster: User class threw exception: org.apache.spark.sql.AnalysisException: Database 'test' does not exist.;
In the job submitted via spark-submit, I create the Spark session via
SparkSession.builder.enableHiveSupport.getOrCreate
Recommended answer
Adding the hive.metastore.client.factory.class configuration to the code that initiates the Spark session solved the issue for me:
SparkSession spark = SparkSession.builder()
...
.config("hive.metastore.client.factory.class", "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory")
.enableHiveSupport()
.getOrCreate();
This is the same configuration that the AWS docs define (https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-glue.html) and that is added to the cluster configuration when you check "Use for Hive table metadata" at cluster creation, but for some reason it doesn't work as expected there (I'm using EMR 5.12.0).
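For comparison, the cluster-level form of this setting is a spark-hive-site classification in the EMR configuration JSON. The fragment below is a sketch of what the linked AWS documentation describes, not a verified fix. Since the cluster-level setting did not take effect for spark-submit jobs in this case, it is worth checking on your own EMR release before relying on it.

```json
[
  {
    "Classification": "spark-hive-site",
    "Properties": {
      "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
    }
  }
]
```

If the job still reports that the database does not exist with this classification in place, setting the property in the session builder as shown above is the workaround that worked here.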