Spark Catalog w/ AWS Glue: database not found

Question

I've created an EMR cluster with the Glue Data Catalog. When I invoke spark-shell, I can successfully list the tables stored in a Glue database via

spark.catalog.setCurrentDatabase("test")
spark.catalog.listTables

However, when I submit a job via spark-submit, I get a fatal error:

ERROR ApplicationMaster: User class threw exception: org.apache.spark.sql.AnalysisException: Database 'test' does not exist.;

In the job I submit via spark-submit, I create the Spark session via

SparkSession.builder.enableHiveSupport.getOrCreate

Answer

Adding the hive.metastore.client.factory.class configuration to the code that creates the Spark session solved the issue for me:

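import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder()
            // ... (other builder settings, e.g. appName)
            // Make Spark's Hive client use the AWS Glue Data Catalog as its metastore
            .config("hive.metastore.client.factory.class", "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory")
            .enableHiveSupport()
            .getOrCreate();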

That's the same configuration defined in the AWS docs (https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-glue.html), and EMR adds it to the cluster configuration when "Use for Hive table metadata" is checked at cluster creation. For some reason, though, that cluster-level setting doesn't work as expected for spark-submit jobs (I'm using EMR 5.12.0).
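
If hard-coding the setting in the job is undesirable, the same property can probably also be supplied at submit time. Spark's configuration docs (section "Custom Hadoop/Hive Configuration") describe a spark.hadoop. prefix: a property passed as spark.hadoop.abc.def=xyz is added to the Hadoop configuration as abc.def=xyz. Whether this route picks up the Glue factory on EMR 5.12.0 is an untested assumption. The sketch below uses a hypothetical helper class, GlueSubmitArgs, that only assembles the corresponding --conf arguments and prints an example command line. <your-application.jar> is a placeholder.

```java
import java.util.List;

// Hypothetical helper (not part of Spark or EMR): builds the spark-submit
// arguments that pass the Glue metastore client factory through the
// "spark.hadoop." prefix instead of setting it in application code.
class GlueSubmitArgs {
    // Hive property from the answer and the Glue Data Catalog client factory class.
    static final String FACTORY_KEY = "hive.metastore.client.factory.class";
    static final String GLUE_FACTORY =
            "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory";

    // Spark copies "spark.hadoop.<key>=<value>" into the Hadoop configuration as "<key>=<value>".
    static List<String> confArgs() {
        return List.of("--conf", "spark.hadoop." + FACTORY_KEY + "=" + GLUE_FACTORY);
    }

    public static void main(String[] args) {
        // Print an example command line; the application JAR is a placeholder.
        System.out.println("spark-submit " + String.join(" ", confArgs()) + " <your-application.jar>");
    }
}
```

Running main prints a single line: spark-submit --conf spark.hadoop.hive.metastore.client.factory.class=com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory <your-application.jar>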
