Spark Athena connector
Question
I need to use Athena in Spark, but Spark uses PreparedStatement when it talks to JDBC drivers, and that gives me an exception: "com.amazonaws.athena.jdbc.NotImplementedException: Method Connection.prepareStatement is not yet implemented".
Can you please let me know how I can connect to Athena from Spark?
Recommended answer
I don't know how you'd connect to Athena from Spark, but you don't need to - you can very easily query the data that Athena contains (or, more correctly, "registers") from Spark.
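Athena has two parts:
- The Hive Metastore (now called the Glue Data Catalog), which contains the mapping between database and table names and all the underlying files
- The Presto query engine, which translates your SQL into data operations against those files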
When you start an EMR cluster (v5.8.0 and later), you can instruct it to connect to your Glue Data Catalog. This is a checkbox in the 'Create cluster' dialog. When you check this option, your Spark SQLContext
will connect to the Glue Data Catalog, and you'll be able to see the tables in Athena.
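If you create the cluster from the CLI or API instead of the console, the checkbox corresponds, as far as the EMR documentation describes it, to a `spark-hive-site` configuration classification that sets `hive.metastore.client.factory.class` to the Glue client factory. The sketch below only builds and prints that configuration JSON. The helper name `glue_catalog_configuration` and the file name `glue-spark.json` are hypothetical; check the linked EMR guide for your release before relying on it.

```python
import json

# Factory class that the EMR docs use to make Spark SQL treat the
# Glue Data Catalog as its Hive metastore.
GLUE_CLIENT_FACTORY = (
    "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
)


def glue_catalog_configuration() -> list:
    """Return an EMR 'Configurations' list pointing Spark SQL at Glue.

    Hypothetical helper: it only builds the JSON structure and does not
    call any AWS API.
    """
    return [
        {
            "Classification": "spark-hive-site",
            "Properties": {
                "hive.metastore.client.factory.class": GLUE_CLIENT_FACTORY,
            },
        }
    ]


if __name__ == "__main__":
    # Print the JSON; save it as e.g. glue-spark.json (hypothetical name) and
    # pass it with: aws emr create-cluster ... --configurations file://glue-spark.json
    print(json.dumps(glue_catalog_configuration(), indent=2))
```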
You can then query these tables as normal.
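As a rough PySpark sketch of "query as normal": the function below assumes `spark` is a `SparkSession` created with Hive support on an EMR cluster that has the Glue Data Catalog option enabled. The helper names are hypothetical; the only Spark call used is `spark.sql`, and the JDBC driver is not involved at all.

```python
def qualified_table(database: str, table: str) -> str:
    """Backtick-quote a Glue/Athena database and table name for Spark SQL."""
    for name in (database, table):
        if "`" in name:
            raise ValueError(f"unexpected backtick in identifier: {name!r}")
    return f"`{database}`.`{table}`"


def preview_athena_table(spark, database: str, table: str, limit: int = 10):
    """Query a table registered in the Glue Data Catalog through Spark SQL.

    `spark` is assumed to be a SparkSession built with enableHiveSupport()
    on a Glue-enabled EMR cluster; the call returns a Spark DataFrame.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    sql = f"SELECT * FROM {qualified_table(database, table)} LIMIT {int(limit)}"
    return spark.sql(sql)
```

On the cluster, usage might look like `spark = SparkSession.builder.enableHiveSupport().getOrCreate()`, then `spark.sql("SHOW DATABASES").show()` to confirm that the Athena databases are visible, then `preview_athena_table(spark, "my_db", "my_table").show()`, where `my_db` and `my_table` are placeholders for your own Athena names.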
See https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-glue.html for more.