Integrating Spark SQL and Apache Drill through JDBC


Question


I would like to create a Spark SQL DataFrame from the results of a query performed over CSV data (on HDFS) with Apache Drill. I successfully configured Spark SQL to make it connect to Drill via JDBC:

// sqlc is an org.apache.spark.sql.SQLContext; args[0] is the Drill JDBC URL,
// args[1] the table or view to query.
Map<String, String> connectionOptions = new HashMap<String, String>();
connectionOptions.put("url", args[0]);
connectionOptions.put("dbtable", args[1]);
connectionOptions.put("driver", "org.apache.drill.jdbc.Driver");

DataFrame logs = sqlc.read().format("jdbc").options(connectionOptions).load();


Spark SQL performs two queries: the first one to get the schema, and the second one to retrieve the actual data:

SELECT * FROM (SELECT * FROM dfs.output.`my_view`) WHERE 1=0

SELECT "field1","field2","field3" FROM (SELECT * FROM dfs.output.`my_view`)


The first one is successful, but in the second one Spark encloses fields within double quotes, which is something that Drill doesn't support, so the query fails.
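The root cause can be illustrated without Spark or Drill at all: Spark's default JDBC dialect wraps every column name in double quotes when it builds the projection list, while Drill expects backtick-quoted or unquoted identifiers. A minimal sketch (plain Java; the class and method names are illustrative, not Spark's actual internals):

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

public class QuotingDemo {
    // Mimics Spark's default behavior: identifiers wrapped in double quotes.
    static String defaultQuote(String col) {
        return "\"" + col + "\"";
    }

    // Mimics a Drill-friendly dialect: identifiers left untouched.
    static String drillQuote(String col) {
        return col;
    }

    // Builds the projection query the way Spark's JDBC source does,
    // quoting each column with the given strategy.
    static String buildSelect(List<String> cols, String table,
                              UnaryOperator<String> quoter) {
        String projection = cols.stream().map(quoter).collect(Collectors.joining(","));
        return "SELECT " + projection + " FROM " + table;
    }

    public static void main(String[] args) {
        List<String> cols = Arrays.asList("field1", "field2", "field3");
        String table = "(SELECT * FROM dfs.output.`my_view`)";

        // The query Spark generates by default (rejected by Drill):
        System.out.println(buildSelect(cols, table, QuotingDemo::defaultQuote));
        // -> SELECT "field1","field2","field3" FROM (SELECT * FROM dfs.output.`my_view`)

        // The query after the quoting is disabled (accepted by Drill):
        System.out.println(buildSelect(cols, table, QuotingDemo::drillQuote));
        // -> SELECT field1,field2,field3 FROM (SELECT * FROM dfs.output.`my_view`)
    }
}
```

The fix below works by replacing the first quoting strategy with the second one inside Spark's JDBC code path.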


Did anyone manage to get this integration working?

Thanks!

Answer


You can add a JDBC dialect for this and register it before using the JDBC connector:

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

case object DrillDialect extends JdbcDialect {

  // Apply this dialect to any connection using the Drill JDBC driver.
  def canHandle(url: String): Boolean = url.startsWith("jdbc:drill:")

  // Return the column name as-is instead of wrapping it in double quotes,
  // which Drill rejects.
  override def quoteIdentifier(colName: String): String = colName
}

// Register the dialect before loading any Drill-backed DataFrame.
JdbcDialects.registerDialect(DrillDialect)

