How to specify the SQL dialect when creating a Spark DataFrame from JDBC?

Question

I'm having an issue reading data via a custom JDBC driver with Spark. How would I go about overriding the SQL dialect inferred from the JDBC URL?

The database in question is Vitess (https://github.com/youtube/vitess), which runs a MySQL variant, so I want to specify a MySQL dialect. The JDBC URL begins with jdbc:vitess/

Otherwise the DataFrameReader infers a default dialect, which uses " as the identifier quote character. As a result, queries via spark.read.jdbc get sent as

Select 'id', 'col2', 'col3', 'etc' from table

which selects the string literals rather than the column values, instead of

Select id, col2, col3, etc from table

Recommended answer

This may come too late, but here is the answer:

Create a custom dialect, as I did for the ClickHouse database (my JDBC connection URL looks like jdbc:clickhouse://localhost:8123):

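import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

private object ClickHouseDialect extends JdbcDialect {
  // Override the quoting logic as needed; here identifiers are passed through unquoted
  override def quoteIdentifier(colName: String): String = colName

  // Apply this dialect to any JDBC URL that starts with jdbc:clickhouse
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:clickhouse")
}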

Then register it somewhere in your code, before the JDBC read runs, like this:

JdbcDialects.registerDialect(ClickHouseDialect)
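
For the Vitess case in the question, the same approach could look roughly like the sketch below. It is an illustration only, not something from the original answer: the names VitessDialect and RegisterVitessDialect and the sample URL are hypothetical, and it assumes that Vitess accepts MySQL-style backtick quoting for identifiers.

```scala
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

// Hypothetical dialect for the question's setup; the name is an assumption.
object VitessDialect extends JdbcDialect {
  // Claim every JDBC URL that starts with the prefix given in the question.
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:vitess")

  // Assumption: Vitess follows MySQL and quotes identifiers with backticks.
  override def quoteIdentifier(colName: String): String = s"`$colName`"
}

object RegisterVitessDialect {
  def main(args: Array[String]): Unit = {
    // Register before the first spark.read.jdbc call on a matching URL.
    JdbcDialects.registerDialect(VitessDialect)

    // Sanity check: look up the dialect Spark resolves for a made-up URL
    // and print how it would quote a column name.
    val dialect = JdbcDialects.get("jdbc:vitess://localhost:15991/keyspace")
    println(dialect.quoteIdentifier("id"))
  }
}
```

Unlike the ClickHouse example, which leaves identifiers unquoted, this variant keeps quoting in place, so column names that collide with reserved words would still be escaped.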
