How to specify sql dialect when creating spark dataframe from JDBC?
Question
I'm having an issue reading data via custom JDBC with Spark. How would I go about overriding the sql dialect inferred from the jdbc url?
The database in question is vitess (https://github.com/youtube/vitess), which runs a mysql variant, so I want to specify a mysql dialect. The jdbc url begins with jdbc:vitess/
Otherwise the DataFrameReader infers a default dialect that uses " as the quote identifier. As a result, queries via spark.read.jdbc get sent as
Select "id", "col2", "col3", "etc" from table
which selects the string literals instead of the column values, rather than
Select id, col2, col3, etc from table
Answer
Maybe it's too late, but here is the answer:
Create your custom dialect, as I did for the ClickHouse database (my jdbc connection url looks like this: jdbc:clickhouse://localhost:8123):
private object ClickHouseDialect extends JdbcDialect {
  // Override the quoting logic here as you wish
  override def quoteIdentifier(colName: String): String = colName
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:clickhouse")
}
And register it somewhere in your code, like this:
JdbcDialects.registerDialect(ClickHouseDialect)
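For the vitess case in the question, the same pattern applies, but since vitess speaks a mysql variant the dialect would quote identifiers with backticks (as Spark's built-in MySQL dialect does) rather than passing them through. The dialect boils down to two pure functions, sketched here standalone as a hypothetical illustration; in real code they would be overrides on `JdbcDialect`, registered via `JdbcDialects.registerDialect` exactly as above:

```scala
object VitessDialectSketch {
  // Match connection strings like jdbc:vitess/... from the question.
  def canHandle(url: String): Boolean = url.startsWith("jdbc:vitess")

  // Quote identifiers with backticks, mysql-style, so the generated SQL
  // reads `id` rather than "id".
  def quoteIdentifier(colName: String): String = s"`$colName`"
}
```

With this registered, the query from the question would be sent as Select `id`, `col2`, `col3`, `etc` from table, which mysql (and hence vitess) resolves to the column values.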