Is it possible to force schema definition when loading tables from AWS RDS (MySQL)
Question
I'm using Apache Spark to read data from a MySQL database on AWS RDS. Spark is also inferring the schema from the database. Unfortunately, one of the table's columns is of type TINYINT(1) (column name: active). The active column has the following values:
- invalid
- active
- pending
- etc.
Spark recognizes TINYINT(1) as BooleanType, so it converts every value in active to true or false. As a result, I can no longer tell the original values apart.
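For context, a minimal read that reproduces this behavior might look like the sketch below; the endpoint, credentials, and table name are placeholders, not values from the question:

```scala
// Hypothetical RDS endpoint, credentials, and table name for illustration.
val df = spark.read.format("jdbc")
  .option("url", "jdbc:mysql://mydb.xxxx.rds.amazonaws.com:3306/mydb")
  .option("dbtable", "my_table")
  .option("user", "admin")
  .option("password", "secret")
  .load()

// The `active` column shows up as BooleanType rather than an integer type.
df.printSchema()
```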
Is it possible to force a schema definition when loading the table?
Answer
It's not Spark that converts the TINYINT type into a boolean, but the Connector/J JDBC driver used under the hood.
So you don't actually need to specify a schema for this issue. What's really causing it is the JDBC driver, which treats the datatype TINYINT(1) as the BIT type (because the server silently converts BIT -> TINYINT(1) when creating tables).
You can check all the tips and gotchas of the JDBC connector in the official MySQL Connector/J Configuration Properties guide.
You just need to pass the right parameter to the JDBC connector by appending the following to your connection URL:
val newUrl = s"$oldUrl&tinyInt1isBit=false"
val data = spark.read.format("jdbc")
  .option("url", newUrl)
  // your other jdbc options
  .load()
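If you do want to force the type from the Spark side instead, Spark's JDBC reader (since 2.3) also exposes a `customSchema` option that overrides the inferred type for the listed columns; the table and column usage below is a sketch based on that option, not code from the original answer:

```scala
// Override only the inferred type of `active`; all other columns keep
// the types inferred from the database.
val data = spark.read.format("jdbc")
  .option("url", oldUrl)
  .option("dbtable", "my_table")            // hypothetical table name
  .option("customSchema", "active TINYINT") // force an integer type
  .load()
```

Note that `customSchema` only changes how Spark casts the fetched values: if the driver has already collapsed TINYINT(1) to a boolean, values other than 0 and 1 are lost before the cast, so the `tinyInt1isBit=false` URL parameter is the more robust fix.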