Is it possible to force a schema definition when loading tables from AWS RDS (MySQL)?


Question

I'm using Apache Spark to read data from a MySQL database hosted on AWS RDS.

Spark infers the schema from the database as well. Unfortunately, one of the table's columns, active, is of type TINYINT(1). The active column has the following values:

  • inactive
  • active
  • pending

Spark recognizes TINYINT(1) as BooleanType, so it converts every value in active to true or false. As a result, I can't tell the original values apart.

Is it possible to force a schema definition when loading the table?

Recommended answer

It's not Spark that converts the TINYINT type into a boolean, but the MySQL Connector/J JDBC driver used under the hood.

So you don't actually need to specify a schema for this issue. The real cause is the JDBC driver, which by default treats the datatype TINYINT(1) as the BIT type (because the server silently converts BIT -> TINYINT(1) when creating tables).
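To see why the original values become indistinguishable once the column is read as a boolean, here is a minimal sketch. The numeric codes 0, 1 and 2 for the three statuses are an assumption (the question does not say how they are stored), and the "nonzero becomes true" rule is a simplified stand-in for the driver's conversion, not its actual code.

```scala
object BooleanCollapse {
  // Hypothetical status codes; the question does not state the real encoding.
  val statusCodes: Map[String, Int] =
    Map("inactive" -> 0, "active" -> 1, "pending" -> 2)

  // Simplified stand-in for reading the column as a boolean:
  // any nonzero value becomes true.
  def asBoolean(code: Int): Boolean = code != 0

  // What the DataFrame would contain for each status under that rule.
  val asSeenBySpark: Map[String, Boolean] =
    statusCodes.map { case (name, code) => name -> asBoolean(code) }
}
```

Under these assumptions, "active" and "pending" both map to true, so the distinction between them is lost before Spark ever sees the data.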

You can find all the tips and gotchas of the JDBC connector in the official MySQL Connector/J Configuration Properties guide.

You just need to pass the right parameter to the JDBC connector by appending tinyInt1isBit=false to your connection URL:

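// Assumes oldUrl already has a query string (e.g. "...?useSSL=true");
// if it has none, use "?" instead of "&".
val newUrl = s"$oldUrl&tinyInt1isBit=false"

val data = spark.read.format("jdbc")
  .option("url", newUrl)
  // your other jdbc options (dbtable, user, password, ...)
  .load()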
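The snippet above uses "&", which only works when the URL already carries a query string. A small helper can pick the right separator. This is a sketch, not part of the original answer: JdbcUrl, withParam and the example URLs are hypothetical names and placeholders introduced here for illustration.

```scala
object JdbcUrl {
  /** Appends key=value to a JDBC URL, using '?' when the URL has no
    * query string yet and '&' when it already has one. */
  def withParam(url: String, key: String, value: String): String = {
    val separator =
      if (!url.contains("?")) "?"                           // no query string yet
      else if (url.endsWith("?") || url.endsWith("&")) ""   // separator already present
      else "&"                                              // extend existing query string
    s"$url$separator$key=$value"
  }
}

object JdbcUrlExample {
  // Placeholder URLs, not real endpoints.
  val bare: String =
    JdbcUrl.withParam("jdbc:mysql://host:3306/db", "tinyInt1isBit", "false")
  val withQuery: String =
    JdbcUrl.withParam("jdbc:mysql://host:3306/db?useSSL=true", "tinyInt1isBit", "false")
}
```

The resulting string can then be passed to .option("url", ...) exactly as in the answer's snippet.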
