Dataframe to Oracle creates table with case sensitive column


Problem Description


Spark: 2.1.1

I am saving my dataframe as an Oracle table but the resultant Oracle table has "case sensitive" columns.

val properties = new java.util.Properties
    properties.setProperty("user", ora_username)
    properties.setProperty("password", ora_pwd)
    properties.setProperty("batchsize", "30000")
    properties.setProperty("driver", db_driver)

spark.sql("select * from myTable").repartition(50).write.mode(SaveMode.Overwrite).jdbc(url,"myTable_oracle", properties)

When I query it in Oracle:

  1. Select * from myTable_oracle; => works
  2. Select col1 from myTable_oracle; => Doesn't work
  3. Select "col1" from myTable_oracle; => works , but is very annoying

Tried the setting below, but the issue persists:

spark.sqlContext.sql("set spark.sql.caseSensitive=false")

The same code used to work in Spark 1.6.1, which creates the Oracle table with case-insensitive columns. But with Spark 2.1.1 I am facing this issue.

Solution

I found the issue and the solution: starting with Spark 2.x, every columnName gets double-quoted while creating the table, so the resulting Oracle table's column names become case-sensitive when you query them via SQL*Plus.

The table-creation path quotes every column name through dialect.quoteIdentifier:
[https://github.com/apache/spark/blob/branch-2.1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L645]

and the default dialect.quoteIdentifier wraps the name in double quotes ("):

  def quoteIdentifier(colName: String): String = {
    s""""$colName""""
  }

[https://github.com/apache/spark/blob/branch-2.1/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala#L90]
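To see the effect in isolation, here is a minimal standalone sketch (no Spark dependency; the function simply mirrors the Spark 2.1 default linked above) of what the default quoting produces:

```scala
// Mirrors the default JdbcDialect.quoteIdentifier from Spark 2.1
// (JdbcDialects.scala#L90): wrap the column name in double quotes.
object DefaultQuoteDemo {
  def quoteIdentifier(colName: String): String = s""""$colName""""

  def main(args: Array[String]): Unit = {
    // The generated CREATE TABLE therefore contains "col1" (quoted),
    // which Oracle stores as a case-sensitive identifier.
    println(quoteIdentifier("col1"))  // prints "col1" (with the quotes)
  }
}
```

Because Oracle treats a double-quoted identifier as case-sensitive (unquoted identifiers are folded to upper case), every later query must repeat the quotes exactly.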

Solution: de-register the existing OracleDialect and re-register a custom one, overriding dialect.quoteIdentifier along with whatever else is needed to work with the Oracle dialect:

import java.sql.Types
import org.apache.spark.sql.types._
import org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils
import org.apache.spark.sql.jdbc.{ JdbcDialects, JdbcType, JdbcDialect }


val url= "jdbc:oracle:thin:@HOST:1567/SID"

JdbcDialects.unregisterDialect(JdbcDialects.get(url))

val OracleDialect = new JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:oracle") || url.contains("oracle")

  override def getCatalystType(sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] = {
    // Handle NUMBER fields that have no precision/scale in a special way, because JDBC ResultSetMetaData converts this to 0 precision and -127 scale
    if (sqlType == Types.NUMERIC && size == 0) {
      // This is sub-optimal as we have to pick a precision/scale in advance whereas the data in Oracle is allowed 
      //  to have different precision/scale for each value.  This conversion works in our domain for now though we 
      //  need a more durable solution.  Look into changing JDBCRDD (line 406):
      //    FROM:  mutableRow.update(i, Decimal(decimalVal, p, s))
      //    TO:  mutableRow.update(i, Decimal(decimalVal))
      Some(DecimalType(DecimalType.MAX_PRECISION, 10))
    } // Handle Timestamp with timezone (for now we are just converting this to a string with default format)
    //else if (sqlType == -101) {
    // Some(StringType)
    // } 
    else None
  }

  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
    case StringType            => Some(JdbcType("VARCHAR2(2000)", java.sql.Types.VARCHAR))
    case BooleanType           => Some(JdbcType("NUMBER(1)", java.sql.Types.NUMERIC))
    case IntegerType           => Some(JdbcType("NUMBER(10)", java.sql.Types.NUMERIC))
    case LongType              => Some(JdbcType("NUMBER(19)", java.sql.Types.NUMERIC))
    case DoubleType            => Some(JdbcType("NUMBER(19,4)", java.sql.Types.NUMERIC))
    case FloatType             => Some(JdbcType("NUMBER(19,4)", java.sql.Types.NUMERIC))
    case ShortType             => Some(JdbcType("NUMBER(5)", java.sql.Types.NUMERIC))
    case ByteType              => Some(JdbcType("NUMBER(3)", java.sql.Types.NUMERIC))
    case BinaryType            => Some(JdbcType("BLOB", java.sql.Types.BLOB))
    case TimestampType         => Some(JdbcType("DATE", java.sql.Types.TIMESTAMP))
    case DateType              => Some(JdbcType("DATE", java.sql.Types.DATE))
    //case DecimalType.Fixed(precision, scale) => Some(JdbcType("NUMBER(" + precision + "," + scale + ")", java.sql.Types.NUMERIC))
    //case DecimalType.Unlimited => Some(JdbcType("NUMBER(38,4)", java.sql.Types.NUMERIC))
    case _                     => None
  }

  // Important from Spark 2.0: without this override, the Oracle table columns would be case-sensitive
  override def quoteIdentifier(colName: String): String = {
    colName
  }

}

JdbcDialects.registerDialect(OracleDialect)
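The two overrides that actually fix the quoting problem can be sketched standalone (no Spark classes; the method bodies mirror the dialect above):

```scala
// Standalone sketch of the fix: canHandle matches Oracle JDBC URLs,
// and quoteIdentifier returns the name unquoted, so Oracle folds it
// to upper case and a plain `select col1 from myTable_oracle` works again.
object OracleDialectSketch {
  def canHandle(url: String): Boolean =
    url.startsWith("jdbc:oracle") || url.contains("oracle")

  def quoteIdentifier(colName: String): String = colName

  def main(args: Array[String]): Unit = {
    println(canHandle("jdbc:oracle:thin:@HOST:1567/SID"))  // true
    println(quoteIdentifier("col1"))                       // col1
  }
}
```

Note the trade-off: with the pass-through quoteIdentifier, a column name that collides with an Oracle reserved word or contains special characters would no longer be protected by quoting, so this override assumes plain alphanumeric column names.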
