Dataframe to Oracle creates table with case-sensitive columns


Problem description

Spark: 2.1.1

I am saving my dataframe as an Oracle table, but the resulting Oracle table has "case sensitive" columns.

val properties = new java.util.Properties
properties.setProperty("user", ora_username)
properties.setProperty("password", ora_pwd)
properties.setProperty("batchsize", "30000")
properties.setProperty("driver", db_driver)

spark.sql("select * from myTable").repartition(50).write.mode(SaveMode.Overwrite).jdbc(url,"myTable_oracle", properties)

When I query the table in Oracle:

  1. Select * from myTable_oracle;      => works
  2. Select col1 from myTable_oracle;   => doesn't work
  3. Select "col1" from myTable_oracle; => works, but is very annoying

I tried the following setting, but the same issue persists:

spark.sqlContext.sql("set spark.sql.caseSensitive=false")

The same code worked in Spark 1.6.1, which created the Oracle table with case-insensitive columns. With Spark 2.1.1 I am facing this issue.

Recommended answer

I found the issue and the solution: starting with Spark 2.x, every column name gets double-quoted when the table is created, so the resulting Oracle table's column names become case-sensitive when you query them via SQL*Plus. (Oracle folds unquoted identifiers to upper case and matches them case-insensitively, whereas a quoted identifier must be matched exactly as stored.)

While writing the table, Spark quotes each column name with dialect.quoteIdentifier, and the default dialect.quoteIdentifier wraps the name in double quotes ["]:

  // Default implementation in org.apache.spark.sql.jdbc.JdbcDialect
  def quoteIdentifier(colName: String): String = {
    s""""$colName""""
  }
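
A quick way to see the effect (a minimal sketch for spark-shell; the URL is the same placeholder used in the solution below):

import org.apache.spark.sql.jdbc.JdbcDialects

// The dialect Spark resolves for an Oracle JDBC URL quotes every
// identifier, which is exactly what makes the Oracle columns case-sensitive.
val resolved = JdbcDialects.get("jdbc:oracle:thin:@HOST:1567/SID")
resolved.quoteIdentifier("col1")   // returns "col1" (double quotes included)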


Solution: de-register the existing OracleDialect and re-register a custom dialect that overrides dialect.quoteIdentifier, along with the other pieces needed to work with Oracle:

import java.sql.Types
import org.apache.spark.sql.types._
import org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils
import org.apache.spark.sql.jdbc.{ JdbcDialects, JdbcType, JdbcDialect }


val url= "jdbc:oracle:thin:@HOST:1567/SID"

// De-register the built-in OracleDialect that Spark resolves for this URL
JdbcDialects.unregisterDialect(JdbcDialects.get(url))

val OracleDialect = new JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:oracle") || url.contains("oracle")

  override def getCatalystType(sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] = {
    // Handle NUMBER fields that have no precision/scale specially, because
    // JDBC ResultSetMetaData reports them as precision 0 and scale -127
    if (sqlType == Types.NUMERIC && size == 0) {
      // This is sub-optimal as we have to pick a precision/scale in advance whereas the data in Oracle is allowed 
      //  to have different precision/scale for each value.  This conversion works in our domain for now though we 
      //  need a more durable solution.  Look into changing JDBCRDD (line 406):
      //    FROM:  mutableRow.update(i, Decimal(decimalVal, p, s))
      //    TO:  mutableRow.update(i, Decimal(decimalVal))
      Some(DecimalType(DecimalType.MAX_PRECISION, 10))
    } // Handle Timestamp with timezone (for now we are just converting this to a string with default format)
    //else if (sqlType == -101) {
    // Some(StringType)
    // } 
    else None
  }

  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
    case StringType            => Some(JdbcType("VARCHAR2(2000)", java.sql.Types.VARCHAR))
    case BooleanType           => Some(JdbcType("NUMBER(1)", java.sql.Types.NUMERIC))
    case IntegerType           => Some(JdbcType("NUMBER(10)", java.sql.Types.NUMERIC))
    case LongType              => Some(JdbcType("NUMBER(19)", java.sql.Types.NUMERIC))
    case DoubleType            => Some(JdbcType("NUMBER(19,4)", java.sql.Types.NUMERIC))
    case FloatType             => Some(JdbcType("NUMBER(19,4)", java.sql.Types.NUMERIC))
    case ShortType             => Some(JdbcType("NUMBER(5)", java.sql.Types.NUMERIC))
    case ByteType              => Some(JdbcType("NUMBER(3)", java.sql.Types.NUMERIC))
    case BinaryType            => Some(JdbcType("BLOB", java.sql.Types.BLOB))
    case TimestampType         => Some(JdbcType("DATE", java.sql.Types.TIMESTAMP))
    case DateType              => Some(JdbcType("DATE", java.sql.Types.DATE))
    //case DecimalType.Fixed(precision, scale) => Some(JdbcType("NUMBER(" + precision + "," + scale + ")", java.sql.Types.NUMERIC))
    //case DecimalType.Unlimited => Some(JdbcType("NUMBER(38,4)", java.sql.Types.NUMERIC))
    case _                     => None
  }

  // Important from Spark 2.0: return the column name unquoted, otherwise
  // the Oracle table's columns become case-sensitive
  override def quoteIdentifier(colName: String): String = {
    colName
  }

}

JdbcDialects.registerDialect(OracleDialect)
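
With the custom dialect registered, the original write can be re-run unchanged; a minimal sketch using the same placeholder names as the question:

import org.apache.spark.sql.SaveMode

// quoteIdentifier now returns the bare column name, so Oracle folds it to
// upper case and the columns can be queried case-insensitively.
spark.sql("select * from myTable")
  .repartition(50)
  .write
  .mode(SaveMode.Overwrite)
  .jdbc(url, "myTable_oracle", properties)

// In SQL*Plus, both of these now work:
//   Select * from myTable_oracle;
//   Select col1 from myTable_oracle;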
