Dataframe to Oracle creates table with case sensitive column
Problem description
Spark: 2.1.1

I am saving my dataframe as an Oracle table, but the resultant Oracle table has "case sensitive" columns.
val properties = new java.util.Properties
properties.setProperty("user", ora_username)
properties.setProperty("password", ora_pwd)
properties.setProperty("batchsize", "30000")
properties.setProperty("driver", db_driver)
spark.sql("select * from myTable").repartition(50).write.mode(SaveMode.Overwrite).jdbc(url,"myTable_oracle", properties)
When I check in Oracle:

Select * from myTable_oracle;       => works
Select col1 from myTable_oracle;    => doesn't work
Select "col1" from myTable_oracle;  => works, but is very annoying
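This matches Oracle's identifier rules: unquoted identifiers are folded to upper case before lookup, while quoted identifiers are matched exactly as written. A minimal sketch of that resolution logic (here `col1` stands for the stored, quoted name that Spark created; this is only an illustration, not Oracle code):

```scala
// Simplified sketch of Oracle's identifier resolution:
// unquoted names are folded to UPPER CASE, quoted names are matched verbatim.
def resolve(queryIdent: String, storedName: String): Boolean = {
  val lookedUp =
    if (queryIdent.startsWith("\"") && queryIdent.endsWith("\""))
      queryIdent.substring(1, queryIdent.length - 1) // quoted: match verbatim
    else
      queryIdent.toUpperCase                         // unquoted: folded to upper case
  lookedUp == storedName
}

resolve("col1", "col1")     // false: COL1 does not match col1 -> "doesn't work"
resolve("\"col1\"", "col1") // true: quoted lookup matches -> works, but annoying
```

The stored name is lower-case `col1` precisely because Spark 2.x quoted it at CREATE TABLE time, so only the quoted lookup succeeds.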
Tried the setting below, but the issue remains:
spark.sqlContext.sql("set spark.sql.caseSensitive=false")
The same code used to work in Spark 1.6.1, which created the Oracle table with case-insensitive columns. But with Spark 2.1.1 I am facing this issue.
I found the issue and solution: starting with Spark 2.x, every columnName gets double-quoted while creating the table, and hence the resultant Oracle table's columnNames become case-sensitive when you try to query them via SQL*Plus.
The culprit is dialect.quoteIdentifier (https://github.com/apache/spark/blob/branch-2.1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L645), and this dialect.quoteIdentifier wraps the column name in double quotes ["]:
def quoteIdentifier(colName: String): String = {
s""""$colName""""
}
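A quick sketch of what that default produces: with every name passed through quoteIdentifier, the generated CREATE TABLE carries quoted, lower-case column names (the column names and types below are hypothetical, for illustration only):

```scala
// Default Spark 2.x behavior: every column name is wrapped in double quotes.
def quoteIdentifier(colName: String): String = s""""$colName""""

// Hypothetical schema, just to show the shape of the generated DDL.
val schema = Seq("col1" -> "VARCHAR2(2000)", "col2" -> "NUMBER(10)")
val ddl = schema
  .map { case (name, typ) => s"${quoteIdentifier(name)} $typ" }
  .mkString("CREATE TABLE myTable_oracle (", ", ", ")")
// ddl: CREATE TABLE myTable_oracle ("col1" VARCHAR2(2000), "col2" NUMBER(10))
```

Because the names are quoted, Oracle stores them exactly as col1/col2, and any later unquoted query folds to COL1/COL2 and finds no match.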
Solution: de-register the existing OracleDialect and re-register it while overriding dialect.quoteIdentifier, along with the other pieces needed to work with the Oracle dialect:
import java.sql.Types
import org.apache.spark.sql.types._
import org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils
import org.apache.spark.sql.jdbc.{ JdbcDialects, JdbcType, JdbcDialect }
val url= "jdbc:oracle:thin:@HOST:1567/SID"
val dialect = JdbcDialects
JdbcDialects.unregisterDialect(dialect.get(url))
val OracleDialect = new JdbcDialect {
override def canHandle(url: String): Boolean = url.startsWith("jdbc:oracle") || url.contains("oracle")
override def getCatalystType(sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] = {
// Handle NUMBER fields that have no precision/scale in a special way because JDBC ResultSetMetaData converts this to 0 precision and -127 scale
if (sqlType == Types.NUMERIC && size == 0) {
// This is sub-optimal as we have to pick a precision/scale in advance whereas the data in Oracle is allowed
// to have different precision/scale for each value. This conversion works in our domain for now though we
// need a more durable solution. Look into changing JDBCRDD (line 406):
// FROM: mutableRow.update(i, Decimal(decimalVal, p, s))
// TO: mutableRow.update(i, Decimal(decimalVal))
Some(DecimalType(DecimalType.MAX_PRECISION, 10))
} // Handle Timestamp with timezone (for now we are just converting this to a string with default format)
//else if (sqlType == -101) {
// Some(StringType)
// }
else None
}
override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
case StringType => Some(JdbcType("VARCHAR2(2000)", java.sql.Types.VARCHAR))
case BooleanType => Some(JdbcType("NUMBER(1)", java.sql.Types.NUMERIC))
case IntegerType => Some(JdbcType("NUMBER(10)", java.sql.Types.NUMERIC))
case LongType => Some(JdbcType("NUMBER(19)", java.sql.Types.NUMERIC))
case DoubleType => Some(JdbcType("NUMBER(19,4)", java.sql.Types.NUMERIC))
case FloatType => Some(JdbcType("NUMBER(19,4)", java.sql.Types.NUMERIC))
case ShortType => Some(JdbcType("NUMBER(5)", java.sql.Types.NUMERIC))
case ByteType => Some(JdbcType("NUMBER(3)", java.sql.Types.NUMERIC))
case BinaryType => Some(JdbcType("BLOB", java.sql.Types.BLOB))
case TimestampType => Some(JdbcType("DATE", java.sql.Types.TIMESTAMP))
case DateType => Some(JdbcType("DATE", java.sql.Types.DATE))
//case DecimalType.Fixed(precision, scale) => Some(JdbcType("NUMBER(" + precision + "," + scale + ")", java.sql.Types.NUMERIC))
//case DecimalType.Unlimited => Some(JdbcType("NUMBER(38,4)", java.sql.Types.NUMERIC))
case _ => None
}
// Important from Spark 2.0 onwards: without this override the Oracle table columns would be case-sensitive
override def quoteIdentifier(colName: String): String = {
colName
}
}
JdbcDialects.registerDialect(OracleDialect)
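To see the effect of the override in isolation, compare the column list produced by the default quoting with the overridden identity version (the column names below are hypothetical):

```scala
// Default Spark 2.x quoting vs. the identity override used above.
def defaultQuote(colName: String): String = s""""$colName""""
def overriddenQuote(colName: String): String = colName

val cols = Seq("col1", "col2") // hypothetical column names
val before = cols.map(defaultQuote).mkString(", ")    // "col1", "col2" -> stored case-sensitively
val after  = cols.map(overriddenQuote).mkString(", ") // col1, col2 -> Oracle folds to COL1, COL2
```

With the unquoted form, Oracle folds the names to upper case at CREATE TABLE time, restoring the case-insensitive lookups you had under Spark 1.6.x.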