How to set jdbc/partitionColumn type to Date in spark 2.4.1


Problem Description

I am trying to retrieve data from Oracle using spark-sql-2.4.1. I tried to set the JDBC options as below:

    .option("lowerBound", "31-MAR-02");
    .option("upperBound", "01-MAY-19");
    .option("partitionColumn", "data_date");
    .option("numPartitions", 240);

But it gives the error:

    java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
        at java.sql.Timestamp.valueOf(Timestamp.java:204)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.toInternalBoundValue(JDBCRelation.scala:179)
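
This is because Spark 2.4 parses the lower/upper bound strings with java.sql.Timestamp.valueOf when the partition column resolves as a timestamp (the stack trace points at JDBCRelation.toInternalBoundValue), and Oracle DATE columns resolve that way by default. A minimal illustration of the parse rule (standalone Scala, not code from the question):

    import java.sql.Timestamp

    // Accepted: matches yyyy-[m]m-[d]d hh:mm:ss[.fffffffff]
    val ok = Timestamp.valueOf("2002-03-31 00:00:00")

    // Throws IllegalArgumentException: an Oracle-style (or date-only)
    // literal does not match the required format
    val bad = Timestamp.valueOf("31-MAR-02")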

Then I tried the following:

    .option("lowerBound", "2002-03-31"); //changed the date format
    .option("upperBound", "2019-05-02");
    .option("partitionColumn", "data_date"); 
    .option("numPartitions", 240);

Still no luck. So what is the correct way to pass the date as lowerBound/upperBound? Is there a way to specify/set the data type of an option parameter?

Part 2: I checked the options more carefully; they were being overwritten before the query executed. After correcting that, the error above was resolved.

But with the following options:

.option("lowerBound", "2002-03-31 00:00:00"); 
.option("upperBound", "2019-05-01 23:59:59");
.option("timestampFormat", "yyyy-mm-dd hh:mm:ss");

and the query string:

    query ->  ( SELECT * FROM MODEL_VALS ) T

it throws another error:

    java.sql.SQLException: ORA-12801: error signaled in parallel query server P022, instance nj0005
    ORA-01861: literal does not match format string
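
ORA-01861 is Oracle reporting that a literal did not match the date format it was compared against. Two plausible contributors here: in Java/Spark date patterns, mm means minutes and MM months (the conventional pattern would be yyyy-MM-dd HH:mm:ss, and timestampFormat is a CSV/JSON reader option rather than a JDBC one in any case), and the session's NLS_DATE_FORMAT may not match the literals generated for the partition predicates, which the answer below addresses.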

Recommended Answer

If you are using Oracle, see https://github.com/apache/spark/blob/master/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala#L441

    val df1 = spark.read.format("jdbc")
      .option("url", jdbcUrl)
      .option("dbtable", "datetimePartitionTest")
      .option("partitionColumn", "d")
      .option("lowerBound", "2018-07-06")
      .option("upperBound", "2018-07-20")
      .option("numPartitions", 3)
      // oracle.jdbc.mapDateToTimestamp defaults to true. If this flag is not disabled, column d
      // (Oracle DATE) will be resolved as Catalyst Timestamp, which will fail bound evaluation of
      // the partition column. E.g. 2018-07-06 cannot be evaluated as Timestamp, and the error
      // message says: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff].
      .option("oracle.jdbc.mapDateToTimestamp", "false")
      .option("sessionInitStatement", "ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD'")
      .load()
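
As a rough adaptation to the question's setup (a sketch only, assuming data_date is an Oracle DATE column; jdbcUrl here is a placeholder, not a value from the question):

    // Sketch: applies the pattern above to the question's MODEL_VALS query.
    val df = spark.read.format("jdbc")
      .option("url", jdbcUrl)  // placeholder connection string
      .option("dbtable", "( SELECT * FROM MODEL_VALS ) T")
      .option("partitionColumn", "data_date")
      // With mapDateToTimestamp disabled, data_date resolves as a Catalyst Date,
      // so date-only bounds in yyyy-MM-dd form are accepted.
      .option("lowerBound", "2002-03-31")
      .option("upperBound", "2019-05-01")
      .option("numPartitions", 240)
      .option("oracle.jdbc.mapDateToTimestamp", "false")
      .option("sessionInitStatement", "ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD'")
      .load()

With mapDateToTimestamp set to false the bounds are parsed with java.sql.Date.valueOf rather than Timestamp.valueOf, so date-only strings such as 2002-03-31 work as-is.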

