Why is unix_timestamp parsing this incorrectly, 12 hours off?
Question
The following appears to be incorrect (run via spark.sql):
select unix_timestamp("2017-07-03T12:03:56", "yyyy-MM-dd'T'hh:mm:ss")
-- 1499040236
compared to:
select unix_timestamp("2017-07-03T00:18:31", "yyyy-MM-dd'T'hh:mm:ss")
-- 1499041111
Clearly the first comes after the second. And the second appears to be correct:
# ** R Code **
# establish constants
one_day = 60 * 60 * 24
one_year = 365 * one_day
one_year_leap = 366 * one_day
one_quad = 3 * one_year + one_year_leap
# to 2014-01-01
11 * one_quad +
# to 2017-01-01
2 * one_year + one_year_leap +
# to 2017-07-01
(31 + 28 + 31 + 30 + 31 + 30) * one_day +
# to 2017-07-03 00:18:31
2 * one_day + 18 * 60 + 31
# [1] 1499041111
A similar calculation shows the first should be 1499083436 (confirmed by as.integer(as.POSIXct('2017-07-03 12:03:56', tz = 'UTC')) in R), and that 1499040236 should correspond to 2017-07-03 00:03:56.
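The same arithmetic can be cross-checked outside of R and Spark. The following is a minimal sketch using java.time, assuming UTC throughout as in the R check; the class and method names are hypothetical and chosen only for illustration.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

class EpochCheck {
    // Interpret an ISO-8601 local date-time as UTC and return epoch seconds.
    static long epochSecondsUtc(String isoLocal) {
        return LocalDateTime.parse(isoLocal).toEpochSecond(ZoneOffset.UTC);
    }

    // Render epoch seconds back as a UTC instant string.
    static String utcFromEpoch(long epochSeconds) {
        return Instant.ofEpochSecond(epochSeconds).toString();
    }

    public static void main(String[] args) {
        // Expected value for the first query in the question.
        System.out.println(epochSecondsUtc("2017-07-03T12:03:56"));
        // What the value Spark actually returned corresponds to.
        System.out.println(utcFromEpoch(1499040236L));
    }
}
```

The two results differ by 43200 seconds, which is exactly the 12-hour offset described in the question.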
So what's happening here? It certainly looks like a bug. Two last sanity checks: select unix_timestamp("2017-07-03T00:03:56", "yyyy-MM-dd'T'hh:mm:ss") correctly returns 1499040236, and replacing the T in the middle with a space has no effect on the incorrect parse.
Since it appears to be fixed in development, I'll note that this is on 2.1.1.
Recommended answer
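This is just a format mistake:
- Your data is in 0-23 hour format (denoted in SimpleDateFormat as HH).
- You use the hh format, which corresponds to the 1-12 hour (am/pm) format.
With hh and no am/pm marker, 12 is read as 12 AM (midnight), so 12:03:56 is parsed as 00:03:56, which is exactly 12 hours early. Use yyyy-MM-dd'T'HH:mm:ss instead.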
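The difference between hh and HH can be reproduced with java.text.SimpleDateFormat directly, without Spark. This is a minimal sketch, assuming the session time zone is UTC (which matches the numbers in the question); the class and method names are hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

class HourPatternDemo {
    // Parse text with the given pattern in UTC and return epoch seconds.
    static long toEpochSeconds(String text, String pattern) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return fmt.parse(text).getTime() / 1000L;
        } catch (ParseException e) {
            throw new IllegalArgumentException("cannot parse: " + text, e);
        }
    }

    public static void main(String[] args) {
        String input = "2017-07-03T12:03:56";
        // hh = hour in am/pm (1-12): 12 with no am/pm marker is taken as 12 AM.
        System.out.println(toEpochSeconds(input, "yyyy-MM-dd'T'hh:mm:ss"));
        // HH = hour in day (0-23): 12 is noon, as intended.
        System.out.println(toEpochSeconds(input, "yyyy-MM-dd'T'HH:mm:ss"));
    }
}
```

With the default lenient parser, the hh pattern yields 1499040236 and the HH pattern yields 1499083436, reproducing both the value reported in the question and the expected one.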
In fact, in the latest Spark version (2.3.0 RC1) it wouldn't parse at all:
spark.version
String = 2.3.0
spark.sql("""
select unix_timestamp("2017-07-03T00:18:31", "yyyy-MM-dd'T'hh:mm:ss")""").show
+----------------------------------------------------------+
|unix_timestamp(2017-07-03T00:18:31, yyyy-MM-dd'T'hh:mm:ss)|
+----------------------------------------------------------+
| null|
+----------------------------------------------------------+
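A null result for hour 00 is what strict (non-lenient) parsing would produce, since 0 is outside the 1-12 range that hh accepts. Whether Spark 2.3.0 configures its formatter this way is an assumption here, not something the output above confirms. The sketch below only shows how SimpleDateFormat itself behaves with setLenient(false); the class and method names are hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

class StrictHourDemo {
    // Return true if text parses under the pattern with lenient parsing disabled.
    static boolean parsesStrictly(String text, String pattern) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        fmt.setLenient(false); // reject out-of-range field values
        try {
            fmt.parse(text);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hour 00 is not a valid 1-12 hour, so strict hh parsing rejects it.
        System.out.println(parsesStrictly("2017-07-03T00:18:31", "yyyy-MM-dd'T'hh:mm:ss"));
        // The same input is valid under the 0-23 hour pattern.
        System.out.println(parsesStrictly("2017-07-03T00:18:31", "yyyy-MM-dd'T'HH:mm:ss"));
    }
}
```

Note that strict parsing only catches hour 00; an input such as 12:03:56 is still a valid 1-12 hour and would still be read as midnight under hh, so switching the pattern to HH remains the actual fix.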