Getting correct offset for timezone using current_timestamp in Apache Spark
Question
I am new to both Java and Apache Spark and am trying to understand timestamp and timezone usage. I would like all the timestamps in the data I get from an Apache Spark DataFrame to be stored in the EST timezone in SQL Server.
When I use current_timestamp, I get the correct EST time, but the offset I see when I look at the data is '+00:00' instead of '-04:00'.
Here is a value stored in database that is passed in from spark dataset:
2020-04-07 11:36:23.0220 +00:00
From what I can see, current_timestamp does not accept a timezone. Moreover, the time is correct (it is in EST), but I don't understand why the offset is wrong.
Any help to understand this would be great.
Recommended answer
Java Timestamps work more or less like LocalDateTime in Java: they don't contain timezone information. The database is interpreting the value as a UTC timestamp, which is why you got a mismatch. I usually use one of two approaches (depending on which suits better):
- You can return a UTC timestamp from Spark (with a simple custom UDF) instead of using current_timestamp, which is timezone specific.
- You can encode your dates as Strings. Similarly, using the java.time API, you can achieve that with a simple UDF.
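
As a rough illustration, here is a minimal sketch in plain Java of the mismatch and of the logic each approach would put inside a UDF. The class and method names (TimestampOffsetSketch, readAsUtc, utcTimestamp, easternString) are made up for this example, and "EST" is assumed to mean the America/New_York zone. Registering these as Spark UDFs is not shown, and how Spark and the JDBC driver treat the resulting values still depends on your session and driver timezone settings.

```java
import java.sql.Timestamp;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class TimestampOffsetSketch {
    // Assumption: "EST" in the question means US Eastern time.
    static final ZoneId EASTERN = ZoneId.of("America/New_York");

    // The mismatch: a zone-less wall-clock value that a consumer reads as UTC.
    static OffsetDateTime readAsUtc(LocalDateTime wallClock) {
        return wallClock.atOffset(ZoneOffset.UTC);
    }

    // Approach 1: a Timestamp whose wall-clock fields are the UTC time,
    // so a consumer that assumes UTC ends up with the right instant.
    static Timestamp utcTimestamp(Instant instant) {
        return Timestamp.valueOf(LocalDateTime.ofInstant(instant, ZoneOffset.UTC));
    }

    // Approach 2: a String that carries its own offset (ISO-8601).
    static String easternString(Instant instant) {
        return instant.atZone(EASTERN).format(DateTimeFormatter.ISO_OFFSET_DATE_TIME);
    }

    public static void main(String[] args) {
        // The value from the question: Eastern wall-clock time, no zone attached.
        LocalDateTime stored = LocalDateTime.of(2020, 4, 7, 11, 36, 23, 22_000_000);
        System.out.println(readAsUtc(stored));     // 2020-04-07T11:36:23.022Z

        // The instant the asker actually meant.
        Instant actual = stored.atZone(EASTERN).toInstant();
        System.out.println(utcTimestamp(actual));  // 2020-04-07 15:36:23.022
        System.out.println(easternString(actual)); // 2020-04-07T11:36:23.022-04:00
    }
}
```

Taking the Instant as a parameter keeps the helpers deterministic; inside a real UDF you would probably pass Instant.now() instead.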
Hope things are a bit clearer now.