spark sql not converting timezone correctly


Problem description

Using Scala 2.10.4 with Spark 1.5.1 and Spark 1.6:

    sqlContext.sql(
      """
        |select id,
        |to_date(from_utc_timestamp(from_unixtime(at), 'US/Pacific')),
        |from_utc_timestamp(from_unixtime(at), 'US/Pacific'),
        |from_unixtime(at),
        |to_date(from_unixtime(at)),
        |at
        |from events
        |limit 100
      """.stripMargin).collect().foreach(println)

Spark-Submit options: --driver-java-options '-Duser.timezone=US/Pacific'



Result:

    [56d2a9573bc4b5c38453eae7,2016-02-28,2016-02-27 16:01:27.0,2016-02-28 08:01:27,2016-02-28,1456646487]
    [56d2aa1bfd2460183a571762,2016-02-28,2016-02-27 16:04:43.0,2016-02-28 08:04:43,2016-02-28,1456646683]
    [56d2aaa9eb63bbb63456d5b5,2016-02-28,2016-02-27 16:07:05.0,2016-02-28 08:07:05,2016-02-28,1456646825]
    [56d2aab15a21fa5f4c4f42a7,2016-02-28,2016-02-27 16:07:13.0,2016-02-28 08:07:13,2016-02-28,1456646833]
    [56d2aac8aeeee48b74531af0,2016-02-28,2016-02-27 16:07:36.0,2016-02-28 08:07:36,2016-02-28,1456646856]
    [56d2ab1d87fd3f4f72567788,2016-02-28,2016-02-27 16:09:01.0,2016-02-28 08:09:01,2016-02-28,1456646941]

The time in US/Pacific should be 2016-02-28 00:01:27 and so on, but somehow it subtracts 8 hours twice.
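
As a quick sanity check of that expectation, formatting the first at value (1456646487) with plain SimpleDateFormat in both timezones gives the following (a standalone sketch, not tied to Spark):

    import java.text.SimpleDateFormat
    import java.util.{Date, TimeZone}

    // 1456646487 is the first `at` value in the result rows above
    val fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
    val d = new Date(1456646487L * 1000)

    fmt.setTimeZone(TimeZone.getTimeZone("UTC"))
    println(fmt.format(d))   // 2016-02-28 08:01:27

    fmt.setTimeZone(TimeZone.getTimeZone("US/Pacific"))
    println(fmt.format(d))   // 2016-02-28 00:01:27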

Solution

After reading for some time, these are the conclusions:

  • Spark SQL doesn't support date-time, nor timezones
  • Using timestamp is the only solution
  • from_unixtime(at) parses the epoch time correctly; it is only the printing of it as a string that changes it due to the timezone. It is safe to assume that from_unixtime converts it correctly (although printing it might show different results)
  • from_utc_timestamp will shift (not just convert) the timestamp to that timezone; in this case it subtracts 8 hours from the time since the offset is (-08:00), as shown in the sketch after this list
  • printing SQL results messes up the times with respect to the timezone param
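
A minimal sketch of that shift, assuming the same sqlContext and an epoch-seconds column named at (the one-row DataFrame below is only a stand-in for the events table):

    import org.apache.spark.sql.functions.{col, from_unixtime, from_utc_timestamp}
    import sqlContext.implicits._

    // one row carrying the first epoch value from the question, standing in for `events`
    val sample = Seq(Tuple1(1456646487L)).toDF("at")

    sample.select(
      // rendered as a string in the JVM default timezone
      from_unixtime(col("at")).as("rendered"),
      // interpreted as UTC and shifted 8 hours earlier; printing the resulting
      // Timestamp under -Duser.timezone=US/Pacific shifts it once more
      from_utc_timestamp(from_unixtime(col("at")), "US/Pacific").as("shifted")
    ).show()

Collecting and printing that second column is what produces the 2016-02-27 16:01:27.0 values in the output above.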

