TimeStamp issue in Hive 1.1


Problem description

I am facing a very weird issue with Hive in a production environment (Cloudera 5.5) that is basically not reproducible on my local server (I don't know why): for some records I get a wrong timestamp value when inserting from a temp table into the main table, where the string "2017-10-21 23" is converted into the timestamp datatype as "2017-10-21 23:00:00" during insertion.

Example:

2017-10-21 23 -> 2017-10-21 22:00:00
2017-10-22 15 -> 2017-10-22 14:00:00

It happens very infrequently; the affected (delta) records are about 1% of the data.

Flow::: Data in the temp table (an external table) is populated hourly using Oozie. The insert statement below is executed hourly by an Oozie workflow to insert from the temp table into the main table (an internal ORC table) in Hive.

Flow summary::: Linux logs >> copy logs into the temp table (external Hive table) >> insert into the main Hive table.

Insert from the temp table into the main table:::

FROM
 temp
 INSERT INTO TABLE
 main
 PARTITION(event_date,service_id)
 SELECT
 from_unixtime(unix_timestamp(event_timestamp ,'yyyy-MM-dd HH'), 'yyyy-MM-dd HH:00:00.0'),
 col3,
 col4,
 "" as partner_nw_name,
 col5,
 "" as country_name,
 col6,
 col7,
 col8,
 col9,
 col10,
 col11,
 col12,
 col13,
 col14,
 col15,
 kpi_id,
 col18,
 col19,
 col20,
 col21,
 col23,
 col24,
 col25,
 from_unixtime(unix_timestamp(event_timestamp ,'yyyy-MM-dd HH'), 'yyyy-MM-dd') as event_date,
 service_id;
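
To see whether the shift comes from this conversion expression itself or only appears after the value is stored, the same expression can be evaluated directly against the staging rows. A minimal sketch, assuming it is run in the same Hive session/time zone as the Oozie insert; the WHERE values are just the two examples from above:

SELECT
 event_timestamp,
 unix_timestamp(event_timestamp, 'yyyy-MM-dd HH') AS epoch_seconds,
 from_unixtime(unix_timestamp(event_timestamp, 'yyyy-MM-dd HH'), 'yyyy-MM-dd HH:00:00.0') AS converted
FROM temp
WHERE event_timestamp IN ('2017-10-21 23', '2017-10-22 15');

If converted already shows the shifted hour here, the expression (and the session time zone it uses) is the culprit; if it looks correct here but comes back wrong when read from main, the shift happens at write or read time, which is what the answer below points to.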

Temp table:::

hive> desc temp;
OK
event_timestamp string
col2 int
col3 int
col4 int
col5 int
col6 string
col7 string
col8 string
col9 string
col10 string
col11 int
col12 int
col13 string
col14 string
col15 string
service_id int
kpi_id int
col18 bigint
col19 bigint
col20 bigint
col21 bigint
col22 double
col23 string
col24 int
col25 int
Time taken: 0.165 seconds, Fetched: 25 row(s)

Main table:::

hive> desc main;
OK
event_timestamp timestamp
col3 int
col4 int
partner_nw_name string
col5 int
country_name string
col6 string
col7 string
col8 string
col9 string
col10 int
col11 int
col12 int
col13 string
col14 string
col15 string
kpi_id int
col18 bigint
col19 bigint
col20 bigint
col21 bigint
col23 double
col24 int
col25 int
event_date date
service_id int

# Partition Information
# col_name data_type comment

event_date date
service_id int
Time taken: 0.175 seconds, Fetched: 32 row(s)
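
For reference, a hedged reconstruction of what the main table DDL presumably looks like, based only on the desc output above and the statement that it is an internal ORC table partitioned by event_date and service_id (the storage and partition clauses are assumptions, not taken from the post):

-- Reconstructed from "desc main"; STORED AS ORC and the partition spec are assumptions.
CREATE TABLE main (
 event_timestamp timestamp,
 col3 int,
 col4 int,
 partner_nw_name string,
 col5 int,
 country_name string,
 col6 string,
 col7 string,
 col8 string,
 col9 string,
 col10 int,
 col11 int,
 col12 int,
 col13 string,
 col14 string,
 col15 string,
 kpi_id int,
 col18 bigint,
 col19 bigint,
 col20 bigint,
 col21 bigint,
 col23 double,
 col24 int,
 col25 int
)
PARTITIONED BY (event_date date, service_id int)
STORED AS ORC;

Because the hourly insert writes PARTITION(event_date, service_id) without literal values, the session presumably also runs with dynamic partitioning enabled (hive.exec.dynamic.partition=true and hive.exec.dynamic.partition.mode=nonstrict), although those settings are not shown in the post.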


Answer

If you are writing your data in Parquet format using Hive, then Hive adjusts the timestamp by the local time-zone offset. For more information, please go through the links below.

There is a Jira ticket related to this in Impala: #2716

The Cloudera Impala timestamp documentation is here.
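
As a rough illustration of how such an offset adjustment produces exactly the shift seen in the question, Hive's built-in to_utc_timestamp can be used. Europe/London (UTC+1 on those October 2017 dates) is an assumed zone chosen only because it reproduces the one-hour difference; the cluster's actual time zone is not stated in the post:

-- If the write path applies a local-to-UTC adjustment that the read path does not
-- undo (the behaviour discussed in the linked ticket), the stored value ends up
-- shifted by the zone's UTC offset. 'Europe/London' is an assumption for illustration.
SELECT to_utc_timestamp('2017-10-21 23:00:00', 'Europe/London');  -- 2017-10-21 22:00:00
SELECT to_utc_timestamp('2017-10-22 15:00:00', 'Europe/London');  -- 2017-10-22 14:00:00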
