Hive Data selecting latest value based on timestamp
Question
I have a table with the following columns.
C1,C2,Process TimeStamp,InsertDateTimeStamp
p1,v1,2014-01-30 12:15:23,2013-10-01 05:34:23
p1,v2,2014-01-31 05:11:34,2013-12-01 06:12:31
p1,v3,2014-01-31 07:16:05,2012-09-01 07:45:20
p2,v4,2014-02-01 09:22:52,2013-12-01 06:12:31
p2,v5,2014-02-01 09:22:52,2012-09-01 07:45:20
Now I want to fetch a unique row for each primary key (C1), based on the latest Process TimeStamp.
If the Process TimeStamp is the same, the row with the latest InsertDateTimeStamp should be chosen.
So my result should be:
p1,v3,2014-01-31 07:16:05,2012-09-01 07:45:20
p2,v4,2014-02-01 09:22:52,2013-12-01 06:12:31
How can I achieve this in HiveQL?
I am currently using Hive 0.10. I cannot use a subquery with IN or EXISTS.
Thanks.
Recommended answer
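-- Field order drives the comparison: ProcessTimeStamp first, then InsertDateTimeStamp as the tie-breaker.
select C1, s.C2, s.ProcessTimeStamp, s.InsertDateTimeStamp from (
  select C1, max(named_struct(
    'unixtime', unix_timestamp(ProcessTimeStamp, 'yyyy-MM-dd HH:mm:ss'),
    'insert_unixtime', unix_timestamp(InsertDateTimeStamp, 'yyyy-MM-dd HH:mm:ss'),
    'C2', C2, 'ProcessTimeStamp', ProcessTimeStamp, 'InsertDateTimeStamp', InsertDateTimeStamp)) as s
  from my_table group by C1
) t;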
Taking the max of a struct compares by the first field, then the second field, and so on, so the order of the fields decides how ties are broken. If you pack the whole row into a struct with the parsed timestamp value first, the max for each group is a struct representing the latest row. Then just un-struct it by selecting out the individual fields.
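To see how the field-by-field comparison picks a row, here is a minimal Python sketch run on the sample data from the question. It is only an analogy, not Hive itself: Python tuples compare lexicographically, which is the behavior the answer describes for structs. The names `sort_key` and `latest_per_key` are hypothetical helpers made up for this illustration, and the key deliberately places the parsed Process TimeStamp first and the parsed InsertDateTimeStamp second so that the tie on p2 is resolved by insert time.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Sample rows from the question: (C1, C2, ProcessTimeStamp, InsertDateTimeStamp)
ROWS = [
    ("p1", "v1", "2014-01-30 12:15:23", "2013-10-01 05:34:23"),
    ("p1", "v2", "2014-01-31 05:11:34", "2013-12-01 06:12:31"),
    ("p1", "v3", "2014-01-31 07:16:05", "2012-09-01 07:45:20"),
    ("p2", "v4", "2014-02-01 09:22:52", "2013-12-01 06:12:31"),
    ("p2", "v5", "2014-02-01 09:22:52", "2012-09-01 07:45:20"),
]


def sort_key(row):
    """Stand-in for the struct: process time first, insert time second, then the row."""
    _, _, process_ts, insert_ts = row
    return (
        datetime.strptime(process_ts, FMT),
        datetime.strptime(insert_ts, FMT),
        row,
    )


def latest_per_key(rows):
    """Keep, for each C1 value, the row whose key tuple is the largest."""
    best = {}
    for row in rows:
        key = row[0]
        if key not in best or sort_key(row) > sort_key(best[key]):
            best[key] = row
    return best


if __name__ == "__main__":
    for _, row in sorted(latest_per_key(ROWS).items()):
        print(",".join(row))
```

Run as a script, this prints the p1/v3 and p2/v4 rows, matching the result the question asks for. If C2 were compared before the insert time, the p2 tie would go to v5 instead, since "v5" sorts after "v4".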