Hive查询在生成缺失日期时面临的问题 [英] Facing issue in Hive query in generating missing dates

查看:683
本文介绍了Hive查询在生成缺失日期时面临的问题的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个要求,我需要返回到一列的以前的值直到1000行,并为下一步获取以前的1000个日期,但是表中该列不存在所有这些1000个以前的日期。但是我需要那些缺少的日期才能从查询的输出中获取。

I have a requirement where I need to go back to previous values for a column until 1000 rows and get those previous 1000 dates for my next steps, but all those 1000 previous dates are not present for that column in the table. But I need those missing dates to get from output of the query.

当我尝试在查询下运行时,它不显示当前日期的1000个以前的日期值。

When I try to run below query it is not displaying 1000 previous date values from current date.

示例:假设日期列中只有2个日期

Example: let's say only 2 dates are available for date column

date      
2019-01-16 
2019-01-19

我来了查询返回1000个日期,但是它只给出最近的日期,因为所有先前的追溯日期都缺失

I have come up with a query to get back 1000 dates but it is giving only nearest date as all previous back dates are missing

SELECT date FROM  table1 t
WHERE 
date >= date_sub(current_date,1000) and  dt<current_date ORDER BY date LIMIT 1

如果我在上面的查询中运行,它将显示 2019-01-16 ,因为前1000天没有返回日期,因此给出的是最近的日期,这是 2019-01-16 ,但我需要从 2016-04-23 开始的日期(从当前日期)到当前日期之前( 2019-01-18 )作为查询的输出。

If I run above query it is displaying 2019-01-16, since previous 1000 days back date are not present it is giving nearest date ,which is 2019-01-16 but I need missing dates starting from 2016-04-23 (1000th date from current date) till before current date (2019-01-18) as output of my query.

推荐答案

您可以在子查询中生成所需范围的日期(请参见 date_range 子查询(在下面的示例中),然后向左联接与您的表。如果表中的某些日期没有记录,则该值将为null,从 date_range 子查询返回的日期将没有间隔。为所需的date_range设置开始日期结束日期参数:

You can generate dates for required range in the subquery (see date_range subquery in the example below) and left join it with your table. If there is no record in your table on some dates, the value will be null, dates will be returned from the date_range subquery without gaps. Set start_date and end_date parameters for date_range required:

set hivevar:start_date=2016-04-23; --replace with your start_date
set hivevar:end_date=current_date; --replace with your end_date

set hive.exec.parallel=true;
set hive.auto.convert.join=true; --this enables map-join
set hive.mapjoin.smalltable.filesize=25000000; --size of table to fit in memory

with date_range as 
(--this query generates date range, check it's output
select date_add ('${hivevar:start_date}',s.i) as dt 
  from ( select posexplode(split(space(datediff(${hivevar:end_date},'${hivevar:start_date}')),' ')) as (i,x) ) s
) 

select d.dt as date,
       t.your_col --some value from your table on date
  from date_range d 
       left join table1 t on d.dt=t.date 
order by d.dt --order by dates if necessary

这篇关于Hive查询在生成缺失日期时面临的问题的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆