Hive查询在生成缺失日期时面临的问题 [英] Facing issue in Hive query in generating missing dates
问题描述
我有一个要求,我需要返回到一列的以前的值直到1000行,并为下一步获取以前的1000个日期,但是表中该列不存在所有这些1000个以前的日期。但是我需要那些缺少的日期才能从查询的输出中获取。
I have a requirement where I need to go back to previous values for a column until 1000 rows and get those previous 1000 dates for my next steps, but all those 1000 previous dates are not present for that column in the table. But I need those missing dates to get from output of the query.
当我尝试在查询下运行时,它不显示当前日期的1000个以前的日期值。
When I try to run below query it is not displaying 1000 previous date values from current date.
示例:假设日期列中只有2个日期
Example: let's say only 2 dates are available for date column
date
2019-01-16
2019-01-19
我来了查询返回1000个日期,但是它只给出最近的日期,因为所有先前的追溯日期都缺失
I have come up with a query to get back 1000 dates but it is giving only nearest date as all previous back dates are missing
SELECT date FROM table1 t
WHERE
date >= date_sub(current_date,1000) and dt<current_date ORDER BY date LIMIT 1
如果我在上面的查询中运行,它将显示 2019-01-16
,因为前1000天没有返回日期,因此给出的是最近的日期,这是 2019-01-16
,但我需要从 2016-04-23
开始的日期(从当前日期)到当前日期之前( 2019-01-18
)作为查询的输出。
If I run above query it is displaying 2019-01-16
, since previous 1000 days back date are not present it is giving nearest date ,which is 2019-01-16
but I need missing dates starting from 2016-04-23
(1000th date from current date) till before current date (2019-01-18
) as output of my query.
推荐答案
您可以在子查询中生成所需范围的日期(请参见 date_range
子查询(在下面的示例中),然后向左联接
与您的表。如果表中的某些日期没有记录,则该值将为null,从 date_range
子查询返回的日期将没有间隔。为所需的date_range设置开始日期
和结束日期
参数:
You can generate dates for required range in the subquery (see date_range
subquery in the example below) and left join
it with your table. If there is no record in your table on some dates, the value will be null, dates will be returned from the date_range
subquery without gaps. Set start_date
and end_date
parameters for date_range required:
set hivevar:start_date=2016-04-23; --replace with your start_date
set hivevar:end_date=current_date; --replace with your end_date
set hive.exec.parallel=true;
set hive.auto.convert.join=true; --this enables map-join
set hive.mapjoin.smalltable.filesize=25000000; --size of table to fit in memory
with date_range as
(--this query generates date range, check it's output
select date_add ('${hivevar:start_date}',s.i) as dt
from ( select posexplode(split(space(datediff(${hivevar:end_date},'${hivevar:start_date}')),' ')) as (i,x) ) s
)
select d.dt as date,
t.your_col --some value from your table on date
from date_range d
left join table1 t on d.dt=t.date
order by d.dt --order by dates if necessary
这篇关于Hive查询在生成缺失日期时面临的问题的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!