Facing an issue generating missing dates in a Hive query

Question

I have a requirement where I need to go back 1000 days from the current date in a date column and get those 1000 previous dates for my next steps, but not all of those dates are present in the table. I need the missing dates to appear in the output of the query as well.

When I run the query below, it does not return the 1000 previous dates counted back from the current date.

Example: suppose only two dates are present in the date column:

date      
2019-01-16 
2019-01-19

I came up with a query to go back 1000 days, but it returns only the nearest existing date, because all the earlier dates are missing:

SELECT `date` FROM table1 t
WHERE
`date` >= date_sub(current_date,1000) and `date` < current_date ORDER BY `date` LIMIT 1

If I run the query above, it returns 2019-01-16: the dates going back 1000 days are not present, so it returns the nearest existing date, 2019-01-16. What I need as the output of my query is every missing date from 2016-04-23 (the 1000th date before the current date) up to the day before the current date (2019-01-18).

Recommended answer

You can generate the dates for the required range in a subquery (see the date_range subquery in the example below) and left join it with your table. If your table has no record on some date, the value columns for that date will be NULL, but the date itself still comes from the date_range subquery, so the dates are returned without gaps. Set the start_date and end_date parameters to the range you need:
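The row-generator trick can be checked on its own before running it against a real table. This is a hypothetical standalone query using the two sample dates from the question, assuming Hive 0.13 or later (so that SELECT without FROM is allowed). datediff() gives the number of days between the bounds, space(n) builds a string of n spaces, split() on ' ' turns it into n + 1 empty strings, and posexplode() numbers them 0..n, so the generated range includes both end points.

```sql
-- datediff('2019-01-19', '2019-01-16') = 3, so space(3) = '   '.
-- split(..., ' ') keeps the empty strings, giving an array of 3 + 1 = 4 elements,
-- and posexplode() returns their positions i = 0, 1, 2, 3.
select date_add('2019-01-16', s.i) as dt
  from (select posexplode(split(space(datediff('2019-01-19', '2019-01-16')), ' ')) as (i, x)) s
order by dt;
-- expected rows: 2019-01-16, 2019-01-17, 2019-01-18, 2019-01-19
```

Because the end date is included, set end_date to the day before today (for example date_sub(current_date, 1)) if today should be excluded.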

set hivevar:start_date='2016-04-23'; --replace with your start date (keep the quotes)
set hivevar:end_date=current_date;   --replace with your end date, e.g. '2019-01-18' (quoted)

set hive.exec.parallel=true;
set hive.auto.convert.join=true; --this enables map-join
set hive.mapjoin.smalltable.filesize=25000000; --max size in bytes of a table that is loaded in memory for a map-join

with date_range as 
(--this query generates dates from start_date to end_date inclusive, check its output
select date_add(${hivevar:start_date}, s.i) as dt 
  from ( select posexplode(split(space(datediff(${hivevar:end_date},${hivevar:start_date})),' ')) as (i,x) ) s
) 

select d.dt as `date`,
       t.your_col --some value from your table on that date (NULL if there is no row)
  from date_range d 
       left join table1 t on d.dt=t.`date` 
order by d.dt --order by dates if necessary
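For the exact window in the question, the range can also be built backwards from current_date without hivevars. This is a sketch under assumptions: it reuses the names table1, `date` and your_col from the question and answer, and it mirrors the question's filter (`date` >= date_sub(current_date,1000) and `date` < current_date), i.e. the 1000 days before today with today excluded. If you count the 1000 days differently (the question's stated start date may be off by one), adjust the space() argument. `date` is backtick-quoted because DATE is a reserved keyword in recent Hive versions.

```sql
with date_range as
(
-- i = 0..999, so dt runs from yesterday back to date_sub(current_date, 1000)
select date_sub(current_date, s.i + 1) as dt
  from (select posexplode(split(space(999), ' ')) as (i, x)) s
)
select d.dt as `date`,
       t.your_col          -- NULL where table1 has no row for that date
  from date_range d
       left join table1 t on d.dt = t.`date`
order by d.dt;
```

Generating the range from a fixed count (space(999)) rather than from two dates avoids parameter substitution entirely, at the cost of hard-coding the window length.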
