Hive SQL query to fill missing date values in table with nearest values


Problem description

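I have spent several days trying to work out how to fill in missing dates with the nearest value in Hive, with no luck. Because of environment constraints, this has to be done in Hive SQL. The raw table currently looks like the one below.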

account name,available balance,Date of balance 

Peter,50000,2021-05-24
Peter,50035,2021-05-25
Peter,50035,2021-05-26
Peter,50610,2021-05-28
Peter,51710,2021-06-01
Peter,53028.1,2021-06-02
Peter,53916.1,2021-06-03
Mary,50000,2021-05-24
Mary,50035,2021-05-25
Mary,53028.1,2021-05-30

Raw balance table

What I need is to convert the table above into the following table:

account name,available balance,Date of balance 

Peter,50000,2021-05-24
Peter,50035,2021-05-25
Peter,50035,2021-05-26
Peter,50035,2021-05-27
Peter,50610,2021-05-28
Peter,50610,2021-05-29
Peter,50610,2021-05-30
Mary,50000,2021-05-24
Mary,50035,2021-05-25
Mary,50035,2021-05-26
Mary,50035,2021-05-27
Mary,50035,2021-05-28
Mary,50035,2021-05-29
Mary,53028.1,2021-05-30

Converted table

Could anyone share the Hive SQL logic to make this conversion?

Recommended answer

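Use the lead() function to get the next date for each account, calculate the difference in days with datediff(), build a string of spaces whose length is that difference minus one with space(), split it into an array, and use posexplode() to generate one row per array element. The position returned by posexplode() is then added to the original date with date_add() to produce the missing dates, each carrying the balance of the preceding existing date: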

with mytable as (--Demo dataset, use your table instead of this
select stack(10, --number of tuples
'Peter',float(50000),'2021-05-24',
'Peter',float(50035),'2021-05-25',
'Peter',float(50035),'2021-05-26',
'Peter',float(50610),'2021-05-28',
'Peter',float(51710),'2021-06-01',
'Peter',float(53028.1),'2021-06-02',
'Peter',float(53916.1),'2021-06-03',
'Mary',float(50000),'2021-05-24',
'Mary',float(50035),'2021-05-25',
'Mary',float(53028.1),'2021-05-30'
) as (account_name,available_balance,Date_of_balance)
) --use your table instead of this CTE

select  account_name, available_balance, date_add(Date_of_balance,e.i) as Date_of_balance
from
( --Get next_date to generate date range
select account_name,available_balance,Date_of_balance,
       lead(Date_of_balance,1, Date_of_balance) over (partition by account_name order by Date_of_balance) next_date    
  from mytable d  --use your table
) s lateral view outer posexplode(split(space(datediff(next_date,Date_of_balance)-1),' ')) e as i,x --generate rows: N-1 spaces split on a space give N elements
order by account_name desc, Date_of_balance --this is to have order of rows like in your Converted Table
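
The row-generation step may be easier to follow in isolation. The query below is a minimal sketch, not part of the original answer: it hard-codes one gap (2021-05-28 to 2021-06-01) and uses a one-row dummy subquery, both chosen purely for illustration, and it assumes a single space as the split delimiter. The gap is 4 days, so space(3) split on a space should give four elements at positions 0 to 3, which date_add() turns into 2021-05-28 through 2021-05-31.

```sql
-- Illustration only: the literal dates and the dummy subquery are assumptions.
select e.i                        as day_offset,      -- position from posexplode: 0, 1, 2, 3
       date_add('2021-05-28', e.i) as generated_date  -- start date plus offset
from (select 1 as dummy) t
lateral view posexplode(
    split(space(datediff('2021-06-01', '2021-05-28') - 1), ' ')
) e as i, x;
```

The end date itself (2021-06-01) is not generated here, because it already exists as its own row in the source table.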

Result:

account_name    available_balance   date_of_balance 
Peter           50000                2021-05-24
Peter           50035                2021-05-25
Peter           50035                2021-05-26
Peter           50035                2021-05-27
Peter           50610                2021-05-28
Peter           50610                2021-05-29
Peter           50610                2021-05-30
Peter           50610                2021-05-31
Peter           51710                2021-06-01
Peter           53028.1              2021-06-02
Peter           53916.1              2021-06-03
Mary            50000                2021-05-24
Mary            50035                2021-05-25
Mary            50035                2021-05-26
Mary            50035                2021-05-27
Mary            50035                2021-05-28
Mary            50035                2021-05-29
Mary            53028.1              2021-05-30
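
To check the intermediate values behind a result like this, the inner subquery can be run on its own. The following is a small sketch under stated assumptions: the three-row sample and the gap_days column name are hypothetical and only serve to show what lead() and datediff() return before any rows are generated.

```sql
with mytable as ( -- hypothetical three-row sample, not the real table
select stack(3,
'Peter','2021-05-26',
'Peter','2021-05-28',
'Peter','2021-06-01'
) as (account_name,Date_of_balance)
)

select account_name, Date_of_balance, next_date,
       datediff(next_date, Date_of_balance) as gap_days -- expected 2, 4, 0 for this sample
from
( --next existing date per account; the last row falls back to its own date
select account_name, Date_of_balance,
       lead(Date_of_balance,1, Date_of_balance) over (partition by account_name order by Date_of_balance) next_date
  from mytable
) s;
```

For the last row of each account gap_days is 0, so the full query still emits that row exactly once: splitting an empty string yields a single element at position 0.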
