如何在BigQuery中填写缺失的日期? [英] How to fill missing dates in BigQuery?

查看:70
本文介绍了如何在BigQuery中填写缺失的日期?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

此问题与>如何填写缺失内容有关日期和分区数据中的值?,但是由于该解决方案不适用于BigQuery,因此我将再次发布该问题.

This question is related to How to fill missing dates and values in partitioned data?, but since the solution doesn't work for BigQuery, I'm posting the question again.

我有以下假设表:

name       date          val
-------------------------------
A          01/01/2020     1.5
A          01/03/2020     2
A          01/06/2020     5
B          01/02/2020     90
B          01/07/2020     10

我想填写空白之间的日期,并复制下一个最近日期的值.此外,我想填写以下日期:1)返回到预设的MINDATE(假设是12/29/2019)和2)直到当前日期(假设是01/09/2020)-对于2),默认值为1.

I want to fill in the dates in between the gaps and copy over the value from the most recent following date. In addition, I would like to fill in dates that 1) go back to a pre-set MINDATE (let's say it's 12/29/2019) and 2) go up to the current date (let's say it's 01/09/2020) - and for 2) the default values will be 1.

因此,输出将是:

name       date          val
-------------------------------
A          12/29/2019     1.5
A          12/30/2019     1.5
A          12/31/2019     1.5
A          01/01/2020     1.5   <- original
A          01/02/2020     2
A          01/03/2020     2     <- original
A          01/04/2020     5
A          01/05/2020     5
A          01/06/2020     5     <- original
A          01/07/2020     1
A          01/08/2020     1
A          01/09/2020     1
B          12/29/2019     90
B          12/30/2019     90
B          12/31/2019     90
B          01/01/2020     90
B          01/02/2020     90    <- original
B          01/03/2020     10
B          01/04/2020     10
B          01/05/2020     10
B          01/06/2020     10
B          01/07/2020     10    <- original
B          01/08/2020     1
B          01/09/2020     1

上述问题中接受的解决方案在BigQuery中不起作用.

The accepted solution in the above question doesn't work in BigQuery.

推荐答案

这应该有效

with base as (

select 'A' as name,           '01/01/2020' as date,     1.5 as val  union all
select 'A' as name,           '01/03/2020' as date,     2 as val union all
select 'A' as name,           '01/06/2020' as date,     5 as val union all
select 'B' as name,           '01/02/2020' as date,     90 as val union all
select 'B' as name,           '01/07/2020' as date,     10 as val
),

missing_dates as (

select name,dates as date from 
UNNEST(GENERATE_DATE_ARRAY('2019-12-29', '2020-01-09', INTERVAL 1 DAY)) AS dates cross join (select distinct name from base)

), joined as (
select distinct missing_dates.name, missing_dates.date,val 
from  missing_dates 
left join base on missing_dates.name = base.name 
and  parse_date('%m/%d/%Y', base.date) = missing_dates.date

)

select * except(val), 
ifnull(first_value(val ignore nulls) over(partition by name order by date ROWS BETWEEN CURRENT ROW AND
    UNBOUNDED FOLLOWING),1) as va1 
from joined

这篇关于如何在BigQuery中填写缺失的日期?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆