如何在BigQuery中填写缺失的日期? [英] How to fill missing dates in BigQuery?
问题描述
此问题与>如何填写缺失内容有关日期和分区数据中的值?,但是由于该解决方案不适用于BigQuery,因此我将再次发布该问题.
This question is related to How to fill missing dates and values in partitioned data?, but since the solution doesn't work for BigQuery, I'm posting the question again.
我有以下假设表:
name date val
-------------------------------
A 01/01/2020 1.5
A 01/03/2020 2
A 01/06/2020 5
B 01/02/2020 90
B 01/07/2020 10
我想填写空白之间的日期,并复制下一个最近日期的值.此外,我想填写以下日期:1)返回到预设的MINDATE(假设是12/29/2019)和2)直到当前日期(假设是01/09/2020)-对于2),默认值为1.
I want to fill in the dates in between the gaps and copy over the value from the most recent following date. In addition, I would like to fill in dates that 1) go back to a pre-set MINDATE (let's say it's 12/29/2019) and 2) go up to the current date (let's say it's 01/09/2020) - and for 2) the default values will be 1.
因此,输出将是:
name date val
-------------------------------
A 12/29/2019 1.5
A 12/30/2019 1.5
A 12/31/2019 1.5
A 01/01/2020 1.5 <- original
A 01/02/2020 2
A 01/03/2020 2 <- original
A 01/04/2020 5
A 01/05/2020 5
A 01/06/2020 5 <- original
A 01/07/2020 1
A 01/08/2020 1
A 01/09/2020 1
B 12/29/2019 90
B 12/30/2019 90
B 12/31/2019 90
B 01/01/2020 90
B 01/02/2020 90 <- original
B 01/03/2020 10
B 01/04/2020 10
B 01/05/2020 10
B 01/06/2020 10
B 01/07/2020 10 <- original
B 01/08/2020 1
B 01/09/2020 1
上述问题中接受的解决方案在BigQuery中不起作用.
The accepted solution in the above question doesn't work in BigQuery.
推荐答案
这应该有效
with base as (
select 'A' as name, '01/01/2020' as date, 1.5 as val union all
select 'A' as name, '01/03/2020' as date, 2 as val union all
select 'A' as name, '01/06/2020' as date, 5 as val union all
select 'B' as name, '01/02/2020' as date, 90 as val union all
select 'B' as name, '01/07/2020' as date, 10 as val
),
missing_dates as (
select name,dates as date from
UNNEST(GENERATE_DATE_ARRAY('2019-12-29', '2020-01-09', INTERVAL 1 DAY)) AS dates cross join (select distinct name from base)
), joined as (
select distinct missing_dates.name, missing_dates.date,val
from missing_dates
left join base on missing_dates.name = base.name
and parse_date('%m/%d/%Y', base.date) = missing_dates.date
)
select * except(val),
ifnull(first_value(val ignore nulls) over(partition by name order by date ROWS BETWEEN CURRENT ROW AND
UNBOUNDED FOLLOWING),1) as va1
from joined
这篇关于如何在BigQuery中填写缺失的日期?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!