重复的记录组以填充Google BigQuery中的多个日期间隔 [英] Duplicate groups of records to fill multiple date gaps in Google BigQuery
问题描述
我发现了一个类似的问题(重复记录以填充Google BigQuery中的日期之间的空白)),但是使用不同的方案并且不存在应答。 我有数据(这基本上是多种产品和合作伙伴的价格变化历史): 我需要的是一个查询(专门用BigQuery标准SQL编写),给定日期范围(在这种情况下, 基本上,每个产品和合作伙伴的组合都包含所有日期差距的价格历史记录。 我很难弄清楚如何完成这项工作,特别是如何在没有发生价格变化的同一日期生成多行。任何想法? I've found a similar question (Duplicating records to fill gap between dates in Google BigQuery), however with a different scenario and the answer there does not apply. I have data structured like so (which is basically price-change history for multiple products and partners): What I need is a query (specifically written in BigQuery Standard SQL) that, given a date range (in this case, Basically a price history with all date gaps filled, for every combination of product and partner. I'm having a hard time figuring out how to get this done, especially how to generate multiple rows for the same date where no price change has happened. Any ideas? Try below
这篇关于重复的记录组以填充Google BigQuery中的多个日期间隔的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
+ ------- ----- + --------- + --------- + ------- +
|日期|产品|伙伴|值|
+ ------------ + --------- + --------- + ------- +
| 2017-01-01 | a | x | 10 |
| 2017-01-01 | b | x | 15 |
| 2017-01-01 | a | y | 11 |
| 2017-01-01 | b | y | 16 |
| 2017-01-05 | b | x | 13 |
| 2017-01-07 | a | y | 15 |
| 2017-01-07 | a | x | 15 |
+ ------------ + --------- + --------- + ------- +
2017-01-01
至 2017-01-10
),输出以下结果:
+ -------------- + --------- + ------ --- + ------- +
|日期|产品|伙伴|值|
+ -------------- + --------- + --------- + ------- +
| 2017-01-01 | a | x | 10 |
| 2017-01-02 | a | x | 10 |
| 2017-01-03 | a | x | 10 |
| 2017-01-04 | a | x | 10 |
| 2017-01-05 | a | x | 10 |
| 2017-01-06 | a | x | 10 |
| 2017-01-07 | a | x | 15 |
| 2017-01-08 | a | x | 15 |
| 2017-01-09 | a | x | 15 |
| 2017-01-10 | a | x | 15 |
| 2017-01-01 | a | y | 11 |
| 2017-01-02 | a | y | 11 |
| 2017-01-03 | a | y | 11 |
| 2017-01-04 | a | y | 11 |
| 2017-01-05 | a | y | 11 |
| 2017-01-06 | a | y | 11 |
| 2017-01-07 | a | y | 15 |
| 2017-01-08 | a | y | 15 |
| 2017-01-09 | a | y | 15 |
| 2017-01-10 | a | y | 15 |
| 2017-01-01 | b | x | 15 |
| 2017-01-02 | b | x | 15 |
| 2017-01-03 | b | x | 15 |
| 2017-01-04 | b | x | 15 |
| 2017-01-05 | b | x | 13 |
| 2017-01-06 | b | x | 13 |
| 2017-01-07 | b | x | 13 |
| 2017-01-08 | b | x | 13 |
| 2017-01-09 | b | x | 13 |
| 2017-01-10 | b | x | 13 |
| 2017-01-01 | b | y | 16 |
| 2017-01-02 | b | y | 16 |
| 2017-01-03 | b | y | 16 |
| 2017-01-04 | b | y | 16 |
| 2017-01-05 | b | y | 16 |
| 2017-01-06 | b | y | 16 |
| 2017-01-07 | b | y | 16 |
| 2017-01-08 | b | y | 16 |
| 2017-01-09 | b | y | 16 |
| 2017-01-10 | b | y | 16 |
+ -------------- + --------- + --------- + ------- +
#standardSQL
WITH历史AS(
SELECT'2017-01-01'AS d,'a'AS产品, 'x'AS伙伴,10 AS值UNION ALL
SELECT'2017-01-01'AS d,'b'AS产品,'x'AS伙伴,15 AS值UNION ALL
SELECT'2017 -01-01'AS d,'a'AS产品,'y'AS伙伴,11 AS值UNION ALL
SELECT'2017-01-01'AS d,'b'AS产品,'y'AS伙伴,16 AS值UNION ALL
SELECT'2017-01-05'AS d,'b'AS产品,'x'AS伙伴,13 AS值UNION ALL
SELECT'2017-01-07 'AS d,'a'AS产品,'y'AS伙伴,15 AS值UNION ALL
SELECT'2017-01-07'AS d,'a'AS产品,'x'AS伙伴,15 AS价值
),
daterange AS(
SELECT date_in_range
FROM UNNEST(GENERATE_DATE_ARRAY('2017-01-01','2017-01- 10'))AS date_in_range
),
temp AS(
SELECT d,product,partner,value,LEAD(d)OVER(PARTITION BY product,partner ORDER BY d)AS next_d
FROM历史
ORDER BY产品,合作伙伴,d
)
SELECT date_in_range,产品,合作伙伴,价值
FROM daterange
JOIN temp
ON daterange.date_in_range> = PARSE_DATE('%Y-%m-%d',temp.d)
AND(daterange.date_in_range< PARSE_DATE('%Y-%m-%d',temp.next_d)或temp.next_d IS NULL)
ORDER BY product,partner,date_in_range
+------------+---------+---------+-------+
| date | product | partner | value |
+------------+---------+---------+-------+
| 2017-01-01 | a | x | 10 |
| 2017-01-01 | b | x | 15 |
| 2017-01-01 | a | y | 11 |
| 2017-01-01 | b | y | 16 |
| 2017-01-05 | b | x | 13 |
| 2017-01-07 | a | y | 15 |
| 2017-01-07 | a | x | 15 |
+------------+---------+---------+-------+
2017-01-01
to 2017-01-10
), outputs the following result:+--------------+---------+---------+-------+
| date | product | partner | value |
+--------------+---------+---------+-------+
| 2017-01-01 | a | x | 10 |
| 2017-01-02 | a | x | 10 |
| 2017-01-03 | a | x | 10 |
| 2017-01-04 | a | x | 10 |
| 2017-01-05 | a | x | 10 |
| 2017-01-06 | a | x | 10 |
| 2017-01-07 | a | x | 15 |
| 2017-01-08 | a | x | 15 |
| 2017-01-09 | a | x | 15 |
| 2017-01-10 | a | x | 15 |
| 2017-01-01 | a | y | 11 |
| 2017-01-02 | a | y | 11 |
| 2017-01-03 | a | y | 11 |
| 2017-01-04 | a | y | 11 |
| 2017-01-05 | a | y | 11 |
| 2017-01-06 | a | y | 11 |
| 2017-01-07 | a | y | 15 |
| 2017-01-08 | a | y | 15 |
| 2017-01-09 | a | y | 15 |
| 2017-01-10 | a | y | 15 |
| 2017-01-01 | b | x | 15 |
| 2017-01-02 | b | x | 15 |
| 2017-01-03 | b | x | 15 |
| 2017-01-04 | b | x | 15 |
| 2017-01-05 | b | x | 13 |
| 2017-01-06 | b | x | 13 |
| 2017-01-07 | b | x | 13 |
| 2017-01-08 | b | x | 13 |
| 2017-01-09 | b | x | 13 |
| 2017-01-10 | b | x | 13 |
| 2017-01-01 | b | y | 16 |
| 2017-01-02 | b | y | 16 |
| 2017-01-03 | b | y | 16 |
| 2017-01-04 | b | y | 16 |
| 2017-01-05 | b | y | 16 |
| 2017-01-06 | b | y | 16 |
| 2017-01-07 | b | y | 16 |
| 2017-01-08 | b | y | 16 |
| 2017-01-09 | b | y | 16 |
| 2017-01-10 | b | y | 16 |
+--------------+---------+---------+-------+
#standardSQL
WITH history AS (
SELECT '2017-01-01' AS d, 'a' AS product, 'x' AS partner, 10 AS value UNION ALL
SELECT '2017-01-01' AS d, 'b' AS product, 'x' AS partner, 15 AS value UNION ALL
SELECT '2017-01-01' AS d, 'a' AS product, 'y' AS partner, 11 AS value UNION ALL
SELECT '2017-01-01' AS d, 'b' AS product, 'y' AS partner, 16 AS value UNION ALL
SELECT '2017-01-05' AS d, 'b' AS product, 'x' AS partner, 13 AS value UNION ALL
SELECT '2017-01-07' AS d, 'a' AS product, 'y' AS partner, 15 AS value UNION ALL
SELECT '2017-01-07' AS d, 'a' AS product, 'x' AS partner, 15 AS value
),
daterange AS (
SELECT date_in_range
FROM UNNEST(GENERATE_DATE_ARRAY('2017-01-01', '2017-01-10')) AS date_in_range
),
temp AS (
SELECT d, product, partner, value, LEAD(d) OVER(PARTITION BY product, partner ORDER BY d) AS next_d
FROM history
ORDER BY product, partner, d
)
SELECT date_in_range, product, partner, value
FROM daterange
JOIN temp
ON daterange.date_in_range >= PARSE_DATE('%Y-%m-%d', temp.d)
AND (daterange.date_in_range < PARSE_DATE('%Y-%m-%d', temp.next_d) OR temp.next_d IS NULL)
ORDER BY product, partner, date_in_range