Combine Rows in BigQuery Ignoring Nulls


Question

I have a Google BigQuery table that looks like this:

║ id ║   col_1    ║  col_2  ║ updated ║
║  1 ║ first_data ║ null    ║ 4/22    ║
║  1 ║ null       ║ old     ║ 4/23    ║
║  1 ║ null       ║ correct ║ 4/24    ║

I would like to construct a query that combines these rows, "overwriting" a null column whenever another row with the same id has a non-null value in that column. Essentially, the result should look like this:

║  1 ║ first_data ║ correct ║ 4/24    ║

If possible, I would also like the result to represent a history:

║  1 ║ first_data ║ old     ║ 4/23    ║
║  1 ║ first_data ║ correct ║ 4/24    ║

But that is secondary and not required.

Recommended answer

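The following is for BigQuery Standard SQL. For each row, a null column is filled with the most recent non-null value from earlier rows with the same id: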

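#standardSQL
-- Keep each row's own value; if it is NULL, fall back to the most
-- recent non-NULL value among older rows with the same id.
SELECT id,
  IFNULL(col_1, FIRST_VALUE(col_1 IGNORE NULLS) OVER(win)) AS col_1,
  IFNULL(col_2, FIRST_VALUE(col_2 IGNORE NULLS) OVER(win)) AS col_2,
  updated
FROM `project.dataset.your_table`
-- Newest first, so the rows FOLLOWING the current one are the older ones.
WINDOW win AS (PARTITION BY id ORDER BY updated DESC
               ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)
-- ORDER BY id, updated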

You can test or play with it using dummy data, as below:

#standardSQL
WITH `project.dataset.your_table` AS (
  SELECT 1 id, 'first_data' col_1, NULL col_2,  '4/22' updated UNION ALL
  SELECT 1,     NULL,             'old',        '4/23'         UNION ALL
  SELECT 1,     NULL,             'correct',    '4/24'         UNION ALL
  SELECT 1,    'next_data',       NULL,         '4/25'         UNION ALL
  SELECT 1,     NULL,             NULL,         '4/26'         
)
SELECT id, 
  IFNULL(col_1, FIRST_VALUE(col_1 IGNORE NULLS) OVER(win)) col_1, 
  IFNULL(col_2, FIRST_VALUE(col_2 IGNORE NULLS) OVER(win)) col_2, 
  updated
FROM `project.dataset.your_table`
WINDOW win AS (PARTITION BY id ORDER BY updated DESC 
               ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)
ORDER BY id, updated

with the result:

Row id  col_1       col_2   updated  
1   1   first_data  null    4/22     
2   1   first_data  old     4/23     
3   1   first_data  correct 4/24     
4   1   next_data   correct 4/25     
5   1   next_data   correct 4/26     
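
The query above produces the history form. For the single combined row per id that the question asks for first, one possible sketch (not part of the original answer) aggregates each column on its own and keeps the newest non-null value. It assumes that `updated` sorts chronologically as a string, as it does in the sample data, and the CTE only stands in for the real table:

```sql
#standardSQL
-- Sketch only: the CTE reproduces the three sample rows from the question.
WITH `project.dataset.your_table` AS (
  SELECT 1 AS id, 'first_data' AS col_1, CAST(NULL AS STRING) AS col_2, '4/22' AS updated UNION ALL
  SELECT 1, NULL, 'old',     '4/23' UNION ALL
  SELECT 1, NULL, 'correct', '4/24'
)
SELECT id,
  -- Newest non-NULL value per column; SAFE_OFFSET returns NULL if there is none.
  ARRAY_AGG(col_1 IGNORE NULLS ORDER BY updated DESC LIMIT 1)[SAFE_OFFSET(0)] AS col_1,
  ARRAY_AGG(col_2 IGNORE NULLS ORDER BY updated DESC LIMIT 1)[SAFE_OFFSET(0)] AS col_2,
  MAX(updated) AS updated
FROM `project.dataset.your_table`
GROUP BY id
```

On the three sample rows this should return a single row: 1, first_data, correct, 4/24. If `updated` is stored as text in a format that does not sort chronologically, it would need to be converted to a DATE or TIMESTAMP before ordering.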
