如何使用BigQuery链接子ID和父ID [英] How to link child and parent ids using BigQuery

查看:83
本文介绍了如何使用BigQuery链接子ID和父ID的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

假设我有下表:

child_id   parent_id 
1          2
2          3
3          -
4          5
5          6
6          -
7          8

,我想创建下表:

child_id   parent_id   branch_id
1          2           1
2          3           1
3          -           1
4          5           2
5          6           2
6          -           2
7          8           3

,其中branch_id表示通过parent_ids链接在一起的分组.

in which the branch_id denotes groupings that are linked together by the parent_ids.

但是,不能保证行顺序 ,并且分支可能包含数百行.这排除了LAG()函数的简单使用.

However, the row order is not guaranteed and branches may contain hundreds of rows. This rules out a simple use of the LAG() function.

鉴于BigQuery的SQL的局限性,我该如何实现?

How can I achieve this given the limitations of BigQuery's SQL?

推荐答案

以下示例如何在这种情况下使用BigQuery脚本

Below example of how to employ BigQuery Scripting for such cases

DECLARE rows_count, run_away_stop INT64 DEFAULT 0;

CREATE TEMP TABLE input AS 
  SELECT 1 child_id, 2 parent_id UNION ALL
  SELECT 2, 3 UNION ALL
  SELECT 3, NULL UNION ALL
  SELECT 4, 5 UNION ALL
  SELECT 5, 6 UNION ALL
  SELECT 6, NULL UNION ALL
  SELECT 7, 8 ;

CREATE TEMP TABLE ttt AS 
SELECT ARRAY(SELECT val FROM UNNEST([child_id, IFNULL(parent_id, child_id)]) val ORDER BY val ) arr FROM input;

LOOP
  SET (run_away_stop, rows_count) = (SELECT AS STRUCT run_away_stop + 1, COUNT(1) FROM ttt);

  CREATE OR REPLACE TEMP TABLE ttt AS
  SELECT ANY_VALUE(arr) arr FROM (
    SELECT ARRAY(SELECT DISTINCT val FROM UNNEST(arr) val ORDER BY val) arr
    FROM (
      SELECT ANY_VALUE(arr1) arr1, ARRAY_CONCAT_AGG(arr) arr    
      FROM (
        SELECT t1.arr arr1, t2.arr arr2, ARRAY(SELECT DISTINCT val FROM UNNEST(ARRAY_CONCAT( t1.arr, t2.arr)) val ORDER BY val) arr 
        FROM ttt t1, ttt t2 
        WHERE (SELECT COUNT(1) FROM UNNEST(t1.arr) val JOIN UNNEST(t2.arr) val USING(val)) > 0
      ) GROUP BY FORMAT('%t', arr1)
    )
  ) GROUP BY FORMAT('%t', arr);

  IF (rows_count = (SELECT COUNT(1) FROM ttt) AND run_away_stop > 1) OR run_away_stop > 10 THEN BREAK; END IF;
END LOOP;

SELECT input.*, branch_id
FROM input 
JOIN (
  SELECT ROW_NUMBER() OVER() AS branch_id, arr AS IDs FROM ttt
)
ON child_id IN UNNEST(IDs)

最终输出

Row child_id    parent_id   branch_id    
1   7           8           1    
2   4           5           2    
3   6           null        2    
4   5           6           2    
5   3           null        3    
6   2           3           3    
7   1           2           3    

这篇关于如何使用BigQuery链接子ID和父ID的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆