Join uneven arrays from many columns and avoid duplicates in BigQuery


Question

I asked a similar question here that I thought abstracted my problem sufficiently, but unfortunately it did not.

I have a table of nested arrays; the first column is an int. I can join two arrays without duplication (as answered in my previous question), but I'm unsure how to do it with more than two.

Here is the table (in Standard SQL):

WITH
  a AS (
  SELECT 
    1 AS col1,
    ARRAY[1, 2 ] AS col2,
    ARRAY[1, 2, 3] AS col3,
    ARRAY[1, 2, 3, 4] AS col4
  UNION ALL
  SELECT
    2 AS col1, 
    ARRAY[1, 2, 2] AS col2,
    ARRAY[1, 2, 3] AS col3,
    ARRAY[1, 2, 3, 4] AS col4
  UNION ALL
  SELECT
    3 AS col1,
    ARRAY[2, 2 ] AS col2,
    ARRAY[1, 2, 3] AS col3,
    ARRAY[1, 2, 3, 4] AS col4
    )
SELECT
  *
FROM
  a

This produces:

+-------++--------++--------++---------+
| col1   |   col2  |   col3  |   col4  |
+-------++--------++--------++---------+
|   1    |   1     |   1     |   1     |
|        |   2     |   2     |   2     |
|        |         |   3     |   3     |
|        |         |         |   4     |
|   2    |   1     |   1     |   1     |
|        |   2     |   2     |   2     |
|        |         |   3     |   3     |
|        |         |         |   4     |
|   3    |   1     |   1     |   1     |
|        |   2     |   2     |   2     |
|        |         |   3     |   3     |
|        |         |         |   4     |
+-------++--------++--------++---------+

But what I'm looking for is this:

+-------++--------++--------++---------+
| col1   |   col2  |   col3  |   col4  |
+-------++--------++--------++---------+
|   1    |   1     |   1     |   1     |
|  null  |   2     |   2     |   2     |
|  null  |  null   |   3     |   3     |
|  null  |  null   |  null   |   4     |
|   2    |   1     |   1     |   1     |
|  null  |   2     |   2     |   2     |
|  null  |  null   |   3     |   3     |
|  null  |  null   |  null   |   4     |
|   3    |   1     |   1     |   1     |
|  null  |   2     |   2     |   2     |
|  null  |  null   |   3     |   3     |
|  null  |  null   |  null   |   4     |
+-------++--------++--------++---------+

Here is how I'm unnesting the many columns:

SELECT
  col1,
  _col2,
  _col3
FROM
  a left join 
  unnest(col2) as _col2 
  left join unnest(col3) as _col3

This generates the following table:

+-------++--------++--------+
| col1   |   col2  |   col3 |
+-------++--------++--------+
|   1    |   1     |   1    |
|   1    |   1     |   2    |
|   1    |   1     |   3    |
|   1    |   2     |   1    |
|   1    |   2     |   2    |
|   1    |   2     |   3    |
|   2    |   1     |   1    |
|   2    |   1     |   2    |
|   2    |   1     |   3    |
|   2    |   2     |   1    |
|   2    |   2     |   2    |
|   2    |   2     |   3    |
...
...
...
+-------++--------++--------+
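The row explosion above comes from the fact that each chained UNNEST is joined against every row produced so far, so the result per col1 is the Cartesian product of the arrays. The following is a minimal Python sketch that models this behaviour for illustration only; it is not BigQuery code, and the helper name chained_unnest is hypothetical.

```python
from itertools import product

# Sample data mirroring the first three columns of the table in the question.
rows = [
    {"col1": 1, "col2": [1, 2], "col3": [1, 2, 3]},
    {"col1": 2, "col2": [1, 2, 2], "col3": [1, 2, 3]},
    {"col1": 3, "col2": [2, 2], "col3": [1, 2, 3]},
]


def chained_unnest(row):
    """Model `a LEFT JOIN UNNEST(col2) LEFT JOIN UNNEST(col3)` for one row."""
    # Under LEFT JOIN, an empty array still contributes one NULL (None) row.
    col2 = row["col2"] or [None]
    col3 = row["col3"] or [None]
    # Every element of col2 is paired with every element of col3.
    return [(row["col1"], c2, c3) for c2, c3 in product(col2, col3)]


for row in rows:
    out = chained_unnest(row)
    # The row count is len(col2) * len(col3), not max(len(col2), len(col3)).
    print(row["col1"], len(out))
```

For col1 = 1 this yields 2 * 3 = 6 rows, which matches the six rows shown for col1 = 1 in the table above.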

Recommended answer

I don't fully understand how your results relate to the input data. The results for all the col1 values are exactly the same, but the inputs are different.

That said, I can interpret this as an extension of your previous question. This may be what you want:

SELECT a.col1, c2, c3, c4
FROM (select a.*,
             (SELECT ARRAY_AGG(DISTINCT c) cs
              from unnest(array_concat( col2, col3, col4)) c
             ) cs
      from a 
     ) a cross join
     unnest(cs) c left join      
     unnest(a.col2) c2
     on c2 = c left join
     unnest(a.col3) c3
     on c3 = c left join
     unnest(a.col4) c4
     on c4 = c;

The initial subquery aliased as a collects the distinct values across the three arrays into cs. Each array is then left-joined against those values, so a value that is missing from an array appears as NULL in that column.
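To make the join logic easier to follow, here is a rough Python model of what the query computes for a single input row. It is a sketch under assumptions, not BigQuery code: the function name align_by_value is hypothetical, the distinct values are kept in first-seen order (BigQuery does not guarantee an order for ARRAY_AGG(DISTINCT ...)), and the arrays are assumed to contain no NULLs.

```python
from itertools import product


def align_by_value(col1, col2, col3, col4):
    """Model the answer's query for one row of table a."""
    # ARRAY_AGG(DISTINCT c) over ARRAY_CONCAT(col2, col3, col4).
    distinct_values = list(dict.fromkeys(col2 + col3 + col4))
    result = []
    for c in distinct_values:
        # LEFT JOIN UNNEST(colN) cN ON cN = c:
        # keep the matching elements, or a single NULL (None) if none match.
        m2 = [v for v in col2 if v == c] or [None]
        m3 = [v for v in col3 if v == c] or [None]
        m4 = [v for v in col4 if v == c] or [None]
        # Chained joins multiply the matches from each array.
        for c2, c3, c4 in product(m2, m3, m4):
            result.append((col1, c2, c3, c4))
    return result


for out_row in align_by_value(1, [1, 2], [1, 2, 3], [1, 2, 3, 4]):
    print(out_row)
```

For the first input row this gives (1, 1, 1, 1), (1, 2, 2, 2), (1, None, 3, 3) and (1, None, None, 4). Note two things the model makes visible: col1 is repeated on every row rather than NULL, and an array that itself contains a repeated value (such as col2 = [1, 2, 2] in the second input row) still produces a repeated output row for that value.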
