Join uneven arrays from many columns and avoid duplicates in BigQuery
Problem description
I asked a similar question here, which I thought abstracted my problem sufficiently; unfortunately, it did not.
I have a table of nested arrays; the first column is an integer. I can join two arrays without duplication (as answered in my previous question), but I'm unsure how to do it with more than two.
Here is the table (in standard SQL):
WITH
  a AS (
    SELECT
      1 AS col1,
      ARRAY[1, 2] AS col2,
      ARRAY[1, 2, 3] AS col3,
      ARRAY[1, 2, 3, 4] AS col4
    UNION ALL
    SELECT
      2 AS col1,
      ARRAY[1, 2, 2] AS col2,
      ARRAY[1, 2, 3] AS col3,
      ARRAY[1, 2, 3, 4] AS col4
    UNION ALL
    SELECT
      3 AS col1,
      ARRAY[2, 2] AS col2,
      ARRAY[1, 2, 3] AS col3,
      ARRAY[1, 2, 3, 4] AS col4
  )
SELECT
  *
FROM
  a
This produces:
+------+------+------+------+
| col1 | col2 | col3 | col4 |
+------+------+------+------+
| 1    | 1    | 1    | 1    |
|      | 2    | 2    | 2    |
|      |      | 3    | 3    |
|      |      |      | 4    |
| 2    | 1    | 1    | 1    |
|      | 2    | 2    | 2    |
|      |      | 3    | 3    |
|      |      |      | 4    |
| 3    | 1    | 1    | 1    |
|      | 2    | 2    | 2    |
|      |      | 3    | 3    |
|      |      |      | 4    |
+------+------+------+------+
But what I'm looking for is this:
+------+------+------+------+
| col1 | col2 | col3 | col4 |
+------+------+------+------+
| 1    | 1    | 1    | 1    |
| null | 2    | 2    | 2    |
| null | null | 3    | 3    |
| null | null | null | 4    |
| 2    | 1    | 1    | 1    |
| null | 2    | 2    | 2    |
| null | null | 3    | 3    |
| null | null | null | 4    |
| 3    | 1    | 1    | 1    |
| null | 2    | 2    | 2    |
| null | null | 3    | 3    |
| null | null | null | 4    |
+------+------+------+------+
Here is how I'm unnesting the many columns:
SELECT
  col1,
  _col2,
  _col3
FROM
  a
  LEFT JOIN UNNEST(col2) AS _col2
  LEFT JOIN UNNEST(col3) AS _col3
This generates the following table:
+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
| 1    | 1    | 1    |
| 1    | 1    | 2    |
| 1    | 1    | 3    |
| 1    | 2    | 1    |
| 1    | 2    | 2    |
| 1    | 2    | 3    |
| 2    | 1    | 1    |
| 2    | 1    | 2    |
| 2    | 1    | 3    |
| 2    | 2    | 1    |
| 2    | 2    | 2    |
| 2    | 2    | 3    |
...
...
...
+------+------+------+
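Because neither LEFT JOIN UNNEST in the query above has a join condition, each one pairs every element of one array with every element of the other within the same source row, so the row count multiplies. The following plain-Python model is a sketch of that behaviour, not BigQuery code; the tuple layout of the rows and the helper names left_unnest and unnest_without_condition are illustrative assumptions.

```python
from itertools import product
from typing import List, Optional, Tuple

# (col1, col2, col3): only col2 and col3 are unnested in the question's query.
Row = Tuple[int, List[int], List[int]]


def left_unnest(values: List[int]) -> List[Optional[int]]:
    """Hypothetical helper: LEFT JOIN UNNEST keeps the row even for an empty array (NULL)."""
    return list(values) if values else [None]


def unnest_without_condition(rows: List[Row]) -> List[Tuple[int, Optional[int], Optional[int]]]:
    """Model of chained LEFT JOIN UNNEST with no ON clause: a per-row Cartesian product."""
    out = []
    for col1, col2, col3 in rows:
        # Every element of col2 is paired with every element of col3 in the same row.
        for c2, c3 in product(left_unnest(col2), left_unnest(col3)):
            out.append((col1, c2, c3))
    return out


rows = [
    (1, [1, 2], [1, 2, 3]),
    (2, [1, 2, 2], [1, 2, 3]),
    (3, [2, 2], [1, 2, 3]),
]
result = unnest_without_condition(rows)
print(len(result))  # 6 + 9 + 6 = 21
print(result[:6])   # the six col1 = 1 rows of the table above
```

With n arrays of lengths k1, ..., kn in one row, this pattern yields k1 × ... × kn rows for that row, which is why adding col4 the same way would make the duplication worse rather than better.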
Recommended answer
I don't fully understand how your results relate to the input data. The results for all the col1 values are exactly the same, but the inputs are different.
That said, I can interpret this as an extension of your previous question. This may be what you want:
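SELECT a.col1, c2, c3, c4
FROM (SELECT a.*,
             -- all distinct values found in col2, col3 and col4 of this row
             (SELECT ARRAY_AGG(DISTINCT c) cs
              FROM UNNEST(ARRAY_CONCAT(col2, col3, col4)) c
             ) cs
      FROM a
     ) a CROSS JOIN
     -- one output row per distinct value c
     UNNEST(cs) c LEFT JOIN
     -- attach each array's element when it equals c, otherwise NULL
     UNNEST(a.col2) c2
     ON c2 = c LEFT JOIN
     UNNEST(a.col3) c3
     ON c3 = c LEFT JOIN
     UNNEST(a.col4) c4
     ON c4 = c;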
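The initial subquery on a collects, for each row, all the distinct values that appear in its arrays. Unnesting that array gives one row per value, which is then used for the left joins against each of the original arrays.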
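To make the answer's logic concrete, here is a plain-Python sketch of what the query computes for a single source row. It is a model under stated assumptions, not BigQuery code: align_by_value is a hypothetical helper, and the distinct values are sorted only to make the output deterministic, since neither ARRAY_AGG(DISTINCT ...) nor the query as a whole guarantees an order without ORDER BY.

```python
from typing import List, Optional, Tuple

Out = Tuple[int, Optional[int], Optional[int], Optional[int]]


def align_by_value(col1: int, col2: List[int], col3: List[int], col4: List[int]) -> List[Out]:
    """Hypothetical model of the answer's query for one row of table a."""
    arrays = [col2, col3, col4]
    # Step 1: ARRAY_AGG(DISTINCT c) over ARRAY_CONCAT(col2, col3, col4).
    # Sorted here only for a deterministic model; BigQuery does not promise an order.
    distinct_values = sorted(set().union(*arrays))
    out: List[Out] = []
    for c in distinct_values:  # CROSS JOIN UNNEST(cs) c
        # Step 2: LEFT JOIN UNNEST(arr) ON element = c, once per array.
        # No match -> a single None (NULL); k matches -> k rows, as in a SQL join.
        matches = [[v for v in arr if v == c] or [None] for arr in arrays]
        for c2 in matches[0]:
            for c3 in matches[1]:
                for c4 in matches[2]:
                    out.append((col1, c2, c3, c4))
    return out


print(align_by_value(1, [1, 2], [1, 2, 3], [1, 2, 3, 4]))
# [(1, 1, 1, 1), (1, 2, 2, 2), (1, None, 3, 3), (1, None, None, 4)]
```

For the first sample row this gives one row per distinct value, with None (NULL) wherever an array lacks that value. Note that the query, like the model, repeats col1 on every row rather than leaving NULLs as in the desired table, and a value repeated within one array, such as the 2s in col2 = [2, 2], yields one output row per occurrence.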