BigQuery: JOIN ON with repeated / array STRUCT field in Standard SQL?
Question
I basically have two tables, Orders and Items. Because these tables are imported from Google Cloud Datastore backup files, references are not made through a simple ID field but through a STRUCT for one-to-one relationships, whose id field holds the actual unique ID I want to match. For one-to-many relationships (REPEATED), the schema uses an ARRAY of STRUCT.
I can query the one-to-one relationships with a LEFT OUTER JOIN, and I also know how to join a non-repeated struct against a repeated string or int, but I am having trouble writing a similar join query against a repeated struct.
One order with exactly one item:
#standardSQL
WITH Orders AS (
SELECT 1 AS __oid__, STRUCT(STRUCT(2 AS id, "default" AS ns) AS key) AS item UNION ALL
SELECT 2 AS __oid__, STRUCT(STRUCT(4 AS id, "default" AS ns) AS key) AS item UNION ALL
SELECT 3 AS __oid__, STRUCT(STRUCT(6 AS id, "default" AS ns) AS key) AS item
),
Items AS (
SELECT STRUCT(1 AS id, "default" AS ns) AS key, "#1.1" AS title UNION ALL
SELECT STRUCT(2 AS id, "default" AS ns) AS key, "#1.2" AS title UNION ALL
SELECT STRUCT(3 AS id, "default" AS ns) AS key, "#1.3" AS title UNION ALL
SELECT STRUCT(4 AS id, "default" AS ns) AS key, "#1.4" AS title UNION ALL
SELECT STRUCT(5 AS id, "default" AS ns) AS key, "#1.5" AS title UNION ALL
SELECT STRUCT(6 AS id, "default" AS ns) AS key, "#1.6" AS title
)
SELECT
__oid__
,Order_item AS item
FROM Orders
LEFT OUTER JOIN(
SELECT
key
,title
FROM Items
) Order_item
ON Order_item.key.id = item.key.id
Result (works as expected):
+-----+---------+--------------+-------------+------------+
| Row | __oid__ | item.key.id | item.key.ns | item.title |
+-----+---------+--------------+-------------+------------+
| 1 | 1 | 2 | default | #1.2 |
+-----+---------+--------------+-------------+------------+
| 2 | 2 | 4 | default | #1.4 |
+-----+---------+--------------+-------------+------------+
| 3 | 3 | 6 | default | #1.6 |
+-----+---------+--------------+-------------+------------+
A similar query, but this time each order has many items:
#standardSQL
WITH Orders AS (
SELECT 1 AS __oid__, ARRAY[STRUCT(STRUCT(1 AS id, "default" AS ns) AS key), STRUCT(STRUCT(2 AS id, "default" AS ns) AS key)] AS items UNION ALL
SELECT 2 AS __oid__, ARRAY[STRUCT(STRUCT(3 AS id, "default" AS ns) AS key), STRUCT(STRUCT(4 AS id, "default" AS ns) AS key)] AS items UNION ALL
SELECT 3 AS __oid__, ARRAY[STRUCT(STRUCT(5 AS id, "default" AS ns) AS key), STRUCT(STRUCT(6 AS id, "default" AS ns) AS key)] AS items
),
Items AS (
SELECT STRUCT(1 AS id, "default" AS ns) AS key, "#1.1" AS title UNION ALL
SELECT STRUCT(2 AS id, "default" AS ns) AS key, "#1.2" AS title UNION ALL
SELECT STRUCT(3 AS id, "default" AS ns) AS key, "#1.3" AS title UNION ALL
SELECT STRUCT(4 AS id, "default" AS ns) AS key, "#1.4" AS title UNION ALL
SELECT STRUCT(5 AS id, "default" AS ns) AS key, "#1.5" AS title UNION ALL
SELECT STRUCT(6 AS id, "default" AS ns) AS key, "#1.6" AS title
)
SELECT
__oid__
,Order_items AS items
FROM Orders
LEFT OUTER JOIN(
SELECT
key
,title
FROM Items
) Order_items
ON Order_items.key.id IN (SELECT item.key.id FROM UNNEST(items) AS item)
Error: IN subquery is not supported inside join predicate.
I actually expected this result:
+-----+---------+--------------+-------------+------------+
| Row | __oid__ | item.key.id | item.key.ns | item.title |
+-----+---------+--------------+-------------+------------+
| 1 | 1 | 1 | default | #1.1 |
| | | 2 | default | #1.2 |
+-----+---------+--------------+-------------+------------+
| 2 | 2 | 3 | default | #1.3 |
| | | 4 | default | #1.4 |
+-----+---------+--------------+-------------+------------+
| 3 | 3 | 5 | default | #1.5 |
| | | 6 | default | #1.6 |
+-----+---------+--------------+-------------+------------+
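How do I change the second query to get the expected result?
Recommended answer
An alternative is to use a CROSS JOIN instead of a LEFT JOIN: move the IN subquery from the join predicate into the WHERE clause, then group by __oid__ and rebuild the array with ARRAY_AGG. Note that, unlike a LEFT JOIN, this drops any order that has no matching item.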
#standardSQL
WITH Orders AS (
SELECT 1 AS __oid__, ARRAY[STRUCT(STRUCT(1 AS id, "default" AS ns) AS key), STRUCT(STRUCT(2 AS id, "default" AS ns) AS key)] AS items UNION ALL
SELECT 2 AS __oid__, ARRAY[STRUCT(STRUCT(3 AS id, "default" AS ns) AS key), STRUCT(STRUCT(4 AS id, "default" AS ns) AS key)] AS items UNION ALL
SELECT 3 AS __oid__, ARRAY[STRUCT(STRUCT(5 AS id, "default" AS ns) AS key), STRUCT(STRUCT(6 AS id, "default" AS ns) AS key)] AS items
),
Items AS (
SELECT STRUCT(1 AS id, "default" AS ns) AS key, "#1.1" AS title UNION ALL
SELECT STRUCT(2 AS id, "default" AS ns) AS key, "#1.2" AS title UNION ALL
SELECT STRUCT(3 AS id, "default" AS ns) AS key, "#1.3" AS title UNION ALL
SELECT STRUCT(4 AS id, "default" AS ns) AS key, "#1.4" AS title UNION ALL
SELECT STRUCT(5 AS id, "default" AS ns) AS key, "#1.5" AS title UNION ALL
SELECT STRUCT(6 AS id, "default" AS ns) AS key, "#1.6" AS title
)
SELECT
__oid__
,ARRAY_AGG(Order_items) AS items
FROM Orders
CROSS JOIN(
SELECT
key
,title
FROM Items
) Order_items
WHERE Order_items.key.id IN (SELECT item.key.id FROM UNNEST(items) AS item)
GROUP BY __oid__
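Another way to express the same match, offered here as a sketch rather than as part of the original answer, is to flatten each order's items array with UNNEST first, join the flattened rows to Items on key.id, and then rebuild the array per order with ARRAY_AGG. The aliases o, item, and i are introduced here only for illustration. As with the CROSS JOIN query above, orders with no matching item are dropped.

```sql
#standardSQL
WITH Orders AS (
  SELECT 1 AS __oid__, ARRAY[STRUCT(STRUCT(1 AS id, "default" AS ns) AS key), STRUCT(STRUCT(2 AS id, "default" AS ns) AS key)] AS items UNION ALL
  SELECT 2 AS __oid__, ARRAY[STRUCT(STRUCT(3 AS id, "default" AS ns) AS key), STRUCT(STRUCT(4 AS id, "default" AS ns) AS key)] AS items UNION ALL
  SELECT 3 AS __oid__, ARRAY[STRUCT(STRUCT(5 AS id, "default" AS ns) AS key), STRUCT(STRUCT(6 AS id, "default" AS ns) AS key)] AS items
),
Items AS (
  SELECT STRUCT(1 AS id, "default" AS ns) AS key, "#1.1" AS title UNION ALL
  SELECT STRUCT(2 AS id, "default" AS ns) AS key, "#1.2" AS title UNION ALL
  SELECT STRUCT(3 AS id, "default" AS ns) AS key, "#1.3" AS title UNION ALL
  SELECT STRUCT(4 AS id, "default" AS ns) AS key, "#1.4" AS title UNION ALL
  SELECT STRUCT(5 AS id, "default" AS ns) AS key, "#1.5" AS title UNION ALL
  SELECT STRUCT(6 AS id, "default" AS ns) AS key, "#1.6" AS title
)
SELECT
  o.__oid__,
  -- Rebuild one array per order from the matched Items rows.
  ARRAY_AGG(i ORDER BY i.key.id) AS items
FROM Orders AS o
-- Flatten the repeated STRUCT: one row per (order, item reference).
CROSS JOIN UNNEST(o.items) AS item
-- A plain equality join is now possible because item is no longer repeated.
JOIN Items AS i
  ON i.key.id = item.key.id
GROUP BY o.__oid__
ORDER BY o.__oid__
```

With the sample data this should return one row per order, each carrying its two matched items. The ORDER BY inside ARRAY_AGG is optional and only makes the element order deterministic.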