BigQuery: filter repeated fields with standard SQL
Question
I have the following table:
row | query_params | query_values
1   | foo          | bar
    | param        | val
2   | foo          | baz
JSON:
{
"query_params" : [ "foo", "param"],
"query_values" : [ "bar", "val" ]
}, {
"query_params" : [ "foo" ],
"query_values" : [ "baz" ]
}
Using standard SQL, I want to filter the repeated fields by their value, something like:
SELECT * FROM table WHERE query_params = 'foo'
which would output:
row | query_params | query_values
1   | foo          | bar
2   | foo          | baz
PS: for the same question using legacy SQL, see here
Recommended answer
Have you seen the topic in the migration guide on differences in filtering repeated fields? Using your sample data as a basis, and assuming that the params and values repeat together (as an array of structs rather than as two separate arrays), you can write a query such as:
WITH T AS (
SELECT 1 AS row, ARRAY<STRUCT<param STRING, value STRING>>[
('foo', 'bar'), ('param', 'val')] AS queries UNION ALL
SELECT 2, ARRAY<STRUCT<param STRING, value STRING>>[('foo', 'baz')]
)
SELECT * EXCEPT (queries)
FROM T, UNNEST(queries)
WHERE param = 'foo';
The important part here is the comma between T and UNNEST(queries), which pairs each row of T with every element of that row's queries array (a cross product). Using JOIN or CROSS JOIN in place of the comma is equivalent.
The query also uses EXCEPT (queries) to leave the original array out of the query result, since we only want the "flattened" contents of the array.
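To make the flattening and filtering concrete, here is a minimal Python sketch of what the query above computes for the sample data. It only models the semantics and says nothing about how BigQuery executes the query; the list-of-dicts layout and the names flattened and result are assumptions made for illustration.

```python
# Sample rows mirroring the WITH T AS (...) clause of the query above.
T = [
    {"row": 1, "queries": [("foo", "bar"), ("param", "val")]},
    {"row": 2, "queries": [("foo", "baz")]},
]

# FROM T, UNNEST(queries): pair each row of T with each element of its own array.
# SELECT * EXCEPT (queries): keep row, param and value, but not the array itself.
flattened = [
    {"row": r["row"], "param": param, "value": value}
    for r in T
    for (param, value) in r["queries"]
]

# WHERE param = 'foo'
result = [f for f in flattened if f["param"] == "foo"]

for f in result:
    print(f["row"], f["param"], f["value"])
# prints:
# 1 foo bar
# 2 foo baz
```

Row 1 contributes two flattened rows and row 2 contributes one, so the filter sees three candidate rows and keeps the two whose param is foo.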
Another sample query, this time where the params and values repeat independently, as two separate arrays:
WITH T AS (
SELECT 1 AS row, ['foo', 'param'] AS query_params,
['bar', 'val'] AS query_values UNION ALL
SELECT 2, ['foo'], ['baz']
)
SELECT row, query_param, query_values[OFFSET(o)] AS query_value
FROM T, UNNEST(query_params) AS query_param WITH OFFSET o
WHERE query_param = 'foo';
This uses each element's offset within query_params to index into query_values in parallel.
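The parallel indexing can be sketched in Python as well. This is again only a model of the semantics under the same sample data; the variable names are assumptions chosen to match the SQL aliases.

```python
# Sample rows mirroring the WITH T AS (...) clause of the second query.
T = [
    {"row": 1, "query_params": ["foo", "param"], "query_values": ["bar", "val"]},
    {"row": 2, "query_params": ["foo"], "query_values": ["baz"]},
]

result = []
for r in T:
    # UNNEST(query_params) AS query_param WITH OFFSET o
    for o, query_param in enumerate(r["query_params"]):
        # WHERE query_param = 'foo'
        if query_param == "foo":
            # query_values[OFFSET(o)]: zero-based lookup at the same position
            result.append((r["row"], query_param, r["query_values"][o]))

print(result)
# prints: [(1, 'foo', 'bar'), (2, 'foo', 'baz')]
```

Like the Python list lookup here, OFFSET in BigQuery raises an error if query_values is shorter than query_params. If the two arrays may differ in length, SAFE_OFFSET returns NULL for an out-of-range position instead.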