BigQuery: filter repeated fields with standard SQL
Question
I have the following table:
row | query_params | query_values
1   | foo          | bar
    | param        | val
2   | foo          | baz
JSON:
{
"query_params" : [ "foo", "param"],
"query_values" : [ "bar", "val" ]
}, {
"query_params" : [ "foo" ],
"query_values" : [ "baz" ]
}
Using standard SQL, I want to filter a repeated field on its values, with something like
SELECT * FROM table WHERE query_params = 'foo'
which would output:
row | query_params | query_values
1   | foo          | bar
2   | foo          | baz
PS: for the same question using legacy SQL, see here.
Answer
Have you seen the topic in the migration guide on differences in filtering repeated fields? Using your sample data as a basis, and assuming that the params and values repeat together (as opposed to being separate arrays), you can write a query such as:
WITH T AS (
SELECT 1 AS row, ARRAY<STRUCT<param STRING, value STRING>>[
('foo', 'bar'), ('param', 'val')] AS queries UNION ALL
SELECT 2, ARRAY<STRUCT<param STRING, value STRING>>[('foo', 'baz')]
)
SELECT * EXCEPT (queries)
FROM T, UNNEST(queries)
WHERE param = 'foo';
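The important part here is the comma between T and UNNEST(queries). It joins each row of T with the elements of that row's own queries array (a correlated cross join), so every struct in the array becomes a separate output row next to the columns of T, and the WHERE clause can then filter on param directly. Using JOIN or CROSS JOIN in place of the comma is equivalent.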
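For comparison, here is a sketch of the first query with the comma spelled out as an explicit CROSS JOIN, on the same sample data. Given the equivalence described above, it should return the same two rows, (1, 'foo', 'bar') and (2, 'foo', 'baz'), with columns row, param and value; the comment marks what makes the join correlated.

```sql
WITH T AS (
  SELECT 1 AS row, ARRAY<STRUCT<param STRING, value STRING>>[
    ('foo', 'bar'), ('param', 'val')] AS queries UNION ALL
  SELECT 2, ARRAY<STRUCT<param STRING, value STRING>>[('foo', 'baz')]
)
SELECT * EXCEPT (queries)
FROM T
-- UNNEST(queries) refers to the current row of T, so each row is only
-- paired with the elements of its own array (a correlated cross join).
CROSS JOIN UNNEST(queries)
WHERE param = 'foo';
```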
The query also uses EXCEPT (queries) to avoid selecting the original array in the query result, since we only want the "flattened" contents of the array.
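If you would rather keep each matching row intact, with its array left unflattened, a row-level filter can test the array with EXISTS instead of joining it. This is a minimal sketch of that alternative (not the approach used in the answer), again on the same sample data: both rows are returned because each contains a 'foo' element, and row 1 keeps its ('param', 'val') entry.

```sql
WITH T AS (
  SELECT 1 AS row, ARRAY<STRUCT<param STRING, value STRING>>[
    ('foo', 'bar'), ('param', 'val')] AS queries UNION ALL
  SELECT 2, ARRAY<STRUCT<param STRING, value STRING>>[('foo', 'baz')]
)
SELECT *
FROM T
-- Keep the whole row if any element of its array has param = 'foo';
-- the queries array itself is returned unchanged.
WHERE EXISTS (
  SELECT 1 FROM UNNEST(queries) AS q WHERE q.param = 'foo'
);
```

For a table where query_params is a plain ARRAY<STRING>, WHERE 'foo' IN UNNEST(query_params) expresses the same row-level test.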
Edit: Another sample query, this time where the params and values repeat independently:
WITH T AS (
SELECT 1 AS row, ['foo', 'param'] AS query_params,
['bar', 'val'] AS query_values UNION ALL
SELECT 2, ['foo'], ['baz']
)
SELECT row, query_param, query_values[OFFSET(o)] AS query_value
FROM T, UNNEST(query_params) AS query_param WITH OFFSET o
WHERE query_param = 'foo';
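This uses the offset within query_params to index into query_values in parallel: WITH OFFSET o exposes each element's zero-based position in query_params, and query_values[OFFSET(o)] reads the value at the same position.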
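One caveat with OFFSET: it raises an error when the index is past the end of the array, so if query_values can ever be shorter than query_params, SAFE_OFFSET, which returns NULL instead, is the safer choice. The sketch below adds a hypothetical row 3 (not part of the original data) whose 'foo' has no paired value; with OFFSET the query would fail on that row, while with SAFE_OFFSET it returns (3, 'foo', NULL) alongside the two expected rows.

```sql
WITH T AS (
  SELECT 1 AS row, ['foo', 'param'] AS query_params,
    ['bar', 'val'] AS query_values UNION ALL
  SELECT 2, ['foo'], ['baz'] UNION ALL
  -- Hypothetical row: 'foo' sits at offset 1, but query_values has only one element.
  SELECT 3, ['param', 'foo'], ['x']
)
SELECT row, query_param, query_values[SAFE_OFFSET(o)] AS query_value  -- NULL when out of range
FROM T
CROSS JOIN UNNEST(query_params) AS query_param WITH OFFSET o
WHERE query_param = 'foo';
```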