BigQuery: filter repeated fields with standard SQL


Question

I have the following table:

row | query_params | query_values
1     foo            bar  
      param          val
2     foo            baz 

JSON:

{ 
"query_params" : [ "foo", "param"], 
"query_values" : [ "bar", "val" ] 
}, { 
"query_params" : [ "foo" ], 
"query_values" : [ "baz" ] 
}

Using standard SQL, I want to filter the repeated fields by value, something like:

SELECT * FROM table WHERE query_params = 'foo'

which would output:

row | query_params | query_values
1     foo            bar  
2     foo            baz       

PS: for the same question using legacy SQL, see here.

Recommended answer

Have you seen the topic in the migration guide on differences in filtering repeated fields? Using your sample data as a basis, and assuming that the params and values repeat together (as one array of structs rather than two separate arrays), you can write a query such as:

WITH T AS (
  SELECT 1 AS row, ARRAY<STRUCT<param STRING, value STRING>>[
      ('foo', 'bar'), ('param', 'val')] AS queries UNION ALL
  SELECT 2, ARRAY<STRUCT<param STRING, value STRING>>[('foo', 'baz')]
)
SELECT * EXCEPT (queries)
FROM T, UNNEST(queries)
WHERE param = 'foo';

The important part here is the comma between T and UNNEST(queries), which takes the cross product of the rows of T and the elements in queries. Using JOIN or CROSS JOIN in place of the comma is equivalent.
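
To make the equivalence concrete, here is a sketch of the same query with the comma written as an explicit CROSS JOIN. The alias q is introduced only for this illustration, and the columns are listed by name instead of using EXCEPT; with the sample data it should return the foo/bar and foo/baz rows.

```sql
-- Same sample data as above; the comma is replaced by an explicit CROSS JOIN.
WITH T AS (
  SELECT 1 AS row, ARRAY<STRUCT<param STRING, value STRING>>[
      ('foo', 'bar'), ('param', 'val')] AS queries UNION ALL
  SELECT 2, ARRAY<STRUCT<param STRING, value STRING>>[('foo', 'baz')]
)
SELECT row, q.param, q.value
FROM T
CROSS JOIN UNNEST(queries) AS q  -- q is an alias chosen for this sketch
WHERE q.param = 'foo';
```

Each row of T is paired only with the elements of its own queries array, so the WHERE clause filters individual array elements rather than whole rows.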

The query also uses EXCEPT (queries) to avoid selecting the original array in the query result, since we only want the "flattened" contents of the array.

Another sample query, this time where the params and values repeat independently:

WITH T AS (
  SELECT 1 AS row, ['foo', 'param'] AS query_params,
    ['bar', 'val'] AS query_values UNION ALL
  SELECT 2, ['foo'], ['baz']
)
SELECT row, query_param, query_values[OFFSET(o)] AS query_value
FROM T, UNNEST(query_params) AS query_param WITH OFFSET o
WHERE query_param = 'foo';

This uses the offset within query_params to index into query_values in parallel.
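
Parallel indexing with OFFSET assumes that query_values has at least as many elements as query_params; an out-of-range OFFSET raises an error. If that assumption might not hold, a hedged variant is to use SAFE_OFFSET, which returns NULL instead. The row with id 3 below is hypothetical data added only to show that case.

```sql
WITH T AS (
  SELECT 1 AS row, ['foo', 'param'] AS query_params,
    ['bar', 'val'] AS query_values UNION ALL
  -- Hypothetical row: the values array is shorter than the params array.
  SELECT 3, ['param', 'foo'], ['val']
)
-- SAFE_OFFSET yields NULL where query_values has no element at position o.
SELECT row, query_param, query_values[SAFE_OFFSET(o)] AS query_value
FROM T, UNNEST(query_params) AS query_param WITH OFFSET o
WHERE query_param = 'foo';
```

For row 1 this should give foo/bar as before; for the hypothetical row 3, foo sits at offset 1 where query_values has no element, so query_value should come back as NULL rather than failing the query.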
