BigQuery: filter repeated fields with standard SQL


Question


I have the following table:

row | query_params | query_values
1     foo            bar  
      param          val
2     foo            baz 

JSON:

{ 
"query_params" : [ "foo", "param"], 
"query_values" : [ "bar", "val" ] 
}, { 
"query_params" : [ "foo" ], 
"query_values" : [ "baz" ] 
}

Using standard SQL, I want to filter the repeated fields on their values, with something like

SELECT * FROM table WHERE query_params = 'foo'

Which would output

row | query_params | query_values
1     foo            bar  
2     foo            baz       

PS: for the same question using legacy SQL, see here

Solution

Have you seen the topic in the migration guide on differences in filtering repeated fields? Using your sample data as a basis, and assuming that the params and values repeat together (as opposed to being separate arrays), you can write a query such as:

WITH T AS (
  SELECT 1 AS row, ARRAY<STRUCT<param STRING, value STRING>>[
      ('foo', 'bar'), ('param', 'val')] AS queries UNION ALL
  SELECT 2, ARRAY<STRUCT<param STRING, value STRING>>[('foo', 'baz')]
)
SELECT * EXCEPT (queries)
FROM T, UNNEST(queries)
WHERE param = 'foo';

The important part here is the comma between T and UNNEST(queries), which takes the cross product of each row of T with the elements of that row's queries array. Writing JOIN or CROSS JOIN in place of the comma is equivalent.

The query also uses EXCEPT (queries) to avoid selecting the original array in the query result, since we only want the "flattened" contents of the array.
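To make the semantics concrete, here is a minimal plain-Python sketch of what the comma join with UNNEST computes. It is only a model of the query's behavior, not BigQuery client code; the function name `unnest_and_filter` and the dict-based row layout are assumptions made for illustration.

```python
# Plain-Python model of "FROM T, UNNEST(queries) WHERE param = 'foo'".
# Each table row carries its own array of (param, value) structs.
T = [
    {"row": 1, "queries": [("foo", "bar"), ("param", "val")]},
    {"row": 2, "queries": [("foo", "baz")]},
]


def unnest_and_filter(table, wanted_param):
    """Pair each row with every element of its own array, then filter."""
    result = []
    for r in table:                        # FROM T
        for param, value in r["queries"]:  # , UNNEST(queries)
            if param == wanted_param:      # WHERE param = 'foo'
                # SELECT * EXCEPT (queries): keep the row id and the
                # flattened struct fields, but not the original array.
                result.append(
                    {"row": r["row"], "param": param, "value": value}
                )
    return result


flattened = unnest_and_filter(T, "foo")
for line in flattened:
    print(line)
```

Note that each row is paired only with the elements of its own array, which is why row 2 never sees the ('param', 'val') element that belongs to row 1.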

Edit: Another sample query, this time where the params and values repeat independently:

WITH T AS (
  SELECT 1 AS row, ['foo', 'param'] AS query_params,
    ['bar', 'val'] AS query_values UNION ALL
  SELECT 2, ['foo'], ['baz']
)
SELECT row, query_param, query_values[OFFSET(o)] AS query_value
FROM T, UNNEST(query_params) AS query_param WITH OFFSET o
WHERE query_param = 'foo';

This uses the offset within query_params to index into query_values in parallel.
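The parallel indexing can be sketched the same way in plain Python. Again, this is only a model of the query's behavior under the sample data, and the function name `filter_parallel_arrays` is hypothetical. One difference is labeled in the comments: in BigQuery, `OFFSET(o)` raises an error when the index is out of range, whereas `SAFE_OFFSET(o)` returns NULL; the sketch follows the `SAFE_OFFSET` behavior by returning `None`.

```python
# Plain-Python model of
# "FROM T, UNNEST(query_params) AS query_param WITH OFFSET o".
T = [
    {"row": 1, "query_params": ["foo", "param"], "query_values": ["bar", "val"]},
    {"row": 2, "query_params": ["foo"], "query_values": ["baz"]},
]


def filter_parallel_arrays(table, wanted_param):
    """Filter on query_params and pick the value at the same position."""
    result = []
    for r in table:
        values = r["query_values"]
        # enumerate() plays the role of WITH OFFSET o (zero-based).
        for o, query_param in enumerate(r["query_params"]):
            if query_param != wanted_param:
                continue
            # Assumption: mimic SAFE_OFFSET(o) and yield None when the
            # two arrays differ in length; plain OFFSET(o) would fail.
            query_value = values[o] if o < len(values) else None
            result.append(
                {
                    "row": r["row"],
                    "query_param": query_param,
                    "query_value": query_value,
                }
            )
    return result


matched = filter_parallel_arrays(T, "foo")
for line in matched:
    print(line)
```

This approach relies on the two arrays being kept in the same order and at the same length, which the sample data satisfies.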
