BigQuery 从查询中创建重复记录字段 [英] BigQuery creat repeated record field from query

查看:23
本文介绍了BigQuery 从查询中创建重复记录字段的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

是否可以在 BigQuery 中创建重复记录列?例如,对于以下数据:

Is it possible to create a repeated record column in BigQuery? For example, for the following data:

| a | b | c |
-------------
| 1 | 5 | 2 |
-------------
| 1 | 3 | 1 |
-------------
| 2 | 2 | 1 |

以下可能吗?

Select a, NEST(b, c) as d from *table* group by a

产生以下结果

| a | d.b | d.c |
-----------------
| 1 |  5  |  2  |
-----------------
|   |  3  |  1  |
-----------------
| 2 |  2  |  1  |

推荐答案

解决NEST()嵌套"只有一个字段的限制的一种方法是使用BigQuery 用户定义函数.它们非常强大,但仍有一些限制限制要注意.从我的角度来看,最重要的是要记住 - 他们非常适合被视为昂贵的高计算查询

One of the way to go around NEST() limitation of "nesting" just one field is to use BigQuery User-Defined Functions. They are extremely powerful yet still have some Limits and Limitations to be aware of. And most important from my prospective to have in mind - they are quite a candidates for being qualified as expensive High-Compute queries

复杂的查询会消耗非常大的计算资源相对于处理的字节数.通常,此类查询包含大量 JOIN 或 CROSS JOIN 子句或复杂的用户定义的函数.

Complex queries can consume extraordinarily large computing resources relative to the number of bytes processed. Typically, such queries contain a very large number of JOIN or CROSS JOIN clauses or complex User-defined Functions.

所以,下面是从 questino 的例子中模仿"NEST(b, c) 的例子:

So, below is example that "mimic" NEST(b, c) from example in questino:

SELECT a, d.b, d.c FROM 
JS((      // input table
  SELECT a, NEST(CONCAT(STRING(b), ',', STRING(c))) AS d
  FROM (
    SELECT * FROM 
    (SELECT 1 AS a, 5 AS b, 2 AS c),
    (SELECT 1 AS a, 3 AS b, 1 AS c),
    (SELECT 2 AS a, 2 AS b, 1 AS c)
  ) GROUP BY a),
  a, d,     // input columns
  "[{'name': 'a', 'type': 'INTEGER'},    // output schema
    {'name': 'd', 'type': 'RECORD',
     'mode': 'REPEATED',
     'fields': [
       {'name': 'b', 'type': 'STRING'},
       {'name': 'c', 'type': 'STRING'}
     ]    
    }
  ]",
  "function(row, emit){    // function 
    var c = [];
    for (var i = 0; i < row.d.length; i++) {
      x = row.d[i].toString().split(',');
      t = {b:x[0], c:x[1]}
      c.push(t);
    };
    emit({a: row.a, d: c});  
  }"
)

比较简单.我希望你能够通过它并得到一个想法

It is relatively straightforward. I hope you will be able to walk through it and get an idea

仍然 - 记住:

无论您如何使用嵌套/重复字段创建记录 - BigQuery自动展平查询结果,因此可见结果不会包含重复字段.所以你应该把它用作产生的子选择由同一查询立即使用的中间结果.

No matter how you create record with nested/repeated fields - BigQuery automatically flattens query results, so visible results won't contain repeated fields. So you should use it as a subselect that produces intermediate results for immediate use by the same query.

作为仅供参考,您可以通过运行下面的查询来证明上面只返回两条记录(而不是三条记录,因为它是扁平的)

As FYI, you can prove for yourself that above returns only two records (not three as it is looks like when it is flattened) by running below query

SELECT COUNT(1) AS rows FROM (
  <above query here>
) 

另一个重要注意事项:
众所周知,NEST()UnFlatten Results 输出不兼容,主要用于子查询中的中间结果.
相比之下,上面的解决方案可以很容易地直接保存到表格中(带有未选中的展平结果)

Another important NOTE:
This is a known that NEST() is not compatible with UnFlatten Results Output and mostly is used for intermediate result in subquery.
In contrast, above solution can be easily saved directly to table (with unchecked Flatten Results)

这篇关于BigQuery 从查询中创建重复记录字段的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆