How to aggregate multiple rows into one in BigQuery?


Question

Suppose you have a denormalized schema with multiple rows per entity, like below:

   uuid    |    property    |    value   
------------------------------------------
  abc      |   first_name   |  John
  abc      |   last_name    |  Connor
  abc      |   age          |  26
...

Every uuid has the same set of properties, and the rows are not necessarily sorted. How can I create a table like the following using only BigQuery (i.e. no client-side processing)?

Table user_properties:

   uuid    |    first_name  |    last_name   |    age
 --------------------------------------------------------
  abc      |   John         |    Connor      |    26

In traditional SQL there is the "STUFF" keyword for this purpose.

It would be easier if I could at least get the results ordered by uuid, so that the client would not need to load the whole table (4GB) to sort it: each entity could be hydrated by sequentially scanning the rows with the same uuid. However, a query like this:

SELECT * FROM user_properties ORDER BY uuid; 

exceeds the available resources in BigQuery (and using allowLargeResults forbids ORDER BY). It almost seems that I cannot sort a large table (4GB) in BigQuery unless I subscribe to a high-end machine. Any ideas?

Recommended answer

SELECT 
  uuid,
  MAX(IF(property = 'first_name', value, NULL)) AS first_name,
  MAX(IF(property = 'last_name', value, NULL)) AS last_name,
  MAX(IF(property = 'age', value, NULL)) AS age
FROM user_properties
GROUP BY uuid
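
The query above is a conditional-aggregation pivot: within each uuid group, every MAX(IF(...)) sees NULL for all rows except the one holding its property, so MAX returns that single value. As a minimal sketch for trying the pattern locally, the snippet below runs the same logic against an in-memory SQLite database. SQLite is only a stand-in for BigQuery here, CASE WHEN replaces IF for portability, and the second entity (uuid 'def') is made-up sample data.

```python
import sqlite3

# In-memory SQLite database as a local stand-in for BigQuery (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_properties (uuid TEXT, property TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO user_properties VALUES (?, ?, ?)",
    [
        ("abc", "first_name", "John"),
        ("abc", "last_name", "Connor"),
        ("abc", "age", "26"),
        # Hypothetical second entity, inserted out of order on purpose.
        ("def", "age", "31"),
        ("def", "last_name", "Reese"),
        ("def", "first_name", "Kyle"),
    ],
)

# Same shape as the BigQuery query; CASE WHEN yields NULL when no branch matches,
# and MAX ignores NULLs, so each column keeps the one matching value per uuid.
PIVOT_SQL = """
SELECT
  uuid,
  MAX(CASE WHEN property = 'first_name' THEN value END) AS first_name,
  MAX(CASE WHEN property = 'last_name'  THEN value END) AS last_name,
  MAX(CASE WHEN property = 'age'        THEN value END) AS age
FROM user_properties
GROUP BY uuid
ORDER BY uuid
"""

pivoted = conn.execute(PIVOT_SQL).fetchall()
for row in pivoted:
    print(row)
```

Because the grouping does not depend on the order of the input rows, this form needs no sort of the source table, which is the part of the original query that ran out of resources.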

Another option, with no grouping involved:

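-- Relies on the alphabetical order of the property names within each uuid:
-- age, first_name, last_name. Starting from the 'age' row, the next row holds
-- first_name and the one after it holds last_name.
SELECT uuid, first_name, last_name, age
FROM (
  SELECT
    uuid,
    LEAD(value, 1) OVER(PARTITION BY uuid ORDER BY property) AS first_name,
    LEAD(value, 2) OVER(PARTITION BY uuid ORDER BY property) AS last_name,
    value AS age,
    property = 'age' AS anchor
  FROM user_properties
)
-- The original used HAVING anchor; without GROUP BY, WHERE is the portable filter.
WHERE anchor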
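
The window-function variant only lines up if every uuid has exactly the properties age, first_name and last_name, since LEAD counts rows in alphabetical property order. The sketch below illustrates this with Python's sqlite3 module as a local stand-in for BigQuery (an assumption; it needs a SQLite build with window-function support, 3.25 or later). The uuid 'xyz' and its extra 'email' property are hypothetical, added to show how one additional property shifts the columns.

```python
import sqlite3

# In-memory SQLite database as a local stand-in for BigQuery (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_properties (uuid TEXT, property TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO user_properties VALUES (?, ?, ?)",
    [
        ("abc", "first_name", "John"),
        ("abc", "last_name", "Connor"),
        ("abc", "age", "26"),
        # Hypothetical entity with one extra property, 'email'.
        ("xyz", "first_name", "Sarah"),
        ("xyz", "last_name", "Connor"),
        ("xyz", "age", "40"),
        ("xyz", "email", "sarah@example.com"),
    ],
)

# Within each uuid, rows are ordered by property name; the 'age' row comes
# first, so LEAD(value, 1) and LEAD(value, 2) read the next two rows.
LEAD_SQL = """
SELECT uuid, first_name, last_name, age
FROM (
  SELECT
    uuid,
    property,
    LEAD(value, 1) OVER (PARTITION BY uuid ORDER BY property) AS first_name,
    LEAD(value, 2) OVER (PARTITION BY uuid ORDER BY property) AS last_name,
    value AS age
  FROM user_properties
)
WHERE property = 'age'
ORDER BY uuid
"""

rows = {row[0]: row for row in conn.execute(LEAD_SQL)}
print(rows["abc"])  # three properties: columns line up
print(rows["xyz"])  # extra 'email' row sorts after 'age' and shifts the columns
```

For 'xyz', the 'email' row sorts between 'age' and 'first_name', so the email lands in the first_name column and the first name in last_name. The GROUP BY form matches on the property name itself and is not affected by extra properties.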
