BigQuery GROUP_CONCAT and ORDER BY


Question

I am currently using BigQuery and GROUP_CONCAT, which works perfectly fine. However, when I try to add an ORDER BY clause to the GROUP_CONCAT call, as I would in other SQL dialects, I receive an error.

For example, something like:

SELECT a, GROUP_CONCAT(b ORDER BY c) FROM test GROUP BY a

The same happens if I try to specify the separator.

Any ideas on how to approach this?

Solution

Since BigQuery doesn't support an ORDER BY clause inside the GROUP_CONCAT function, the same result can be achieved with analytic (window) functions. As for the separator, in BigQuery it is simply the second argument to GROUP_CONCAT. The example below illustrates both:

select key, first(grouped_value) concat_value
from (
  select
    key,
    group_concat(value, ':') over (
      partition by key
      order by value asc
      rows between unbounded preceding and unbounded following
    ) grouped_value
  from (
    select key, value from
      (select 1 as key, 'b' as value),
      (select 1 as key, 'c' as value),
      (select 1 as key, 'a' as value),
      (select 2 as key, 'y' as value),
      (select 2 as key, 'x' as value)
  )
)
group by key

This produces the following:

Row  key  concat_value
1    1    a:b:c
2    2    x:y
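
For anyone who wants to try the pattern without a BigQuery project, here is a minimal sketch that runs the same idea locally. It rests on several assumptions that are not part of the answer above: SQLite (via Python's sqlite3 module, SQLite 3.25 or newer for window functions) stands in for BigQuery, the table t and the columns key_id and val are hypothetical names, and MIN replaces FIRST because SQLite has no FIRST aggregate. Since every row in a partition carries the same concatenated string, MIN picks the same value FIRST would.

```python
import sqlite3

# Assumption: SQLite is only a local stand-in for BigQuery; syntax differs slightly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (key_id INTEGER, val TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "b"), (1, "c"), (1, "a"), (2, "y"), (2, "x")],
)

QUERY = """
SELECT key_id, MIN(grouped_value) AS concat_value
FROM (
    SELECT key_id,
           group_concat(val, ':') OVER (
               PARTITION BY key_id
               ORDER BY val ASC
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS grouped_value
    FROM t
)
GROUP BY key_id
ORDER BY key_id
"""

# One row per key, with the values concatenated in sorted order.
result = conn.execute(QUERY).fetchall()
print(result)
```

The inner query attaches the full ordered string to every row; the outer GROUP BY then collapses each partition to a single row.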

Note on the window specification: the query uses the frame "rows between unbounded preceding and unbounded following" to make sure that all rows within a partition participate in the GROUP_CONCAT aggregation. Per the SQL standard, the default frame when ORDER BY is present is "range between unbounded preceding and current row", which is good for things like a running sum but would produce a running concatenation here rather than the full list.
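
The effect of the frame is easy to see side by side. The sketch below is again an assumption-laden local stand-in (SQLite 3.25+ through Python's sqlite3, hypothetical table t with columns key_id and val), not BigQuery itself: it runs the same window aggregate once with the default frame and once with the explicit full-partition frame.

```python
import sqlite3

# Assumption: SQLite stands in for BigQuery; names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (key_id INTEGER, val TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "b"), (1, "c"), (1, "a"), (2, "y"), (2, "x")],
)

# No frame clause: the default frame ends at the current row.
DEFAULT_FRAME = """
SELECT key_id, val,
       group_concat(val, ':') OVER (PARTITION BY key_id ORDER BY val) AS g
FROM t ORDER BY key_id, val
"""

# Explicit frame spanning the whole partition.
FULL_FRAME = """
SELECT key_id, val,
       group_concat(val, ':') OVER (
           PARTITION BY key_id ORDER BY val
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS g
FROM t ORDER BY key_id, val
"""

default_rows = conn.execute(DEFAULT_FRAME).fetchall()
full_rows = conn.execute(FULL_FRAME).fetchall()

for row in default_rows:
    print("default:", row)  # string grows row by row within each key
for row in full_rows:
    print("full:   ", row)  # every row carries the complete string
```

With the default frame only the last row of each partition holds the complete string, so picking an arbitrary row per key afterwards could return a partial result.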

Performance note: although it looks wasteful to recompute the aggregate function for every row, the BigQuery optimizer recognizes that the window does not change within a partition, so the result will be the same, and it computes the aggregation only once per partition.
