BigQuery GROUP_CONCAT and ORDER BY

Question

I am currently using BigQuery and GROUP_CONCAT, which works perfectly fine. However, when I try to add an ORDER BY clause to the GROUP_CONCAT call, as I would in SQL, I receive an error.

For example, something like:

SELECT a, GROUP_CONCAT(b ORDER BY c) FROM test GROUP BY a

The same happens if I try to specify the separator.

Any ideas on how to approach this?

Recommended answer

Since BigQuery doesn't support an ORDER BY clause inside the GROUP_CONCAT function, the same result can be achieved with analytic (window) functions. In BigQuery, the separator for GROUP_CONCAT is simply the second parameter of the function. The example below illustrates both points:

select key, first(grouped_value) concat_value from (
  select
    key,
    -- ':' is the separator, passed as the second argument
    group_concat(value, ':') over
      (partition by key
       order by value asc
       rows between unbounded preceding and unbounded following)
    grouped_value
  from (
    -- inline sample data
    select key, value from
      (select 1 as key, 'b' as value),
      (select 1 as key, 'c' as value),
      (select 1 as key, 'a' as value),
      (select 2 as key, 'y' as value),
      (select 2 as key, 'x' as value)
  )
) group by key

This produces the following:

Row key concat_value     
1   1   a:b:c    
2   2   x:y

NOTE on window specification: The query uses the "rows between unbounded preceding and unbounded following" window specification to make sure that all rows within a partition participate in the GROUP_CONCAT aggregation. Per the SQL standard, the default window specification is "rows between unbounded preceding and current row", which is good for things like a running sum but won't work correctly for this problem.
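The difference between the two window frames described in the note can be sketched in plain Python. This is only an illustration of the frame semantics, not BigQuery code: the helper name windowed_concat and its full_frame flag are made up for this example, and the sample rows are the same ones used in the query above.

```python
from itertools import groupby

# Same sample data as the query: (key, value) pairs.
rows = [(1, "b"), (1, "c"), (1, "a"), (2, "y"), (2, "x")]


def windowed_concat(rows, sep=":", full_frame=True):
    """Mimic a windowed GROUP_CONCAT: one output row per input row.

    full_frame=True  -> frame is the whole partition
                        (unbounded preceding .. unbounded following)
    full_frame=False -> frame ends at the current row
                        (unbounded preceding .. current row)
    """
    out = []
    ordered = sorted(rows)  # partition by key, order by value asc
    for key, part in groupby(ordered, key=lambda r: r[0]):
        values = [v for _, v in part]
        for i, v in enumerate(values):
            frame = values if full_frame else values[: i + 1]
            out.append((key, v, sep.join(frame)))
    return out


full = windowed_concat(rows, full_frame=True)
running = windowed_concat(rows, full_frame=False)

for row in full:
    print("full frame:   ", row)
for row in running:
    print("running frame:", row)
```

With the full frame, every row of a partition carries the same complete string, so it does not matter which row the outer query keeps per key. With the frame that ends at the current row, the rows carry growing partial strings, and the outer query could end up keeping an incomplete one.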

Performance note: Even though it looks wasteful to recompute the aggregation function multiple times, the BigQuery optimizer does recognize that, since the window is not changing, the result will be the same, so it only computes the aggregation once per partition.
