BigQuery GROUP_CONCAT and ORDER BY
Question
I am currently using BigQuery with GROUP_CONCAT, which works perfectly fine. However, when I try to add an ORDER BY clause to the GROUP_CONCAT call, as I would in SQL, I receive an error.
For example, something like:
SELECT a, GROUP_CONCAT(b ORDER BY c) FROM test GROUP BY a
The same happens if I try to specify the separator.
Any ideas on how to approach this?
Recommended answer
Since BigQuery doesn't support an ORDER BY clause inside the GROUP_CONCAT function, the same result can be achieved with analytic (window) functions. In BigQuery, the separator for GROUP_CONCAT is simply the function's second parameter. The example below illustrates this:
select key, first(grouped_value) concat_value from (
  select
    key,
    group_concat(value, ':') over
      (partition by key
       order by value asc
       rows between unbounded preceding and unbounded following)
    grouped_value
  from (
    select key, value from
      (select 1 as key, 'b' as value),
      (select 1 as key, 'c' as value),
      (select 1 as key, 'a' as value),
      (select 2 as key, 'y' as value),
      (select 2 as key, 'x' as value)))
group by key
This produces the following:
Row  key  concat_value
1    1    a:b:c
2    2    x:y
Note on the window specification: the query uses the "rows between unbounded preceding and unbounded following" window specification to make sure that all rows within a partition participate in the GROUP_CONCAT aggregation. Per the SQL standard, the default window specification with an ORDER BY runs from unbounded preceding to the current row, which is good for things like a running sum but won't work correctly for this problem.
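The difference between the two frames can be checked locally. The following is a minimal sketch, not BigQuery itself: it assumes Python's built-in sqlite3 module backed by SQLite 3.25 or later (needed for window functions) and uses SQLite's group_concat as a stand-in for BigQuery's GROUP_CONCAT. The table name t and the helper concat_per_row are hypothetical names chosen for illustration.

```python
import sqlite3

# In-memory SQLite database holding the same sample rows as the query above.
# Assumption: the bundled SQLite is 3.25+ so that OVER (...) is supported.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (key INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "b"), (1, "c"), (1, "a"), (2, "y"), (2, "x")],
)


def concat_per_row(frame_clause):
    """Return (key, value, grouped_value) rows for the given frame clause."""
    sql = f"""
        SELECT key, value,
               group_concat(value, ':') OVER (
                   PARTITION BY key ORDER BY value {frame_clause}
               ) AS grouped_value
        FROM t
        ORDER BY key, value
    """
    return conn.execute(sql).fetchall()


# Default frame: each row only sees the rows up to itself (running concat).
default_frame = concat_per_row("")

# Explicit full frame: every row sees the whole partition.
full_frame = concat_per_row(
    "ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING"
)

for row in default_frame:
    print("default:", row)
for row in full_frame:
    print("full:   ", row)
```

With the default frame, the per-row values grow as a running concatenation (a, then a:b, then a:b:c for key 1), so picking one row per key gives a result that depends on which row is picked. With the explicit full frame, every row in a partition carries the complete a:b:c or x:y string, which is why taking the first value per key is safe.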
Performance note: even though it looks wasteful to recompute the aggregate function for every row, the BigQuery optimizer recognizes that, because the window does not change, the result will be the same, so it computes the aggregation only once per partition.