Create two arrays for two fields, keeping sort order of arrays in sync (without subquery)


Question

There is no rhyme or reason for this question other than that I was curious how one would go about doing this.

Platform: I was hoping for a standard SQL solution, but my main focus is PostgreSQL 8.4+. (I know 9.0+ has some array sorting functions.)

SELECT    id, group, dt
FROM      foo
ORDER BY  id;




  id   | group |    dt
-------+-------+-----------
   1   |  foo  | 2012-01-01
   1   |  bar  | 2012-01-03
   1   |  baz  | 2012-01-02
   2   |  foo  | 2012-01-01
   3   |  bar  | 2012-01-01
   4   |  bar  | 2012-01-01
   4   |  baz  | 2012-01-01


I know the following query is wrong, but the result is similar to what I'm after: a way to tie the two fields together, so that sorting group also sorts dt the same way:

SELECT    id, sort_array(array_agg(group)), array_agg(dt)
FROM      foo
GROUP BY  id;




  id   |     group      |                dt
-------+----------------+------------------------------------
   1   |  {bar,baz,foo} | {2012-01-03,2012-01-02,2012-01-01}
   2   |  {foo}         | {2012-01-01}
   3   |  {bar}         | {2012-01-01}
   4   |  {bar,baz}     | {2012-01-01,2012-01-01}


Is there an easy way to tie the fields together for sorting, without using a subquery? Perhaps build an array of arrays and then unnest?

Recommended answer

I changed your column name group to grp because group is a reserved word in Postgres and in every SQL standard and shouldn't be used as an identifier.
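To try the queries in this answer, a setup along the following lines reproduces the sample data from the question. This is only a sketch: the table name tbl matches the queries below, but the column types are assumptions, since the question does not state them.

```sql
-- Assumed setup: the column types are guesses chosen to fit the sample data.
CREATE TABLE tbl (
   id  integer
 , grp text
 , dt  date
);

-- Sample rows copied from the question (column "group" renamed to grp).
INSERT INTO tbl (id, grp, dt) VALUES
   (1, 'foo', '2012-01-01')
 , (1, 'bar', '2012-01-03')
 , (1, 'baz', '2012-01-02')
 , (2, 'foo', '2012-01-01')
 , (3, 'bar', '2012-01-01')
 , (4, 'bar', '2012-01-01')
 , (4, 'baz', '2012-01-01');
```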

I understand your question like this:

Get the two arrays sorted in identical sort order, so that the same element position corresponds to the same row in both arrays.

Use a subquery or a CTE and order the rows before aggregating:

SELECT id, array_agg(grp) AS grp, array_agg(dt) AS dt
FROM  (
    SELECT *
    FROM   tbl
    ORDER  BY id, grp, dt
    ) x
GROUP  BY id;

That's faster than using individual ORDER BY clauses inside the aggregate function array_agg(), as @Mosty demonstrates (a feature available since PostgreSQL 9.0). Mosty also interprets your question differently and uses the proper tools for his interpretation.
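The CTE variant mentioned above could look roughly like this. It is a sketch under the same assumptions as the subquery version (table tbl with columns id, grp, dt); the CTE name x is arbitrary, and the ordering is inherited from the sorted input in the same "usually works" sense.

```sql
-- Sort the rows once in a CTE, then aggregate the pre-sorted rows.
WITH x AS (
   SELECT id, grp, dt
   FROM   tbl
   ORDER  BY id, grp, dt
   )
SELECT id
     , array_agg(grp) AS grp  -- both arrays see the rows in the same order
     , array_agg(dt)  AS dt
FROM   x
GROUP  BY id;
```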

The manual:

The aggregate functions array_agg, json_agg, [...] as well as similar user-defined aggregate functions, produce meaningfully different result values depending on the order of the input values. This ordering is unspecified by default, but can be controlled by writing an ORDER BY clause within the aggregate call, as shown in Section 4.2.7. Alternatively, supplying the input values from a sorted subquery will usually work. For example:

SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;

Beware that this approach can fail if the outer query level contains additional processing, such as a join, because that might cause the subquery's output to be reordered before the aggregate is computed.

So, if you really need a solution without a subquery, you can:

SELECT id
     , array_agg(grp ORDER BY grp)
     , array_agg(dt  ORDER BY grp, dt)
FROM   tbl
GROUP  BY id;

Note the ORDER BY grp, dt. I additionally sort by dt to break ties and make the sort order unambiguous. That is not necessary for the grp array, though, because tied grp values are identical anyway.
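A small self-contained illustration of the tie-break, using inline VALUES rather than the real table. The rows are made up for this sketch (two bar rows for the same id, which the sample data does not contain), and the ORDER BY inside the aggregate requires PostgreSQL 9.0 or later.

```sql
-- Hypothetical rows with a tie on grp ('bar' appears twice for id 1).
SELECT id
     , array_agg(grp ORDER BY grp)     AS grp  -- ties are identical values
     , array_agg(dt  ORDER BY grp, dt) AS dt   -- dt breaks the tie
FROM  (VALUES
         (1, 'bar', date '2012-01-03')
       , (1, 'bar', date '2012-01-01')
       , (1, 'baz', date '2012-01-02')
      ) AS t(id, grp, dt)
GROUP  BY id;

-- Result:
--  id |      grp      |                 dt
-- ----+---------------+------------------------------------
--   1 | {bar,bar,baz} | {2012-01-01,2012-01-03,2012-01-02}
```

With ORDER BY grp alone on the dt aggregate, the two bar dates could come out in either order.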

There is also a completely different way to do this, with window functions:

SELECT DISTINCT ON (id)
       id
     , array_agg(grp) OVER w AS grp
     , array_agg(dt)  OVER w AS dt
FROM   tbl
WINDOW w AS (PARTITION BY id ORDER BY grp, dt
             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER  BY id;

Note the DISTINCT ON (id) instead of just DISTINCT, which produces the same result but performs faster by an order of magnitude because we do not need an extra sort.

I ran some tests, and the window function version is almost as fast as the other two solutions. As expected, the subquery version was still the fastest. Test with EXPLAIN ANALYZE to see for yourself.
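One way to run such a comparison, assuming the table tbl from the queries above exists and holds a realistic amount of data. Keep in mind that EXPLAIN ANALYZE actually executes the query, and that timings depend on data size, version and configuration, so no particular numbers are implied here.

```sql
-- Variant 1: sorted subquery, then aggregate.
EXPLAIN ANALYZE
SELECT id, array_agg(grp) AS grp, array_agg(dt) AS dt
FROM  (
    SELECT *
    FROM   tbl
    ORDER  BY id, grp, dt
    ) x
GROUP  BY id;

-- Variant 2: ORDER BY inside each aggregate call (PostgreSQL 9.0+).
EXPLAIN ANALYZE
SELECT id
     , array_agg(grp ORDER BY grp)
     , array_agg(dt  ORDER BY grp, dt)
FROM   tbl
GROUP  BY id;
```

Running each statement a few times and comparing the reported total runtime gives a fairer picture than a single run, since the first run may include caching effects.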
