如何在窗口函数中使用环形数据结构 [英] How to use a ring data structure in window functions

查看:110
本文介绍了如何在窗口函数中使用环形数据结构的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有以环形结构(或非常理想。但是,它们不能嵌套:

  SELECT环,部分,数组[
lag(part,1,last_value(部分)over(按分区划分))over(按分区划分),
部分,
Lead(part,1,1)over(按分区划分)
] AS邻居
FROM rp;
错误:无法嵌套窗口函数调用
第2行:滞后(部分,1,last_value(部分)超过(按环划分))...






更新。感谢@Justin的建议,使用 coalesce 避免嵌套窗口函数。此外,许多人指出,在环序列上,第一个/最后一个值需要明确的 by 顺序,恰好是 part 在此示例中。因此,对输入数据进行一些随机化处理:

 创建表rp(ring int,part int); 
插入rp(ring,part)中,选择1,generate_series(1,5)的顺序由random();
插入rp(ring,part)选择2,由random()生成generate_series(1,7)顺序;


解决方案


  • 使用 COALESCE ,例如 @Justin提供

  • 使用 first_value() / last_value(),您需要添加 <窗口定义的code> ORDER BY 子句或顺序为 undefined 。在该示例中,您只是很幸运,因为在创建虚拟表后,行恰好排好了。

    添加 ORDER BY 后,默认窗口框架在当前行处结束,并且需要特殊情况调用 last_value()调用-或还原窗口框架中的排序顺序像我的第一个示例中所示。


  • 多次重复使用窗口定义时,显式 WINDOW 子句大大简化了语法:






  SELECT环,部分,数组[
合并(
滞后(部分)OVER w
,first_value(部分)OVER (按环顺序划分,按DESC部分排序))
,部分
,coalesce(
铅(部分)超过w
,first_value(部分)超过w)
] AS邻居
从rp
窗口w AS(按部分按戒指顺序订购);

更好,重复使用相同的窗口定义,因此Postgres可以计算所有一次扫描中的值。为此,我们需要定义一个自定义窗口框架

  SELECT环,零件,数组[
合并(
滞后(部分)超过w
,last_value(部分)超过w)
,部分
,coalesce(
铅(部分)超过w
,first_value(part)超过w)
] AS邻居
从rp
窗口w AS(按环
排序按
排序)在未绑定前
和未绑定前
之间按1,2排序;

您甚至可以为每个窗口函数调用调整框架定义:

  SELECT环,部分,数组[
合并(
滞后(部分)OVER w
,last_value(部分)OVER( w在当前行
和无边界以下之间的范围))
,部分
,coalesce(
引线(部分)超过w
,first_value(部分)超过w)
] AS邻居
从rp
WINDOW w AS(按部分按戒指顺序排列)
按1,2;对于很多零件的戒指,

可能会更快。您必须进行测试。



SQL Fiddle 通过改进的测试用例演示了这三个方面。考虑查询计划。



有关窗口框架定义的更多信息:




I have data that is arranged in a ring structure (or circular buffer), that is it can be expressed as sequences that cycle: ...-1-2-3-4-5-1-2-3-.... See this picture to get an idea of a 5-part ring:

I'd like to create a window query that can combine the lag and lead items into a three point array, but I can't figure it out. For example at part 1 of a 5-part ring, the lag/lead sequence is 5-1-2, or at part 4 is 3-4-5.

Here is an example table of two rings with different numbers of parts (always more than three per ring):

create table rp (ring int, part int);
insert into rp(ring, part) values(1, generate_series(1, 5));
insert into rp(ring, part) values(2, generate_series(1, 7));

Here is a nearly successful query:

SELECT ring, part, array[
    lag(part, 1, NULL) over (partition by ring),
    part,
    lead(part, 1, 1) over (partition by ring)
    ] AS neighbours
FROM rp;

 ring | part | neighbours
------+------+------------
    1 |    1 | {NULL,1,2}
    1 |    2 | {1,2,3}
    1 |    3 | {2,3,4}
    1 |    4 | {3,4,5}
    1 |    5 | {4,5,1}
    2 |    1 | {NULL,1,2}
    2 |    2 | {1,2,3}
    2 |    3 | {2,3,4}
    2 |    4 | {3,4,5}
    2 |    5 | {4,5,6}
    2 |    6 | {5,6,7}
    2 |    7 | {6,7,1}
(12 rows)

The only thing I need to do is to replace the NULL with the ending point of each ring, which is the last value. Now, along with lag and lead window functions, there is a last_value function which would be ideal. However, these cannot be nested:

SELECT ring, part, array[
    lag(part, 1, last_value(part) over (partition by ring)) over (partition by ring),
    part,
    lead(part, 1, 1) over (partition by ring)
    ] AS neighbours
FROM rp;
ERROR:  window function calls cannot be nested
LINE 2:     lag(part, 1, last_value(part) over (partition by ring)) ...


Update. Thanks to @Justin's suggestion to use coalesce to avoid nesting window functions. Furthermore, it has been pointed out by numerous folks that first/last values need an explicit order by on the ring sequence, which happens to be part for this example. So randomising the input data a bit:

create table rp (ring int, part int);
insert into rp(ring, part) select 1, generate_series(1, 5) order by random();
insert into rp(ring, part) select 2, generate_series(1, 7) order by random();

解决方案

  • Use COALESCE like @Justin provided.
  • With first_value() / last_value() you need to add an ORDER BY clause to the window definition or the order is undefined. You just got lucky in the example, because the rows happen to be in order right after creating the dummy table.
    Once you add ORDER BY, the default window frame ends at the current row, and you need to special case the last_value() call - or revert the sort order in the window frame like demonstrated in my first example.

  • When reusing a window definition multiple times, an explicit WINDOW clause simplifies syntax a lot:

SELECT ring, part, ARRAY[
          coalesce(
             lag(part) OVER w
            ,first_value(part) OVER (PARTITION BY ring ORDER BY part DESC))
         ,part
         ,coalesce(
             lead(part) OVER w
            ,first_value(part) OVER w)
         ] AS neighbours
FROM   rp
WINDOW w AS (PARTITION BY ring ORDER BY part);

Better yet, reuse the same window definition, so Postgres can calculate all values in a single scan. For this to work we need to define a custom window frame:

SELECT ring, part, ARRAY[
          coalesce(
             lag(part) OVER w
            ,last_value(part) OVER w)
         ,part
         ,coalesce(
             lead(part) OVER w
            ,first_value(part) OVER w)
         ] AS neighbours
FROM   rp
WINDOW w AS (PARTITION BY ring
             ORDER BY part
             RANGE BETWEEN UNBOUNDED PRECEDING
                       AND UNBOUNDED FOLLOWING)
ORDER  BY 1,2;

You can even adapt the frame definition for each window function call:

SELECT ring, part, ARRAY[
          coalesce(
             lag(part) OVER w
            ,last_value(part) OVER (w RANGE BETWEEN CURRENT ROW
                                                AND UNBOUNDED FOLLOWING))
         ,part
         ,coalesce(
             lead(part) OVER w
            ,first_value(part) OVER w)
         ] AS neighbours
FROM   rp
WINDOW w AS (PARTITION BY ring ORDER BY part)
ORDER  BY 1,2;

Might be faster for rings with many parts. You'll have to test.

SQL Fiddle demonstrating all three with an improved test case. Consider query plans.

More about window frame definitions:

这篇关于如何在窗口函数中使用环形数据结构的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆