如何在窗口函数中使用环形数据结构 [英] How to use a ring data structure in window functions
问题描述
我有以环形结构(或非常理想。但是,它们不能嵌套:
SELECT环,部分,数组[
lag(part,1,last_value(部分)over(按分区划分))over(按分区划分),
部分,
Lead(part,1,1)over(按分区划分)
] AS邻居
FROM rp;
错误:无法嵌套窗口函数调用
第2行:滞后(部分,1,last_value(部分)超过(按环划分))...
更新。感谢@Justin的建议,使用 coalesce
避免嵌套窗口函数。此外,许多人指出,在环序列上,第一个/最后一个值需要明确的 by
顺序,恰好是 part
在此示例中。因此,对输入数据进行一些随机化处理:
创建表rp(ring int,part int);
插入rp(ring,part)中,选择1,generate_series(1,5)的顺序由random();
插入rp(ring,part)选择2,由random()生成generate_series(1,7)顺序;
- 使用
COALESCE
,例如 @Justin提供。 -
使用
first_value()
/last_value()
,您需要添加 <窗口定义的code> ORDER BY 子句或顺序为 undefined 。在该示例中,您只是很幸运,因为在创建虚拟表后,行恰好排好了。
添加ORDER BY
后,默认窗口框架在当前行处结束,并且需要特殊情况调用last_value()
调用-或还原窗口框架中的排序顺序像我的第一个示例中所示。 -
多次重复使用窗口定义时,显式
WINDOW
子句大大简化了语法:
SELECT环,部分,数组[
合并(
滞后(部分)OVER w
,first_value(部分)OVER (按环顺序划分,按DESC部分排序))
,部分
,coalesce(
铅(部分)超过w
,first_value(部分)超过w)
] AS邻居
从rp
窗口w AS(按部分按戒指顺序订购);
更好,重复使用相同的窗口定义,因此Postgres可以计算所有一次扫描中的值。为此,我们需要定义一个自定义窗口框架:
SELECT环,零件,数组[
合并(
滞后(部分)超过w
,last_value(部分)超过w)
,部分
,coalesce(
铅(部分)超过w
,first_value(part)超过w)
] AS邻居
从rp
窗口w AS(按环
排序按
排序)在未绑定前
和未绑定前
之间按1,2排序;
您甚至可以为每个窗口函数调用调整框架定义:
SELECT环,部分,数组[
合并(
滞后(部分)OVER w
,last_value(部分)OVER( w在当前行
和无边界以下之间的范围))
,部分
,coalesce(
引线(部分)超过w
,first_value(部分)超过w)
] AS邻居
从rp
WINDOW w AS(按部分按戒指顺序排列)
按1,2;对于很多零件的戒指,
可能会更快。您必须进行测试。
SQL Fiddle 通过改进的测试用例演示了这三个方面。考虑查询计划。
有关窗口框架定义的更多信息:
I have data that is arranged in a ring structure (or circular buffer), that is it can be expressed as sequences that cycle: ...-1-2-3-4-5-1-2-3-.... See this picture to get an idea of a 5-part ring:
I'd like to create a window query that can combine the lag and lead items into a three point array, but I can't figure it out. For example at part 1 of a 5-part ring, the lag/lead sequence is 5-1-2, or at part 4 is 3-4-5.
Here is an example table of two rings with different numbers of parts (always more than three per ring):
create table rp (ring int, part int);
insert into rp(ring, part) values(1, generate_series(1, 5));
insert into rp(ring, part) values(2, generate_series(1, 7));
Here is a nearly successful query:
SELECT ring, part, array[
lag(part, 1, NULL) over (partition by ring),
part,
lead(part, 1, 1) over (partition by ring)
] AS neighbours
FROM rp;
ring | part | neighbours
------+------+------------
1 | 1 | {NULL,1,2}
1 | 2 | {1,2,3}
1 | 3 | {2,3,4}
1 | 4 | {3,4,5}
1 | 5 | {4,5,1}
2 | 1 | {NULL,1,2}
2 | 2 | {1,2,3}
2 | 3 | {2,3,4}
2 | 4 | {3,4,5}
2 | 5 | {4,5,6}
2 | 6 | {5,6,7}
2 | 7 | {6,7,1}
(12 rows)
The only thing I need to do is to replace the NULL
with the ending point of each ring, which is the last value. Now, along with lag
and lead
window functions, there is a last_value
function which would be ideal. However, these cannot be nested:
SELECT ring, part, array[
lag(part, 1, last_value(part) over (partition by ring)) over (partition by ring),
part,
lead(part, 1, 1) over (partition by ring)
] AS neighbours
FROM rp;
ERROR: window function calls cannot be nested
LINE 2: lag(part, 1, last_value(part) over (partition by ring)) ...
Update. Thanks to @Justin's suggestion to use coalesce
to avoid nesting window functions. Furthermore, it has been pointed out by numerous folks that first/last values need an explicit order by
on the ring sequence, which happens to be part
for this example. So randomising the input data a bit:
create table rp (ring int, part int);
insert into rp(ring, part) select 1, generate_series(1, 5) order by random();
insert into rp(ring, part) select 2, generate_series(1, 7) order by random();
- Use
COALESCE
like @Justin provided. With
first_value()
/last_value()
you need to add anORDER BY
clause to the window definition or the order is undefined. You just got lucky in the example, because the rows happen to be in order right after creating the dummy table.
Once you addORDER BY
, the default window frame ends at the current row, and you need to special case thelast_value()
call - or revert the sort order in the window frame like demonstrated in my first example.When reusing a window definition multiple times, an explicit
WINDOW
clause simplifies syntax a lot:
SELECT ring, part, ARRAY[
coalesce(
lag(part) OVER w
,first_value(part) OVER (PARTITION BY ring ORDER BY part DESC))
,part
,coalesce(
lead(part) OVER w
,first_value(part) OVER w)
] AS neighbours
FROM rp
WINDOW w AS (PARTITION BY ring ORDER BY part);
Better yet, reuse the same window definition, so Postgres can calculate all values in a single scan. For this to work we need to define a custom window frame:
SELECT ring, part, ARRAY[
coalesce(
lag(part) OVER w
,last_value(part) OVER w)
,part
,coalesce(
lead(part) OVER w
,first_value(part) OVER w)
] AS neighbours
FROM rp
WINDOW w AS (PARTITION BY ring
ORDER BY part
RANGE BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING)
ORDER BY 1,2;
You can even adapt the frame definition for each window function call:
SELECT ring, part, ARRAY[
coalesce(
lag(part) OVER w
,last_value(part) OVER (w RANGE BETWEEN CURRENT ROW
AND UNBOUNDED FOLLOWING))
,part
,coalesce(
lead(part) OVER w
,first_value(part) OVER w)
] AS neighbours
FROM rp
WINDOW w AS (PARTITION BY ring ORDER BY part)
ORDER BY 1,2;
Might be faster for rings with many parts. You'll have to test.
SQL Fiddle demonstrating all three with an improved test case. Consider query plans.
More about window frame definitions:
- In the manual.
- PostgreSQL window function: partition by comparison
- PostgreSQL query with max and min date plus associated id per row
这篇关于如何在窗口函数中使用环形数据结构的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!