PostgreSQL窗口功能：通过比较进行分区 [英] PostgreSQL window function: partition by comparison

查看：83 发布时间：2020/5/29 20:00:26 sql postgresql window-functions postgresql-performance

本文介绍了PostgreSQL窗口功能：通过比较进行分区的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我试图在PostgreSQL查询的WINDOW函数中找到与PARTITION BY子句中的当前行进行比较的方法。

I'm trying to find the way of doing a comparison with the current row in the PARTITION BY clause in a WINDOW function in PostgreSQL query.

想象一下我有以下查询的这5个元素的简短列表（实际上，我有成千上万的行）。我试图为每一行获取下一个不同元素（事件列）的ID，以及上一个不同元素的ID。

Imagine I have the short list in the following query of this 5 elements (in the real case, I have thousands or even millions of rows). I am trying to get for each row, the id of the next different element (event column), and the id of the previous different element.

WITH events AS(
  SELECT 1 as id, 12 as event, '2014-03-19 08:00:00'::timestamp as date
  UNION SELECT 2 as id, 12 as event, '2014-03-19 08:30:00'::timestamp as date
  UNION SELECT 3 as id, 13 as event, '2014-03-19 09:00:00'::timestamp as date
  UNION SELECT 4 as id, 13 as event, '2014-03-19 09:30:00'::timestamp as date
  UNION SELECT 5 as id, 12 as event, '2014-03-19 10:00:00'::timestamp as date
)
SELECT lag(id)  over w as previous_different, event
     , lead(id) over w as next_different
FROM events ev
WINDOW w AS (PARTITION BY event!=ev.event ORDER BY date ASC);

我知道比较 event！= ev.event 不正确，但这就是我要达到的目的。

I know the comparison event!=ev.event is incorrect but that's the point I want to reach.

我得到的结果是（与删除PARTITION BY子句相同）：

The result I get is (the same as if I delete the PARTITION BY clause):

 |12|2
1|12|3
2|13|4
3|13|5
4|12|

我想得到的结果是：

 |12|3
 |12|3
2|13|5
2|13|5
4|12|

任何人都知道这是否可行以及如何实现？非常感谢！

Anyone knows if it is possible and how? Thank you very much!

编辑：我知道我可以使用两个 JOIN s， ORDER BY 和 DISTINCT ON ，但在数百万行的实际情况下，效率非常低：


 I know I can do it with two JOINs, a ORDER BY and a DISTINCT ON, but in the real case of millions of rows it is very inefficient:
WITH events AS(
  SELECT 1 as id, 12 as event, '2014-03-19 08:00:00'::timestamp as date
  UNION SELECT 2 as id, 12 as event, '2014-03-19 08:30:00'::timestamp as date
  UNION SELECT 3 as id, 13 as event, '2014-03-19 09:00:00'::timestamp as date
  UNION SELECT 4 as id, 13 as event, '2014-03-19 09:30:00'::timestamp as date
  UNION SELECT 5 as id, 12 as event, '2014-03-19 10:00:00'::timestamp as date
)
SELECT DISTINCT ON (e.id, e.date) e1.id, e.event, e2.id
FROM events e
LEFT JOIN events e1 ON (e1.date<=e.date AND e1.id!=e.id AND e1.event!=e.event) 
LEFT JOIN events e2 ON (e2.date>=e.date AND e2.id!=e.id AND e2.event!=e.event) 
ORDER BY e.date ASC, e.id ASC, e1.date DESC, e1.id DESC, e2.date ASC, e2.id ASC

 
 
推荐答案
使用几个不同的窗口函数和两个子查询，这应该体面地快速工作：
Using several different window functions and two subqueries, this should work decently fast:
WITH events(id, event, ts) AS (
  VALUES
   (1, 12, '2014-03-19 08:00:00'::timestamp)
  ,(2, 12, '2014-03-19 08:30:00')
  ,(3, 13, '2014-03-19 09:00:00')
  ,(4, 13, '2014-03-19 09:30:00')
  ,(5, 12, '2014-03-19 10:00:00')
   )
SELECT first_value(pre_id)  OVER (PARTITION BY grp ORDER BY ts)      AS pre_id
     , id, ts
     , first_value(post_id) OVER (PARTITION BY grp ORDER BY ts DESC) AS post_id
FROM  (
   SELECT *, count(step) OVER w AS grp
   FROM  (
      SELECT id, ts
           , NULLIF(lag(event) OVER w, event) AS step
           , lag(id)  OVER w AS pre_id
           , lead(id) OVER w AS post_id
      FROM   events
      WINDOW w AS (ORDER BY ts)
      ) sub1
   WINDOW w AS (ORDER BY ts)
   ) sub2
ORDER  BY ts;

使用 ts 作为时间戳列的名称。
 
假定 ts 是唯一的-并且 已编入索引 （唯一约束会自动执行此操作）。 
Using ts as name for the timestamp column.

Assuming ts to be unique - and indexed (a unique constraint does that automatically).  
在具有5万行的真实表的测试中，只需要单次索引扫描。因此，即使有大桌子也应该相当快。相比之下，具有join / distinct的查询在一分钟后（如预期）没有完成。
 
即使是优化版本，一次也处理一次交叉联接（左联接几乎没有限制条件是
In a test with a real life table with 50k rows it only needed a single index scan. So, should be decently fast even with big tables. In comparison, your query with join / distinct did not finish after a minute (as expected).

Even an optimized version, dealing with one cross join at a time (the left join with hardly a limiting condition is effectively a limited cross join) did not finish after a minute.
要获得大表的最佳性能，请调整内存设置，尤其是   work_mem   （用于大型排序操作）。如果可以节省RAM，请考虑暂时为会话设置更高的值。在此处和此处。
For best performance with a big table, tune your memory settings, in particular for work_mem (for big sort operations). Consider setting it (much) higher for your session temporarily if you can spare the RAM. Read more here and here.
 
  在子查询 sub1 中，仅查看上一行的事件保留该值（如果已更改），从而标记新组的第一个元素。同时，获取上一行和下一行的 id （ pre_id ， post_id ）。

In subquery sub1 look at the event from the previous row and only keep that if it has changed, thus marking the first element of a new group. At the same time, get the id of the previous and the next row (pre_id, post_id).
在子查询 sub2 中， count（）仅计算非空值。产生的 grp 在连续的相同事件块中标记同伴。
In subquery sub2, count() only counts non-null values. The resulting grp marks peers in blocks of consecutive same events.
最后一个 SELECT ，每组每行取第一个 pre_id 和最后一个 post_id 以获得所需的结果。
 
实际上，在外部 SELECT 中应该更快：
In the final SELECT, take the first pre_id and the last post_id per group for each row to arrive at the desired result.

Actually, this should be even faster in the outer SELECT:
 last_value(post_id) OVER (PARTITION BY grp ORDER BY ts
                           RANGE BETWEEN UNBOUNDED PRECEDING
                                 AND     UNBOUNDED FOLLOWING) AS post_id

 ...因为窗口的排序顺序与 pre_id ，因此只需要一种排序。快速测试似乎可以确认这一点。 有关此框架定义的更多信息。 

... since the sort order of the window agrees with the window for pre_id, so only a single sort is needed. A quick test seems to confirm it. More about this frame definition.
   SQL小提琴。  

                        这篇关于PostgreSQL窗口功能：通过比较进行分区的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！


                    
                        查看全文

PostgreSQL窗口功能：通过比较进行分区 [英] PostgreSQL window function: partition by comparison

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

PostgreSQL窗口功能：通过比较进行分区 [英] PostgreSQL window function: partition by comparison

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭