条件超前/滞后函数PostgreSQL? [英] Conditional lead/lag function PostgreSQL?

查看：269 发布时间：2020/11/20 20:05:57 sql postgresql greatest-n-per-group window-functions

本文介绍了条件超前/滞后函数PostgreSQL?的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我有一个这样的表:

Name   activity  time

user1  A1        12:00
user1  E3        12:01
user1  A2        12:02
user2  A1        10:05
user2  A2        10:06
user2  A3        10:07
user2  M6        10:07
user2  B1        10:08
user3  A1        14:15
user3  B2        14:20
user3  D1        14:25
user3  D2        14:30

现在，我需要这样的结果:

Now, I need a result like this:

Name   activity  next_activity

user1  A2        NULL
user2  A3        B1
user3  A1        B2

我想为每个用户检查A组中的最后一项活动以及B组中接下来发生了什么类型的活动(B组中的活动始终发生在A组中的活动之后).其他类型的活动对我而言并不有趣.我试图使用lead()函数，但是没有用.

I would like to check for every user the last activity from group A and what type of activity took place next from group B (activity from group B always takes place after activity from group A). Other types of activity are not interesting for me. I've tried to use the lead() function, but it hasn't worked.

我该如何解决我的问题?

How I can solve my problem?

推荐答案

测试设置:

CREATE TEMP TABLE t (name text, activity text, time time);
INSERT INTO t values
 ('user1', 'A1', '12:00')
,('user1', 'E3', '12:01')
,('user1', 'A2', '12:02')
,('user2', 'A1', '10:05')
,('user2', 'A2', '10:06')
,('user2', 'A3', '10:07')
,('user2', 'M6', '10:07')
,('user2', 'B1', '10:08')
,('user3', 'A1', '14:15')
,('user3', 'B2', '14:20')
,('user3', 'D1', '14:25')
,('user3', 'D2', '14:30');

您的定义:

B组的活动总是在A组的活动之后发生.

activity from group B always takes place after activity from group A.

..从逻辑上讲，每个用户在1个或多个A活动之后有0个或1 B活动.依次进行的B活动不得超过1个.

.. logically implies that there is, per user, 0 or 1 B activity after 1 or more A activities. Never more than 1 B activities in sequence.

您可以使用单个窗口功能DISTINCT ON和CASE来使用它，这应该是每位用户少量行的最快方法(另请参见下文):

You can make it work with a single window function, DISTINCT ON and CASE, which should be the fastest way for few rows per user (also see below):

SELECT name
     , CASE WHEN a2 LIKE 'B%' THEN a1 ELSE a2 END AS activity
     , CASE WHEN a2 LIKE 'B%' THEN a2 END AS next_activity
FROM  (
   SELECT DISTINCT ON (name)
          name
        , lead(activity) OVER (PARTITION BY name ORDER BY time DESC) AS a1
        , activity AS a2
   FROM   t
   WHERE (activity LIKE 'A%' OR activity LIKE 'B%')
   ORDER  BY name, time DESC
   ) sub;

如果未添加ELSE分支，则SQL CASE表达式默认为NULL，因此我将其简短了.

An SQL CASE expression defaults to NULL if no ELSE branch is added, so I kept that short.

还假定time被定义为NOT NULL.否则，您可能要添加NULLS LAST.为什么?

Also assuming time is defined NOT NULL. Else, you might want to add NULLS LAST. Why?

在每个GROUP BY组中选择第一行?

(activity LIKE 'A%' OR activity LIKE 'B%')比activity ~ '^[AB]'更为冗长，但在较旧的Postgres版本中通常更快.关于模式匹配:

(activity LIKE 'A%' OR activity LIKE 'B%') is more verbose than activity ~ '^[AB]', but typically faster in older versions of Postgres. About pattern matching:

模式与LIKE，SIMILAR TO或PostgreSQL中的正则表达式匹配

实际上是可能.您可以将聚集的FILTER子句与窗口函数的OVER子句结合使用. 但是:

That's actually possible. You can combine the aggregate FILTER clause with the OVER clause of window functions. However:

FILTER子句本身只能与当前行中的值一起使用.

The FILTER clause itself can only work with values from the current row.

更重要的是，尚未为Postgres 9.6中的lead()或lag()之类的纯窗口函数实现FILTER(仅)-仅针对

More importantly, FILTER is not implemented for pure window functions like lead() or lag() in Postgres 9.6 (yet) - only for aggregate functions.

如果您尝试:

lead(activity) FILTER (WHERE activity LIKE 'A%') OVER () AS activity

Postgres会告诉您:

Postgres will tell you:

FILTER is not implemented for non-aggregate window functions

关于FILTER:

How can I simplify this game statistics query?
Referencing current row in FILTER clause of window function

(对于每个用户拥有行的少量用户，几乎任何即使没有索引也可以快速查询.)

(For few users with few rows per user, pretty much any query is fast, even without index.)

对于很多个用户和每个用户个行，上面的第一个查询应该最快.请参阅上面的链接的答案有关索引和性能.

For many users and few rows per user, the first query above should be fastest. See the linked answer above about index and performance.

对于每个用户许多行，根据设置的其他详细信息，有(可能很多)种更快的技术:

For many rows per user, there are (potentially much) faster techniques, depending on other details of your setup:

将GROUP BY查询优化为检索每个用户的最新记录

这篇关于条件超前/滞后函数PostgreSQL?的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

条件超前/滞后函数PostgreSQL? [英] Conditional lead/lag function PostgreSQL?

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

条件超前/滞后函数PostgreSQL? [英] Conditional lead/lag function PostgreSQL?

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭