条件超前/滞后函数PostgreSQL? [英] Conditional lead/lag function PostgreSQL?
问题描述
我有一个这样的表:
Name activity time
user1 A1 12:00
user1 E3 12:01
user1 A2 12:02
user2 A1 10:05
user2 A2 10:06
user2 A3 10:07
user2 M6 10:07
user2 B1 10:08
user3 A1 14:15
user3 B2 14:20
user3 D1 14:25
user3 D2 14:30
现在,我需要这样的结果:
Now, I need a result like this:
Name activity next_activity
user1 A2 NULL
user2 A3 B1
user3 A1 B2
我想为每个用户检查A组中的最后一项活动以及B组中接下来发生了什么类型的活动(B组中的活动始终发生在A组中的活动之后).其他类型的活动对我而言并不有趣.我试图使用lead()
函数,但是没有用.
I would like to check for every user the last activity from group A and what type of activity took place next from group B (activity from group B always takes place after activity from group A). Other types of activity are not interesting for me. I've tried to use the lead()
function, but it hasn't worked.
我该如何解决我的问题?
How I can solve my problem?
推荐答案
测试设置:
CREATE TEMP TABLE t (name text, activity text, time time);
INSERT INTO t values
('user1', 'A1', '12:00')
,('user1', 'E3', '12:01')
,('user1', 'A2', '12:02')
,('user2', 'A1', '10:05')
,('user2', 'A2', '10:06')
,('user2', 'A3', '10:07')
,('user2', 'M6', '10:07')
,('user2', 'B1', '10:08')
,('user3', 'A1', '14:15')
,('user3', 'B2', '14:20')
,('user3', 'D1', '14:25')
,('user3', 'D2', '14:30');
您的定义:
B组的活动总是在A组的活动之后发生.
activity from group B always takes place after activity from group A.
..从逻辑上讲,每个用户在1个或多个A活动之后有0个或1 B活动.依次进行的B活动不得超过1个.
.. logically implies that there is, per user, 0 or 1 B activity after 1 or more A activities. Never more than 1 B activities in sequence.
您可以使用单个窗口功能DISTINCT ON
和CASE
来使用它,这应该是每位用户少量行的最快方法(另请参见下文):
You can make it work with a single window function, DISTINCT ON
and CASE
, which should be the fastest way for few rows per user (also see below):
SELECT name
, CASE WHEN a2 LIKE 'B%' THEN a1 ELSE a2 END AS activity
, CASE WHEN a2 LIKE 'B%' THEN a2 END AS next_activity
FROM (
SELECT DISTINCT ON (name)
name
, lead(activity) OVER (PARTITION BY name ORDER BY time DESC) AS a1
, activity AS a2
FROM t
WHERE (activity LIKE 'A%' OR activity LIKE 'B%')
ORDER BY name, time DESC
) sub;
如果未添加ELSE
分支,则SQL CASE
表达式默认为NULL
,因此我将其简短了.
An SQL CASE
expression defaults to NULL
if no ELSE
branch is added, so I kept that short.
还假定time
被定义为NOT NULL
.否则,您可能要添加NULLS LAST
.为什么?
Also assuming time
is defined NOT NULL
. Else, you might want to add NULLS LAST
. Why?
(activity LIKE 'A%' OR activity LIKE 'B%')
比activity ~ '^[AB]'
更为冗长,但在较旧的Postgres版本中通常更快.关于模式匹配:
(activity LIKE 'A%' OR activity LIKE 'B%')
is more verbose than activity ~ '^[AB]'
, but typically faster in older versions of Postgres. About pattern matching:
实际上是可能.您可以将聚集的FILTER
子句与窗口函数的OVER
子句结合使用. 但是:
That's actually possible. You can combine the aggregate FILTER
clause with the OVER
clause of window functions. However:
-
FILTER
子句本身只能与当前行中的值一起使用.
The
FILTER
clause itself can only work with values from the current row.
更重要的是,尚未为Postgres 9.6中的lead()
或lag()
之类的纯窗口函数实现FILTER
(仅)-仅针对
More importantly, FILTER
is not implemented for pure window functions like lead()
or lag()
in Postgres 9.6 (yet) - only for aggregate functions.
如果您尝试:
lead(activity) FILTER (WHERE activity LIKE 'A%') OVER () AS activity
Postgres会告诉您:
Postgres will tell you:
FILTER is not implemented for non-aggregate window functions
关于FILTER
:
- How can I simplify this game statistics query?
- Referencing current row in FILTER clause of window function
(对于每个用户拥有 行的 少量 用户,几乎任何即使没有索引也可以快速查询.)
(For few users with few rows per user, pretty much any query is fast, even without index.)
对于 很多 个用户和每个用户 个行,上面的第一个查询应该最快.请参阅上面的链接的答案有关索引和性能.
For many users and few rows per user, the first query above should be fastest. See the linked answer above about index and performance.
对于每个用户 许多 行,根据设置的其他详细信息,有(可能很多)种更快的技术:
For many rows per user, there are (potentially much) faster techniques, depending on other details of your setup:
这篇关于条件超前/滞后函数PostgreSQL?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!