将SQL差距跳过特定条件&正确的lead()使用 [英] Jump SQL gap over specific condition & proper lead() usage
问题描述
(PostgreSQL 8.4)继续我的上一个示例,我希望进一步了解我对使用Window函数进行间隙和孤岛处理的理解。请考虑以下表和数据:
(PostgreSQL 8.4) Continuing with my previous example, I wish to further my understanding of gaps-and-islands processing with Window-functions. Consider the following table and data:
CREATE TABLE T1
(
id SERIAL PRIMARY KEY,
val INT, -- some device
status INT -- 0=OFF, 1=ON
);
INSERT INTO T1 (val, status) VALUES (10, 0);
INSERT INTO T1 (val, status) VALUES (11, 0);
INSERT INTO T1 (val, status) VALUES (11, 1);
INSERT INTO T1 (val, status) VALUES (10, 1);
INSERT INTO T1 (val, status) VALUES (11, 0);
INSERT INTO T1 (val, status) VALUES (10, 0);
如前所述,设备会打开和关闭,这一次我希望提取一个特定的序列:
As previously explained, the devices turn ON and OFF and this time I wish to extract a specific sequence:
- 显示所有不重复的新
ON
状态记录(同一设备两次 - 从当前
ON
设备显示适当的OFF
状态
- show all new
ON
status records that aren't duplicate (same device twice in a row) - show appropriate
OFF
status from currentlyON
device
我能得到的最接近的是:
The closest I could get is this:
SELECT * FROM (
SELECT *
,lag(val, 1, 0) OVER (PARTITION BY status ORDER BY id) last_val
,lag(status, 1, -1) OVER (PARTITION BY val ORDER BY id) last_status
FROM t1
) x
WHERE (last_val <> val OR last_status <> status)
AND (status = 1 OR last_status <> -1)
ORDER BY id
样本不包含的更多伪造数据,但本质上是关于取出后续重复项(无论状态如何)和顶部 OFF
记录不匹配。记录 3
, 4
, 5
和 6
返回,但是我不希望第五个,这是 OFF
的关闭,它是在新的 ON之后出现的
。因此,我需要跨越这个空白,为当前处于活动状态的设备寻找下一个正确的 OFF
。
This filters out more bogus data that the sample doesn't include but essentially it's about taking out subsequent duplicates (regardless of status) and the top OFF
records which don't match. Records 3
, 4
, 5
and 6
are returned, but I don't want the fifth, it's an OFF
which came after a new ON
. So I need to jump that gap and look for the next proper OFF
for the currently active device.
- 10关闭-在此情况下为假,但弄乱了lag()
- 11关闭-在此情况下,为假,但弄乱了lag()
- 11打开-确定,新序列,包含在SELECT
- 10打开-确定,新序列,包含在SELECT
- 11关闭-消息来晚了,需要忽略间隙
- 10关闭-可以,请正确关闭第4行,并包括在SELECT
- 10 turns OFF -- bogus in this context, but messes up the lag()
- 11 turns OFF -- bogus in this context, but messes up the lag()
- 11 turns ON -- OK, new sequence, include in SELECT
- 10 turns ON -- OK, new sequence, include in SELECT
- 11 turns OFF -- message came in late, need to ignore gap
- 10 turns OFF -- OK, proper OFF to row 4, need to include in SELECT
中,一旦正确过滤,我想使用 lead()
最重要的是获取下一行的ID(想象一个时间戳),并过滤掉所有 ON
状态的记录。我想这将需要三个嵌入式SELECT语句。这将使我清楚地了解设备处于活动状态的时间,直到再次 ON
或适当关闭 OFF $ c的状态$ c>。
Once filtered properly, I would like to use lead()
on top of it all to get the next row's id (imagine a timestamp) as well as filter out all records that aren't ON
statuses. I imagine this will require three embedded SELECT statements. This would get me a clear understanding of how long a device was active, until the condition of either another ON
or a proper turn OFF
.
推荐答案
带有窗口函数的查询
Query with window functions
SELECT *
FROM (
SELECT *
,lag(val, 1, 0) OVER (PARTITION BY status ORDER BY id) AS last_val
,lag(status, 1, 0) OVER w2 AS last_status
,lag(next_id) OVER w2 AS next_id_of_last_status
FROM (
SELECT *, lead(id) OVER (PARTITION BY status ORDER BY id) AS next_id
FROM t1
) AS t
WINDOW w2 AS (PARTITION BY val ORDER BY id)
) x
WHERE (last_val <> val OR last_status <> status)
AND (status = 1
OR last_status = 1
AND ((next_id_of_last_status > id) OR next_id_of_last_status IS NULL)
)
ORDER BY id
除了
In addition to what we already had, we need valid OFF switches.
An OFF
开关是否有效,如果在之前设备 ON
被打开( last_status = 1
),然后进行下一个 ON
操作,然后再进行相关的 OFF
开关( next_id_of_last_status> id
)。
An OFF
switch if valid if the device was switched ON
before (last_status = 1
) and the next ON
operation after that comes after the OFF
switch in question (next_id_of_last_status > id
).
我们必须提供特殊情况,即最后一个 ON
操作,因此我们还要检查 NULL
(或next_id_of_last_status IS NULL
)。
We have to provide for the special case that there is was the last ON
operation, so we check for NULL
in addition (OR next_id_of_last_status IS NULL
).
next_id_of_last_status
来自我们从中获取 last_status
的同一窗口。因此,我为显式窗口声明引入了其他语法,因此不必重复我自己:
The next_id_of_last_status
comes from the same window that we take last_status
from. Therefore I introduced additional syntax for explicit window declaration, so I don't have to repeat myself:
WINDOW w2 AS (PARTITION BY val ORDER BY id)
我们需要更早地获得子查询中最后一个状态的下一个ID (子查询 t
)。
And we need to get the next id for the last status in a subquery earlier (subquery t
).
如果您了解所有 ,则不应在此查询顶部打上 lead()
到您的最终目的地没有问题。 :)
If you've understood all that, you shouldn't have a problem slapping lead()
on top of this query to get to your final destination. :)
一旦变得如此复杂,就该切换到过程处理了。
Once it gets this complex, it's time to switch to procedural processing.
这个相对简单的 plpgsql函数改变了复杂窗口函数查询的性能,原因很简单,它只需要扫描整个表即可
This comparatively simple plpgsql function nukes the performance of the complex window function query, for the simple reason that it has to scan the whole table only once.
CREATE OR REPLACE FUNCTION valid_t1 (OUT t t1) -- row variable of table type
RETURNS SETOF t1 LANGUAGE plpgsql AS
$func$
DECLARE
_last_on int := -1; -- init with impossible value
BEGIN
FOR t IN
SELECT * FROM t1 ORDER BY id
LOOP
IF t.status = 1 THEN
IF _last_on <> t.val THEN
RETURN NEXT;
_last_on := t.val;
END IF;
ELSE
IF _last_on = t.val THEN
RETURN NEXT;
_last_on := -1;
END IF;
END IF;
END LOOP;
END
$func$;
致电:
SELECT * FROM valid_t1();
这篇关于将SQL差距跳过特定条件&正确的lead()使用的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!