将SQL差距跳过特定条件&正确的lead()使用 [英] Jump SQL gap over specific condition & proper lead() usage

查看:187
本文介绍了将SQL差距跳过特定条件&正确的lead()使用的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

(PostgreSQL 8.4)继续我的上一个示例,我希望进一步了解我对使用Window函数进行间隙和孤岛处理的理解。请考虑以下表和数据:

(PostgreSQL 8.4) Continuing with my previous example, I wish to further my understanding of gaps-and-islands processing with Window-functions. Consider the following table and data:

CREATE TABLE T1
(
  id SERIAL PRIMARY KEY,
  val INT,   -- some device
  status INT -- 0=OFF, 1=ON
);

INSERT INTO T1 (val, status) VALUES (10, 0);
INSERT INTO T1 (val, status) VALUES (11, 0);
INSERT INTO T1 (val, status) VALUES (11, 1);
INSERT INTO T1 (val, status) VALUES (10, 1);
INSERT INTO T1 (val, status) VALUES (11, 0);
INSERT INTO T1 (val, status) VALUES (10, 0);

如前所述,设备会打开和关闭,这一次我希望提取一个特定的序列:

As previously explained, the devices turn ON and OFF and this time I wish to extract a specific sequence:


  • 显示所有不重复的新 ON 状态记录(同一设备两次

  • 从当前 ON 设备显示适当的 OFF 状态

  • show all new ON status records that aren't duplicate (same device twice in a row)
  • show appropriate OFF status from currently ON device

我能得到的最接近的是:

The closest I could get is this:

SELECT * FROM  (
  SELECT *
       ,lag(val, 1, 0) OVER (PARTITION BY status ORDER BY id) last_val
       ,lag(status, 1, -1) OVER (PARTITION BY val ORDER BY id) last_status
  FROM   t1
  ) x
WHERE  (last_val <> val OR last_status <> status)
AND    (status = 1 OR last_status <> -1)
ORDER BY id

样本不包含的更多伪造数据,但本质上是关于取出后续重复项(无论状态如何)和顶部 OFF 记录不匹配。记录 3 4 5 6 返回,但是我不希望第五个,这是 OFF 的关闭,它是在新的 ON之后出现的。因此,我需要跨越这个空白,为当前处于活动状态的设备寻找下一个正确的 OFF

This filters out more bogus data that the sample doesn't include but essentially it's about taking out subsequent duplicates (regardless of status) and the top OFF records which don't match. Records 3, 4, 5 and 6 are returned, but I don't want the fifth, it's an OFF which came after a new ON. So I need to jump that gap and look for the next proper OFF for the currently active device.


  1. 10关闭-在此情况下为假,但弄乱了lag()

  2. 11关闭-在此情况下,为假,但弄乱了lag()

  3. 11打开-确定,新序列,包含在SELECT

  4. 10打开-确定,新序列,包含在SELECT

  5. 11关闭-消息来晚了,需要忽略间隙

  6. 10关闭-可以,请正确关闭第4行,并包括在SELECT

  1. 10 turns OFF -- bogus in this context, but messes up the lag()
  2. 11 turns OFF -- bogus in this context, but messes up the lag()
  3. 11 turns ON -- OK, new sequence, include in SELECT
  4. 10 turns ON -- OK, new sequence, include in SELECT
  5. 11 turns OFF -- message came in late, need to ignore gap
  6. 10 turns OFF -- OK, proper OFF to row 4, need to include in SELECT

中,一旦正确过滤,我想使用 lead() 最重要的是获取下一行的ID(想象一个时间戳),并过滤掉所有 ON 状态的记录。我想这将需要三个嵌入式SELECT语句。这将使我清楚地了解设备处于活动状态的时间,直到再次 ON 或适当关闭 OFF

Once filtered properly, I would like to use lead() on top of it all to get the next row's id (imagine a timestamp) as well as filter out all records that aren't ON statuses. I imagine this will require three embedded SELECT statements. This would get me a clear understanding of how long a device was active, until the condition of either another ON or a proper turn OFF.

推荐答案

带有窗口函数的查询



Query with window functions

SELECT *
FROM  (
   SELECT *
         ,lag(val, 1, 0)    OVER (PARTITION BY status ORDER BY id) AS last_val
         ,lag(status, 1, 0) OVER w2 AS last_status
         ,lag(next_id)      OVER w2 AS next_id_of_last_status
   FROM  (
      SELECT *, lead(id) OVER (PARTITION BY status ORDER BY id) AS next_id
      FROM   t1
      ) AS t
   WINDOW w2 AS (PARTITION BY val ORDER BY id)
  ) x
WHERE (last_val <> val OR last_status <> status)
AND   (status = 1 
       OR last_status = 1
          AND ((next_id_of_last_status > id) OR next_id_of_last_status IS NULL)
      )
ORDER  BY id

除了我们已经拥有的,我们需要有效的OFF开关。

In addition to what we already had, we need valid OFF switches.

An OFF 开关是否有效,如果在之前设备 ON 被打开( last_status = 1 ),然后进行下一个 ON 操作,然后再进行相关的 OFF 开关( next_id_of_last_status> id )。

An OFF switch if valid if the device was switched ON before (last_status = 1) and the next ON operation after that comes after the OFF switch in question (next_id_of_last_status > id).

我们必须提供特殊情况,即最后一个 ON 操作,因此我们还要检查 NULL 或next_id_of_last_status IS NULL )。

We have to provide for the special case that there is was the last ON operation, so we check for NULL in addition (OR next_id_of_last_status IS NULL).

next_id_of_last_status 来自我们从中获取 last_status 的同一窗口。因此,我为显式窗口声明引入了其他语法,因此不必重复我自己:

The next_id_of_last_status comes from the same window that we take last_status from. Therefore I introduced additional syntax for explicit window declaration, so I don't have to repeat myself:

WINDOW w2 AS (PARTITION BY val ORDER BY id)

我们需要更早地获得子查询中最后一个状态的下一个ID (子查询 t )。

And we need to get the next id for the last status in a subquery earlier (subquery t).

如果您了解所有 ,则不应在此查询顶部打上 lead()到您的最终目的地没有问题。 :)

If you've understood all that, you shouldn't have a problem slapping lead() on top of this query to get to your final destination. :)

一旦变得如此复杂,就该切换到过程处理了。

Once it gets this complex, it's time to switch to procedural processing.

这个相对简单的 plpgsql函数改变了复杂窗口函数查询的性能,原因很简单,它只需要扫描整个表即可

This comparatively simple plpgsql function nukes the performance of the complex window function query, for the simple reason that it has to scan the whole table only once.

CREATE OR REPLACE FUNCTION valid_t1 (OUT t t1)  -- row variable of table type
  RETURNS SETOF t1 LANGUAGE plpgsql AS
$func$
DECLARE
   _last_on int := -1;  -- init with impossible value
BEGIN

FOR t IN
   SELECT * FROM t1 ORDER BY id
LOOP
   IF t.status = 1 THEN
      IF _last_on <> t.val THEN
         RETURN NEXT;
         _last_on := t.val;
      END IF;
   ELSE
      IF _last_on = t.val THEN
         RETURN NEXT;
         _last_on := -1;
      END IF;
   END IF;
END LOOP;

END
$func$;

致电:

SELECT * FROM valid_t1();

这篇关于将SQL差距跳过特定条件&amp;正确的lead()使用的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆