在SELECT中过滤出重复的后续记录 [英] Filtering out duplicate subsequent records in a SELECT

查看：354 发布时间：2020/5/29 22:26:33 sql postgresql gaps-and-islands window-functions

本文介绍了在SELECT中过滤出重复的后续记录的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

（PostgreSQL 8.4）
表 trackingMessages存储移动设备（tm_nl_mobileid）和固定设备（tm_nl_fixedId）之间的跟踪事件。

(PostgreSQL 8.4) Table "trackingMessages" stores tracking events between mobile devices (tm_nl_mobileid) and fixed devices (tm_nl_fixedId).

CREATE TABLE trackingMessages
(
  tm_id SERIAL PRIMARY KEY,           -- PK
  tm_nl_mobileId INTEGER,             -- FK to mobile
  tm_nl_fixedId INTEGER,              -- FK to fixed
  tm_date INTEGER,                    -- Network time
  tm_messageType INTEGER,             -- 0=disconnect, 1=connect
  CONSTRAINT tm_unique_row
    UNIQUE (tm_nl_mobileId, tm_nl_fixedId, tm_date, tm_messageType)
);

这里的问题是同一台移动设备可能两次连接到同一固定设备（或多次））。我不希望看到后续的固定设备，但是如果以后有连接到另一个固定设备的移动设备，可以稍后再连接到同一固定设备。

Problem here is that it's possible that the same mobile will connect to the same fixed twice (or more times) subsequently. I don't want to see the subsequent ones, but it's OK to see a mobile connected to a same fixed at a later date, provided there was a connection to a different fixed in between.

我想我已经接近了，但是还不够。我一直在使用以下CTE（在堆栈溢出中找到此处）

I think I'm close but not quite. I've been using the following CTE (found here on Stack Overflow)

WITH cte AS 
(
  SELECT tm_nl_fixedid, tm_date, Row_number() OVER (
    partition BY tm_nl_fixedid
    ORDER BY tm_date ASC
  ) RN 
  FROM   trackingMessages
) 
SELECT * FROM cte 
  WHERE tm_nl_mobileid = 150 AND tm_messagetype = 1
  ORDER BY tm_date;

请给我以下结果

32;1316538756;1
21;1316539069;1
32;1316539194;2
32;1316539221;3
21;1316539235;2

这里的问题是最后一列应该是1，1，1，2，1，实际上，第三个 32是重复的跟踪事件（连续两次固定相同），并且与 21的最后一次连接是可以的，因为 32介于两者之间。

The problem here is that the last column should be 1, 1, 1, 2, 1, because that third "32" is in fact a duplicate tracking event (twice in a row at the same fixed) and that last connection to "21" is OK because "32" was in between.

请不要建议使用光标，这是我目前正在努力摆脱的目标。游标解决方案确实可以工作，但是考虑到我必须处理的记录量，它太慢了。我宁愿修复CTE，仅选择 RN = 1 ...的位置，除非您有更好的主意！

Please don't suggest a cursor, this is what I am currently trying to move away from. The cursor solution does work, but it's too slow given the amount of records I have to deal with. I'd much rather fix the CTE and only select where RN = 1 ... unless you have a better idea!

推荐答案

好吧，您还没那么接近，因为 row_number（）不能同时跟踪两组的序列。 通过tm_nl_fixedid排序在日期上重新排序不存在，没有这样的东西。

Well, you're not that close because row_number() cannot track sequences by two groups at the same time. PARTITION BY tm_nl_fixedid ORDER BY date RESTART ON GAP does not exist, there's no such thing.

Itzik Ben-Gan为您面临的孤岛和缝隙问题提供了解决方案（实际上有几种解决方案）。想法是按主要条件（日期）对行进行排序，然后按划分条件+主要条件对行进行排序。普通字之间的差异将保持不变，因为它们属于相同的分区标准和日期序列。

Itzik Ben-Gan has a solution for the islands and gaps problem you are facing (several solutions, actually). The idea is to order rows by the main criteria (date) and then by partitioning criteria + main criteria. Difference between ordinals will remain the same as they belong to the same partitioning criteria and date series.

with cte as
(
  select *,
      -- While order by date and order by something-else, date
      -- run along, they belong to the same sequence
         row_number() over (order by tm_date)
       - row_number() over (order by tm_nl_fixedid, tm_date) grp
    from trackingMessages
)
select *,
    -- Now we can get ordinal number grouped by each sequence
       row_number() over (partition by tm_nl_fixedid, grp
                          order by tm_date) rn
  from cte
 order by tm_date

此处为Sql摆弄例子。

这是Sql Server MVP Deep Divs的第5章，提供了几种解决方案到岛屿和缺口问题。

这篇关于在SELECT中过滤出重复的后续记录的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

在SELECT中过滤出重复的后续记录 [英] Filtering out duplicate subsequent records in a SELECT

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

在SELECT中过滤出重复的后续记录 [英] Filtering out duplicate subsequent records in a SELECT

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭