SQL - 抑制重复的*相邻*记录 [英] SQL - suppressing duplicate *adjacent* records

查看:13
本文介绍了SQL - 抑制重复的*相邻*记录的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要运行一个 Select 语句 (DB2 SQL),它不会根据某个字段提取相邻的行重复项.具体来说,我试图找出数据何时更改,这很困难,因为它可能会变回其原始值.

I need to run a Select statement (DB2 SQL) that does not pull adjacent row duplicates based on a certain field. In specific, I am trying to find out when data changes, which is made difficult because it might change back to its original value.

也就是说,我有一个表格,大致如下图,按字母排序,然后按日期排序:

That is to say, I have a table that vaguely resembles the below, sorted by Letter and then by Date:

A, 5, 2009-01-01
A, 12, 2009-02-01
A, 12, 2009-03-01
A, 12, 2009-04-01
A, 9, 2009-05-01
A, 9, 2009-06-01
A, 5, 2009-07-01

我想得到结果:

A, 5, 2009-01-01
A, 12, 2009-02-01
A, 9, 2009-05-01
A, 5, 2009-07-01

丢弃 adjacent 重复但保留最后一行(尽管它与第一行具有相同的编号).显而易见的:

discarding adjacent duplicates but keeping the last row (despite it having the same number as the first row). The obvious:

Select Letter, Number, Min(Update_Date) from Table group by Letter, Number 

不起作用——它不包括最后一行.

does not work -- it doesn't include the last row.

由于似乎有些混乱,我已将月份列澄清为日期列.它是一种人类可解析的简短形式,而不是实际的有效数据.

As there seems to have been some confusion, I have clarified the month column into a date column. It was meant as a human-parseable short form, not as actual valid data.

最后一行并不重要,因为它是最后一行,而是因为它有一个新值",也是一个旧值".按 NUMBER 分组会将其与第一行一起包装;它需要保持一个独立的实体.

The last row is not important BECAUSE it is the last row, but because it has a "new value" that is also an "old value". Grouping by NUMBER would wrap it in with the first row; it needs to remain a separate entity.

推荐答案

根据您使用的 DB2,有一些分析函数可以使这个问题很容易解决.下面是 Oracle 中的一个示例,但 select 语法似乎非常相似.

Depending on which DB2 you're on, there are analytic functions which can make this problem easy to solve. An example in Oracle is below, but the select syntax appears to be pretty similar.

create table t1 (c1 char, c2 number, c3 date);

insert into t1 VALUES ('A', 5, DATE '2009-01-01');
insert into t1 VALUES ('A', 12, DATE '2009-02-01');
insert into t1 VALUES ('A', 12, DATE '2009-03-01');
insert into t1 VALUES ('A', 12, DATE '2009-04-01');
insert into t1 VALUES ('A', 9, DATE '2009-05-01');
insert into t1 VALUES ('A', 9, DATE '2009-06-01');
insert into t1 VALUES ('A', 5, DATE '2009-07-01');

SQL> l
  1  SELECT C1, C2, C3
  2    FROM (SELECT C1, C2, C3,
  3                 LAG(C2) OVER (PARTITION BY C1 ORDER BY C3) AS PRIOR_C2,
  4                 LEAD(C2) OVER (PARTITION BY C1 ORDER BY C3) AS NEXT_C2
  5            FROM T1
  6         )
  7   WHERE C2 <> PRIOR_C2
  8      OR PRIOR_C2 IS NULL -- to pick up the first value
  9   ORDER BY C1, C3
SQL> /

C         C2 C3
- ---------- -------------------
A          5 2009-01-01 00:00:00
A         12 2009-02-01 00:00:00
A          9 2009-05-01 00:00:00
A          5 2009-07-01 00:00:00

这篇关于SQL - 抑制重复的*相邻*记录的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆