SQL - 抑制重复*相邻*记录 [英] SQL - suppressing duplicate *adjacent* records

查看:302
本文介绍了SQL - 抑制重复*相邻*记录的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要运行一个Select语句(DB2 SQL),它不会根据某个字段拉出相邻的行重复。具体来说,我试图找出数据何时更改,这样做很困难,因为它可能会改回原来的值。

I need to run a Select statement (DB2 SQL) that does not pull adjacent row duplicates based on a certain field. In specific, I am trying to find out when data changes, which is made difficult because it might change back to its original value.

也就是说,我有一个表格模糊地类似于下面的信息,然后按日期排序:

That is to say, I have a table that vaguely resembles the below, sorted by Letter and then by Date:

A, 5, 2009-01-01
A, 12, 2009-02-01
A, 12, 2009-03-01
A, 12, 2009-04-01
A, 9, 2009-05-01
A, 9, 2009-06-01
A, 5, 2009-07-01

我想得到结果:

A, 5, 2009-01-01
A, 12, 2009-02-01
A, 9, 2009-05-01
A, 5, 2009-07-01

丢弃相邻重复,但保留最后一行(尽管与第一行具有相同的数字)。显然:

discarding adjacent duplicates but keeping the last row (despite it having the same number as the first row). The obvious:

Select Letter, Number, Min(Update_Date) from Table group by Letter, Number 

不起作用 - 它不包括最后一行。

does not work -- it doesn't include the last row.

编辑:由于似乎有一些混乱,我已经将月份栏列入了一个日期列。这意味着一个人类可解析的简短形式,而不是实际的有效数据。

As there seems to have been some confusion, I have clarified the month column into a date column. It was meant as a human-parseable short form, not as actual valid data.

编辑:最后一行不重要因为它是最后一行,而是因为它具有也是旧值的新值。按NUMBER进行分组将包含第一行;它需要保持一个单独的实体。

The last row is not important BECAUSE it is the last row, but because it has a "new value" that is also an "old value". Grouping by NUMBER would wrap it in with the first row; it needs to remain a separate entity.

推荐答案

根据您所在的DB2,有分析功能可以使此问题容易解决。 Oracle中的一个例子如下,但是select语法似乎非常相似。

Depending on which DB2 you're on, there are analytic functions which can make this problem easy to solve. An example in Oracle is below, but the select syntax appears to be pretty similar.

create table t1 (c1 char, c2 number, c3 date);

insert into t1 VALUES ('A', 5, DATE '2009-01-01');
insert into t1 VALUES ('A', 12, DATE '2009-02-01');
insert into t1 VALUES ('A', 12, DATE '2009-03-01');
insert into t1 VALUES ('A', 12, DATE '2009-04-01');
insert into t1 VALUES ('A', 9, DATE '2009-05-01');
insert into t1 VALUES ('A', 9, DATE '2009-06-01');
insert into t1 VALUES ('A', 5, DATE '2009-07-01');

SQL> l
  1  SELECT C1, C2, C3
  2    FROM (SELECT C1, C2, C3,
  3                 LAG(C2) OVER (PARTITION BY C1 ORDER BY C3) AS PRIOR_C2,
  4                 LEAD(C2) OVER (PARTITION BY C1 ORDER BY C3) AS NEXT_C2
  5            FROM T1
  6         )
  7   WHERE C2 <> PRIOR_C2
  8      OR PRIOR_C2 IS NULL -- to pick up the first value
  9   ORDER BY C1, C3
SQL> /

C         C2 C3
- ---------- -------------------
A          5 2009-01-01 00:00:00
A         12 2009-02-01 00:00:00
A          9 2009-05-01 00:00:00
A          5 2009-07-01 00:00:00

这篇关于SQL - 抑制重复*相邻*记录的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆