如何查找键重复但所有列中不重复的行? [英] How to Find Rows which are Duplicates by a Key but Not Duplicates in All Columns?

查看:33
本文介绍了如何查找键重复但所有列中不重复的行?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在处理一个表格,它是一组其他表格的摘录.根据键 D1、D2 和 D3,提取表的所有行都应该是唯一的.他们不是.似乎较早的开发人员试图通过在从此表查询的所有列中使用 SELECT DISTINCT 来解决此问题.这将起作用,但前提是在 (D1, D2, D3) 上重复的每一行也是非键列中的重复项(忽略添加到提取表中的 IDENTITY 列).

I am working with a table which is an extract of a set of other tables. All of the rows of the extract table should be unique according to keys D1, D2 and D3. They are not. It appears that an earlier developer attempted to solve this problem by using a SELECT DISTINCT across all columns being queried from this table. This will work, but only if every row which is a duplicate on (D1, D2, D3) is also a duplicate across the non-key columns (ignoring the IDENTITY column that was added to the extract table).

换句话说,给定的行如下:

In other words, given rows as follows:

D1  D2  D3  C4  C5  C6
=== === === === === ===
A   B   C   X1  X2  X3
A   B   C   X1  X2  X3

然后

SELECT DISTINCT D1, D2, D3, C4, C5, C6
FROM BAD_TABLE

将工作",因为在 (D1,D2,D3) 上重复的行之间没有区别.但是如果表中包含

will "work", as there's no difference between the rows which are duplicated on (D1,D2,D3). But if the table contained

D1  D2  D3  C4  C5  C6
=== === === === === ===
A   B   C   X1  X2  X3
A   B   C   X1  X2  X4

然后 SELECT DISTINCT 将为键 (A,B,C) 返回两行.此外,我们必须决定 X3 或 X4 中的哪一个是正确"的值.

then SELECT DISTINCT would return two rows for the key (A,B,C). Furthermore, we would have to decide which of X3 or X4 was the "correct" value.

我知道如何在 (D1,D2,D3) 上找到重复项.我什至知道如何在所有列(IDENTITY 列除外)中查找重复项:

I know how to find the duplicates on (D1,D2,D3). I even know how to find the duplicates across all the columns (other than the IDENTITY column):

;
WITH DUPLICATES(D1,D2,D3) AS
(
    SELECT D1, D2, D3
    FROM SOURCE
    GROUP BY D1, D2, D3
    HAVING COUNT(*)>1
)
SELECT S.D1, S.D2, S.D3, S.C4, S.C5, S.C6
FROM SOURCE S
INNER JOIN DUPLICATES D
    ON S.D1 = D.D1 AND S.D2 = D.D2 AND S.D3 = D.D3
ORDER BY S.D1, S.D2, S.D3, S.C4, S.C5, S.C6

问题是,如何找到在 (D1,D2,D3) 上重复但在 (D1,D2,D3,C4,C5) 上 重复的上述结果集的子集,C6)?

The question is, how do I find the subset of the above resultset which are duplicates on (D1,D2,D3), but not duplicates on (D1,D2,D3,C4,C5,C6)?

推荐答案

您有什么理由不创建另一个表表达式来覆盖更多字段并加入该字段?

Any reason you don't just create another table expression to cover more fields and join to that one?

WITH DUPLICATEKEY(D1,D2,D3) AS
(
    SELECT D1, D2, D3
    FROM SOURCE
    GROUP BY D1, D2, D3
    HAVING COUNT(*)>1
)
WITH NODUPES(D1,D2,D3,C4,C5,C6) AS
(
SELECT 
S.D1, S.D2, S.D3, S.C4, S.C5, S.C6
FROM SOURCE S
GROUP BY
 S.D1, S.D2, S.D3, S.C4, S.C5, S.C6
HAVING COUNT(*)=1
)

SELECT S.D1, S.D2, S.D3, S.C4, S.C5, S.C6
FROM SOURCE S
INNER JOIN DUPLICATEKEY D
    ON S.D1 = D.D1 AND S.D2 = D.D2 AND S.D3 = D.D3

INNER JOIN NODUPES D2
    ON S.D1 = D2.D1 AND S.D2 = D2.D2 AND S.D3 = D2.D3

ORDER BY S.D1, S.D2, S.D3, S.C4, S.C5, S.C6

这篇关于如何查找键重复但所有列中不重复的行?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆