从 SQL Join 中删除重复项 [英] Removing duplicates from SQL Join

查看:39
本文介绍了从 SQL Join 中删除重复项的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

以下是与我的实际问题很接近的假设情况.Table1

The following is a hypothetical situation this which is close to my real problem. Table1

recid   firstname    lastname   company
1       A             B          AAA
2       D             E          DEF
3       G             H          IJK
4       A             B          ABC

我有一个看起来像这样的 table2

I have a table2 that looks like this

recid   firstname    lastname   company
10      A             B          ABC
20      D             E          DEF
30      M             D          DIM
40      A             B          CCC

现在,如果我在 recid 上加入表,它将给出 0 结果,不会有重复,因为 recid 是唯一的.但是,如果我在 firstnamelastname 列上加入,这些列不是唯一的并且有重复项,我会在内部联接中得到重复项.我在 join 上添加的列越多,情况就越糟(创建的重复项越多).

Now if I join the table on recid, it will give 0 results, there will be no duplicates because recid is unique. But if I join on firstname and lastname columns, which are not unique and there are duplicates, I get duplicates on inner join. The more columns I add on join, the worse it becomes (more duplicates are created).

在上述简单情况下,如何删除以下查询中的重复项?我想比较 firstnamelastname,如果它们匹配,我返回 firstnamelastnamerecid 来自 table2

In the above simple situation, how can I remove duplicates in the following query? I want to compare firstname and lastname, if they match, I return firstname, lastname and recid from table2

SELECT DISTINCT * FROM
(SELECT recid, first, last FROM table1) a
INNER JOIN
(SELECT recid, first,last FROM table2) b
ON a.first = b.first


如果有人想玩的话,脚本就在这里


The script is here if anyone wants to play with it in future

CREATE TABLE table1 (recid INT NOT NULL PRIMARY KEY, first varchar(20), last varchar(20), company varchar(20))
CREATE TABLE table2 (recid INT NOT NULL PRIMARY KEY, first varchar(20), last varchar(20), company varchar(20))

INSERT INTO TABLE1 VALUES(1,'A','B','ABC')
INSERT INTO TABLE1 VALUES(2,'D','E','DEF')
INSERT INTO TABLE1 VALUES(3,'M','N','MNO')
INSERT INTO TABLE1 VALUES(4,'A','B','ABC')

INSERT INTO TABLE2 VALUES(10,'A','B','ABC')
INSERT INTO TABLE2 VALUES(20,'D','E','DEF')
INSERT INTO TABLE2 VALUES(30,'Q','R','QRS')
INSERT INTO TABLE2 VALUES(40,'A','B','ABC')

推荐答案

您不想进行连接本身,您只是在测试是否存在/设置包含.

You don't want to do a join per se, you're merely testing for existence/set inclusion.

我不知道您当前使用的 SQL 风格是什么,但这应该可行.

I don't know what current flavor of SQL you're coding in, but this should work.

SELECT MAX(recid), firstname, lastname 
FROM table2 T2
WHERE EXISTS (SELECT * FROM table1 WHERE firstname = T2.firstame AND lastname = T2.lastname)
GROUP BY lastname, firstname

如果你想实现为一个连接,保持代码大致相同:

If you want to implement as a join, leaving the code largely the same:

SELECT max(t2.recid), t2.firstame, t2.lastname 
FROM Table2 T2 
INNER JOIN Table1 T1 
    ON T2.firstname = t1.firstname and t2.lastname = t1.lastname
GROUP BY t2.firstname, t2.lastname 

根据 DBMS,内部连接的实现方式可能与 Exists(半连接与连接)不同,但优化器有时无论如何都能弄清楚并选择正确的运算符,无论您以哪种方式编写它.

Depending on the DBMS, an inner join may be implemented differently to an Exists (semi-join vs join) but the optimizer can sometimes figure it out anyway and chose the correct operator regardless of which way you write it.

这篇关于从 SQL Join 中删除重复项的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆