SQL查找具有多个字段(没有唯一ID)的重复项 [英] SQL Find duplicate with several field (no unique ID) WORK AROUND
问题描述
我正在尝试使用 vendor 表和 vendor_address 表中的几个字段从数据库中查找重复的供应商.问题是,我进行的内部联接越多,查询失去潜在结果的可能性就越小.虽然我没有重复的供应商ID,但我希望找到类似的潜在ID.
I am trying to find duplicated vendors from a database using several fields from vendor table and vendor_address table. The thing is the more inner join I make the less the query is loosing potential results. While I don't have duplicate in vendor ID I'm look to find potential one that are similar.
这是我到目前为止的查询:
Here is my query so far:
SELECT
o.vendor_id
,o.vndr_name_shrt_user
,O.COUNTRY
,O.VENDOR_NAME_SHORT
,B.POSTAL
,B.ADDRESS1
,SAME_ADDRESS_NB
,SAME_POSTAL_NB
,OC.SAME_SHORT_NAME
,oc.SAME_USER_NUM
FROM VENDOR o
JOIN vendor_addr B ON o.VENDOR_ID = B.VENDOR_ID
INNER JOIN (
SELECT vndr_name_shrt_user, COUNT(*) AS SAME_USER_NUM
FROM VENDOR
WHERE COUNTRY = 'CANADA'
AND VENDOR_STATUS = 'A'
GROUP BY vndr_name_shrt_user
HAVING COUNT(*) > 1
) oc on o.vndr_name_shrt_user = oc.vndr_name_shrt_user
INNER JOIN ( SELECT VENDOR_NAME_SHORT, COUNT(*) AS SAME_SHORT_NAME
FROM VENDOR
WHERE COUNTRY = 'CANADA'
AND VENDOR_STATUS = 'A'
GROUP BY VENDOR_NAME_SHORT
HAVING COUNT(*) > 1
) oc on o.VENDOR_NAME_SHORT = oc.VENDOR_NAME_SHORT
INNER JOIN (SELECT POSTAL, COUNT(*) AS SAME_POSTAL_NB
FROM vendor_addr
WHERE COUNTRY = 'CANADA'
AND COUNTRY ='CANADA'
AND POSTAL != ' '
GROUP BY POSTAL
HAVING COUNT(*) > 1
) oc on b.POSTAL = oc.POSTAL
INNER JOIN (SELECT ADDRESS1, COUNT(*) AS SAME_ADDRESS_NB
FROM ps_vendor_addr
WHERE COUNTRY = 'CANADA'
AND COUNTRY ='CANADA'
AND ADDRESS1 != ' '
GROUP BY ADDRESS1
HAVING COUNT(*) > 1
) oc on b.ADDRESS1 = oc.ADDRESS1
WHERE O.COUNTRY ='CANADA'
AND B.COUNTY = 'CANADA';
推荐答案
似乎您的联接有点有趣,原因不止一个.首先,您具有内部联接,它将消除所有具有 all 重复符号的联接,这是您不想要的.此外,您似乎在所有派生表上都具有相同的别名oc-并不会真正在这里使用,并且您将不会走得更远.
It seems as if your joins are a bit interesting, for more reasons than one. Firstly, you have inner joins, which will eliminate all but those which have all signs of duplications - this is something which you don't want. Additionally, you seem to have the same alias, oc, on all derived tables - that's not really gonna fly here, and you're not going to get very far with that.
与其建议您这样做,不如对每个重复符号重复基本查询-如下所示(我删除了same_address_nb和same_postal_nb字段,然后您会看到原因):
Instead of doing it this way, I'd suggest that you have your basic query repeated for each of the duplication signs - as follows (I removed the same_address_nb and same_postal_nb fields, and you'll see why):
select
o.vendor_id
,o.vndr_name_shrt_user
,O.COUNTRY
,O.VENDOR_NAME_SHORT
,B.POSTAL
,B.ADDRESS1
,OC.SAME_SHORT_NAME
,oc.SAME_USER_NUM
from VENDOR o
JOIN vendor_addr B ON o.VENDOR_ID = B.VENDOR_ID
WHERE O.COUNTRY ='CANADA'
AND B.COUNTY = 'CANADA'
AND ...
对于这些重复符号中的每一个,您都将向上面显示的椭圆添加一个嵌套查询,如下所示-使用vndr_name_shrt_user中的重复显示的示例:
For each one of these duplication signs, you'll add a nested query to the ellipses shown above as follows - example shown using the duplicate in vndr_name_shrt_user:
select
o.vendor_id
,o.vndr_name_shrt_user
,O.COUNTRY
,O.VENDOR_NAME_SHORT
,B.POSTAL
,B.ADDRESS1
,OC.SAME_SHORT_NAME
,oc.SAME_USER_NUM
,'SAME_USER_NUM' as duplicateFlag
from VENDOR o
JOIN vendor_addr B ON o.VENDOR_ID = B.VENDOR_ID
WHERE O.COUNTRY ='CANADA'
AND B.COUNTY = 'CANADA'
AND o.vndr_name_shrt_user in
(
SELECT
vndr_name_shrt_user
FROM VENDOR
WHERE COUNTRY = o.country
AND VENDOR_STATUS = 'A'
GROUP BY vndr_name_shrt_user
HAVING COUNT(*) > 1
)
您可以一起UNION ALL
这些查询,然后查看所有重复项.
You can UNION ALL
these queries together and then see all of your duplicates.
作为旁注,您在最后三个派生表中两次检查了country = 'canada'
.
As a side note, you had a check for the country = 'canada'
twice in the last three derived table.
更新:显示多个重复标记
select
o.vendor_id
,o.vndr_name_shrt_user
,O.COUNTRY
,O.VENDOR_NAME_SHORT
,B.POSTAL
,B.ADDRESS1
,OC.SAME_SHORT_NAME
,oc.SAME_USER_NUM
,'SAME_USER_NUM' as duplicateFlag
from VENDOR o
JOIN vendor_addr B ON o.VENDOR_ID = B.VENDOR_ID
WHERE O.COUNTRY ='CANADA'
AND B.COUNTY = 'CANADA'
AND o.vndr_name_shrt_user in
(
SELECT
vndr_name_shrt_user
FROM VENDOR
WHERE COUNTRY = o.country
AND VENDOR_STATUS = 'A'
GROUP BY vndr_name_shrt_user
HAVING COUNT(*) > 1
)
UNION ALL
select
o.vendor_id
,o.vndr_name_shrt_user
,O.COUNTRY
,O.VENDOR_NAME_SHORT
,B.POSTAL
,B.ADDRESS1
,OC.SAME_SHORT_NAME
,oc.SAME_USER_NUM
,'VENDOR_NAME_SHORT' as duplicateFlag
from VENDOR o
JOIN vendor_addr B ON o.VENDOR_ID = B.VENDOR_ID
WHERE O.COUNTRY ='CANADA'
AND B.COUNTY = 'CANADA'
AND o.VENDOR_NAME_SHORT in
(
SELECT
VENDOR_NAME_SHORT
FROM VENDOR
WHERE COUNTRY = o.country
AND VENDOR_STATUS = 'A'
GROUP BY VENDOR_NAME_SHORT
HAVING COUNT(*) > 1
)
这篇关于SQL查找具有多个字段(没有唯一ID)的重复项的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!