从大型表中获取重复的记录 [英] Get duplicate records from a large table
问题描述
我有一个表格,我想在三列上应用唯一约束,
表格有〜250000条记录。
我想获取重复记录,以便我可以做出决定(如何处理重复数据),我的查询需要超过600秒并超时。我无法增加查询时间。
I have a table on which I want to apply unique constraint on three columns, Table have ~250000 records. I want get duplicate records so that I can take a decision (what to do with duplicate data), My query takes more than 600 seconds and times out. I cannot increase the time for query.
SELECT
col1, col2, col3, COUNT(*) AS c
FROM
table
WHERE
col3 != ''
GROUP BY col1, col2, col3
HAVING c > 1
我曾尝试使用10条记录的限制,但没有运气。
I have tried using limit of 10 records but no luck.
编辑:
索引已被添加到所有这些列中
EDIT : Index has been added on all of these columns
推荐答案
我已经找到从大表中获取重复记录的更快方式,
I have found the faster way to fetch duplicate records from a large table,
SELECT n1.col1 , n1.col2 , n1.col3
FROM table AS n1,
table AS n2 WHERE n1.id > n2.id AND n1.col1 = n2.col1 AND n1.col2 = n2.col2
AND n1.col3 = n2.col3;
该查询不会返回重复记录的计数,但会返回重复的记录,
This query will not return the count of duplicate records, but it will return duplicated records,
这篇关于从大型表中获取重复的记录的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!