从大型表中获取重复的记录 [英] Get duplicate records from a large table

查看:138
本文介绍了从大型表中获取重复的记录的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个表格,我想在三列上应用唯一约束
表格有〜250000条记录。
我想获取重复记录,以便我可以做出决定(如何处理重复数据),我的查询需要超过600秒并超时。我无法增加查询时间。

I have a table on which I want to apply unique constraint on three columns, Table have ~250000 records. I want get duplicate records so that I can take a decision (what to do with duplicate data), My query takes more than 600 seconds and times out. I cannot increase the time for query.

SELECT 
    col1, col2, col3, COUNT(*) AS c
FROM
    table
WHERE
    col3 != ''
GROUP BY col1, col2, col3
HAVING c > 1

我曾尝试使用10条记录的限制,但没有运气。

I have tried using limit of 10 records but no luck.

编辑:
索引已被添加到所有这些列中

EDIT : Index has been added on all of these columns

推荐答案

我已经找到从大表中获取重复记录的更快方式,

I have found the faster way to fetch duplicate records from a large table,

SELECT n1.col1 , n1.col2 , n1.col3  
FROM table AS n1,
table AS n2 WHERE n1.id > n2.id AND n1.col1 = n2.col1 AND n1.col2 = n2.col2
AND n1.col3 = n2.col3;

该查询不会返回重复记录的计数,但会返回重复的记录,

This query will not return the count of duplicate records, but it will return duplicated records,

这篇关于从大型表中获取重复的记录的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆