What is the best way to delete duplicate values from a MySQL table?

Question

I have the following SQL to delete duplicate values from a table:

DELETE p1 
FROM `ProgramsList` p1, `ProgramsList` p2  
WHERE p1.CustId = p2.CustId 
    AND p1.CustId = 1 
    AND p1.`Id`>p2.`Id` 
    AND p1.`ProgramName` = p2.`ProgramName`;

Id is an auto-increment column.
For a given CustId, ProgramName must be unique (currently it is not).
The above SQL takes about 4 to 5 hours to complete on about 1,000,000 records.

Could anyone suggest a quicker way of deleting duplicates from a table?

Answer

First, if you don't already have them, try adding indexes on the ProgramName and CustId columns.
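A single composite index on both columns is usually what the grouping and join queries below need. A minimal sketch, assuming an InnoDB table; idx_custid_programname is a hypothetical name, and if ProgramName is a TEXT column MySQL will require a prefix length such as ProgramName(100).

```sql
-- Hypothetical index name. InnoDB secondary indexes also store the primary
-- key (Id), so this index alone typically holds every column the
-- de-duplication queries read: CustId, ProgramName and Id.
ALTER TABLE ProgramsList
    ADD INDEX idx_custid_programname (CustId, ProgramName);
```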

De-Duping

Group your records by CustId and ProgramName to identify dupes and, while doing that, grab the minimum Id of each group. Then delete every record whose Id is not one of those MinID values.
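The grouping step can be run on its own as a read-only check before deleting anything. A sketch; the RowsToDelete alias is illustrative and not from the original answer.

```sql
-- Preview: every (CustId, ProgramName) group that has duplicates, the Id
-- that would be kept (the smallest), and how many of its rows would go.
SELECT CustId,
       ProgramName,
       MIN(Id)      AS MinID,
       COUNT(*) - 1 AS RowsToDelete
FROM ProgramsList
GROUP BY CustId, ProgramName
HAVING COUNT(*) > 1;
```

The sum of RowsToDelete is the number of rows the IN-clause delete below should remove.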

In-Clause Method

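delete from
 ProgramsList
where
 id not in
    -- MySQL refuses a subquery that reads the table being deleted from
    -- (error 1093), so the MinID list is wrapped in a derived table,
    -- which MySQL materializes before the delete runs
    (select MinID
       from (select min(id) as MinID
               from ProgramsList
               group by ProgramName, CustID) as keepers);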

Join Method

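This version removes only the row with the highest Id from each duplicate group per run, so if a group has more than two members you will have to run it repeatedly until it deletes no more rows.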

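DELETE P
FROM ProgramsList as P
INNER JOIN 
    -- one row per (ProgramName, CustID) group: its size and its highest Id
    (select count(*) as Count, max(id) as MaxID
     from ProgramsList
     group by ProgramName, CustID) as A on A.MaxID = P.id
-- only groups that actually contain duplicates
WHERE A.Count >= 2;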
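A variation on the join method that is not from the original answer: join each row to the smallest Id of its group instead of the largest, and delete every row with a larger Id. Groups of any size are then cleaned in one run, and the oldest row survives, as with the IN-clause version. A sketch; the alias K is illustrative. Rows whose CustId or ProgramName is NULL never satisfy the = join condition, so they are left untouched.

```sql
-- Single pass: K holds, for each duplicated (CustId, ProgramName) group,
-- the smallest Id; every other row of that group is deleted.
DELETE P
FROM ProgramsList AS P
INNER JOIN
    (SELECT CustId, ProgramName, MIN(Id) AS MinID
     FROM ProgramsList
     GROUP BY CustId, ProgramName
     HAVING COUNT(*) > 1) AS K
  ON K.CustId = P.CustId
 AND K.ProgramName = P.ProgramName
WHERE P.Id > K.MinID;
```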

Some people see performance problems with the In-Clause version and some don't; it depends a lot on your indexes and data. If one approach is too slow, try the other.
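Because the question requires ProgramName to be unique for a given CustId, a UNIQUE index can enforce that once the duplicates are gone, so they cannot come back. A sketch; uq_custid_programname is a hypothetical name. If any duplicate pair is still present, MySQL rejects the ALTER with a duplicate-entry error, which also makes it a final check.

```sql
-- Enforce "ProgramName is unique per CustId" after de-duplicating.
-- Hypothetical index name; fails with a duplicate-entry error if any
-- duplicate (CustId, ProgramName) pair remains.
ALTER TABLE ProgramsList
    ADD UNIQUE INDEX uq_custid_programname (CustId, ProgramName);
```

After this, inserting a second row with the same CustId and ProgramName fails; INSERT IGNORE or INSERT ... ON DUPLICATE KEY UPDATE can be used where that should be tolerated.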

Related: https://stackoverflow.com/a/4192849/127880
