What is the best way to delete duplicate values from a MySQL table?
Question
I have the following SQL to delete duplicate values from a table:
DELETE p1
FROM `ProgramsList` p1, `ProgramsList` p2
WHERE p1.CustId = p2.CustId
AND p1.CustId = 1
AND p1.`Id`>p2.`Id`
AND p1.`ProgramName` = p2.`ProgramName`;
Id is auto-incremented.
For a given CustId, ProgramName must be unique (currently it is not).
The above SQL takes about 4 to 5 hours to complete on about 1,000,000 records.
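To make the self-join's behavior concrete, here is a minimal sketch that runs an equivalent statement with Python's built-in sqlite3 module. Assumptions: SQLite stands in for MySQL, and because SQLite has no multi-table DELETE, the p1/p2 self-join is rewritten as a correlated EXISTS test; the sample rows and the helper names make_sample_db and delete_dupes_self_join are hypothetical. It shows that the query keeps the lowest Id per ProgramName for the given CustId. Each row triggers a search for a lower-Id twin, which is one reason a query of this shape can be slow without a suitable index.

```python
import sqlite3


def make_sample_db():
    # Hypothetical sample data; Id is auto-incremented 1..6.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ProgramsList ("
        " Id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " CustId INTEGER NOT NULL,"
        " ProgramName TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO ProgramsList (CustId, ProgramName) VALUES (?, ?)",
        [(1, "Word"), (1, "Word"), (1, "Excel"), (1, "Word"), (2, "Word"), (2, "Word")],
    )
    return conn


def delete_dupes_self_join(conn, cust_id):
    # SQLite has no multi-table DELETE, so p1 becomes the target table and
    # p2 moves into EXISTS: delete a row if another row with the same CustId
    # and ProgramName has a smaller Id. The lowest Id in each group survives.
    cur = conn.execute(
        """
        DELETE FROM ProgramsList
        WHERE CustId = ?
          AND EXISTS (
              SELECT 1 FROM ProgramsList AS p2
              WHERE p2.CustId = ProgramsList.CustId
                AND p2.ProgramName = ProgramsList.ProgramName
                AND p2.Id < ProgramsList.Id
          )
        """,
        (cust_id,),
    )
    return cur.rowcount


conn = make_sample_db()
deleted = delete_dupes_self_join(conn, 1)
remaining = [row[0] for row in conn.execute("SELECT Id FROM ProgramsList ORDER BY Id")]
print(deleted, remaining)  # 2 [1, 3, 5, 6]
```

Rows for CustId 2 are untouched because the original query is restricted to CustId = 1.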
Could anyone suggest a quicker way of deleting duplicates from a table?
Recommended answer
First, you might try adding indexes on the ProgramName and CustId fields if you don't already have them.
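A minimal sketch of the indexing advice, again using SQLite as a stand-in for MySQL. Assumption: a single composite index on (CustId, ProgramName) rather than two separate indexes, since the queries here filter, join, or group on that pair; the index name idx_cust_program is hypothetical. The CREATE INDEX statement itself is also valid MySQL, but whether MySQL's optimizer uses the index for a given query depends on version and data, so it is worth checking with EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ProgramsList ("
    " Id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " CustId INTEGER NOT NULL,"
    " ProgramName TEXT NOT NULL)"
)
# One composite index over both grouping/join columns (hypothetical name).
conn.execute("CREATE INDEX idx_cust_program ON ProgramsList (CustId, ProgramName)")


def index_columns(conn, index_name):
    # PRAGMA index_info yields one (seqno, cid, name) row per indexed column;
    # sorting the tuples orders them by seqno, i.e. position in the index.
    rows = conn.execute(f"PRAGMA index_info({index_name})").fetchall()
    return [name for _seqno, _cid, name in sorted(rows)]


print(index_columns(conn, "idx_cust_program"))  # ['CustId', 'ProgramName']
```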
De-Duping
You can group your records to identify dupes and, while doing so, grab the minimum Id for each group. Then delete every record whose Id is not one of those MinIDs.
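The grouping logic just described can be sketched in plain Python, independent of any database. Assumption: rows arrive as hypothetical (id, cust_id, program_name) tuples; the function keeps the smallest id per (cust_id, program_name) group and returns every other id, which is exactly the set the SQL below would delete.

```python
def ids_to_delete(rows):
    """Return the ids that are not the minimum id of their group, sorted.

    rows: iterable of (id, cust_id, program_name) tuples (hypothetical layout).
    """
    min_id = {}   # (cust_id, program_name) -> smallest id seen so far
    all_ids = []
    for row_id, cust_id, program_name in rows:
        key = (cust_id, program_name)
        if key not in min_id or row_id < min_id[key]:
            min_id[key] = row_id
        all_ids.append(row_id)
    keep = set(min_id.values())
    return sorted(i for i in all_ids if i not in keep)


sample = [(1, 1, "Word"), (2, 1, "Word"), (3, 1, "Excel"),
          (4, 1, "Word"), (5, 2, "Word"), (6, 2, "Word")]
print(ids_to_delete(sample))  # [2, 4, 6]
```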
IN-Clause Method
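delete from
ProgramsList
where
id not in
(select MinID from
  -- MySQL rejects a subquery that selects from the table being deleted from
  -- (error 1093), so the MIN(id) list is wrapped in a derived table.
  (select min(id) as MinID
   from ProgramsList
   group by ProgramName, CustID) as keepers)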
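A minimal sketch that runs the IN-clause delete against sample data, with SQLite standing in for MySQL. Assumptions: the sample rows and helper names are hypothetical, and the derived-table wrapper is kept so the statement stays close to the MySQL form (SQLite itself would accept the subquery without it). With one copy of each (ProgramName, CustId) pair kept, a single pass removes all duplicates.

```python
import sqlite3


def make_sample_db():
    # Hypothetical sample data; Id is auto-incremented 1..6.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ProgramsList ("
        " Id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " CustId INTEGER NOT NULL,"
        " ProgramName TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO ProgramsList (CustId, ProgramName) VALUES (?, ?)",
        [(1, "Word"), (1, "Word"), (1, "Excel"), (1, "Word"), (2, "Word"), (2, "Word")],
    )
    return conn


def delete_dupes_in_clause(conn):
    # Keep MIN(Id) per (ProgramName, CustId) group; delete every other row.
    cur = conn.execute(
        """
        DELETE FROM ProgramsList
        WHERE Id NOT IN (
            SELECT MinID FROM (
                SELECT MIN(Id) AS MinID
                FROM ProgramsList
                GROUP BY ProgramName, CustId
            ) AS keepers
        )
        """
    )
    return cur.rowcount


conn = make_sample_db()
deleted = delete_dupes_in_clause(conn)
remaining = [row[0] for row in conn.execute("SELECT Id FROM ProgramsList ORDER BY Id")]
print(deleted, remaining)  # 3 [1, 3, 5]
```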
JOIN Method
This deletes only the row with the highest Id in each duplicated group, so you may have to run it more than once if there are many members per group (a group with n copies needs n - 1 runs).
DELETE P
FROM ProgramsList as P
INNER JOIN
(select count(*) as Count, max(id) as MaxID
from ProgramsList
group by ProgramName, CustID) as A on A.MaxID = P.id
WHERE A.Count >= 2
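The repeated runs can be automated by looping until a pass deletes nothing. A minimal sketch, with SQLite standing in for MySQL; because SQLite has no DELETE ... JOIN, the join on the grouped MaxID is rewritten as an IN test with HAVING COUNT(*) >= 2, which deletes the same rows per pass. The sample rows and helper names are hypothetical. The (1, "Word") group has three copies, so it needs two passes.

```python
import sqlite3


def make_sample_db():
    # Hypothetical sample data; Id is auto-incremented 1..6.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ProgramsList ("
        " Id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " CustId INTEGER NOT NULL,"
        " ProgramName TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO ProgramsList (CustId, ProgramName) VALUES (?, ?)",
        [(1, "Word"), (1, "Word"), (1, "Excel"), (1, "Word"), (2, "Word"), (2, "Word")],
    )
    return conn


def delete_dupes_max_id_passes(conn):
    # Each pass removes the highest-Id row of every group that still has
    # at least two members; stop once a pass deletes nothing.
    per_pass = []
    while True:
        cur = conn.execute(
            """
            DELETE FROM ProgramsList
            WHERE Id IN (
                SELECT MAX(Id) FROM ProgramsList
                GROUP BY ProgramName, CustId
                HAVING COUNT(*) >= 2
            )
            """
        )
        if cur.rowcount == 0:
            break
        per_pass.append(cur.rowcount)
    return per_pass


conn = make_sample_db()
passes = delete_dupes_max_id_passes(conn)
remaining = [row[0] for row in conn.execute("SELECT Id FROM ProgramsList ORDER BY Id")]
print(passes, remaining)  # [2, 1] [1, 3, 5]
```

Note that this variant keeps the lowest Id in each group, the same survivors as the IN-clause method.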
The IN-clause version performs poorly for some people and fine for others; it depends largely on your indexes and data. If one approach is too slow, try the other.
Related: https://stackoverflow.com/a/4192849/127880