MySQL delete duplicate records but keep latest
I have a table with a unique id field and an email field. Emails get duplicated. For each duplicated email address I want to keep only one row: the one with the latest id (the last inserted record).
How can I achieve this?
Imagine your table test contains the following data:
ID EMAIL
---------------------- --------------------
1 aaa
2 bbb
3 ccc
4 bbb
5 ddd
6 eee
7 aaa
8 aaa
9 eee
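To follow along, the sample data can be loaded with a script like the one below. This is only a sketch: the question states just that there is a unique id and an email field, so the column types and the primary key are assumptions.

```sql
-- Hypothetical schema: the column types and the primary key are assumed,
-- the question only mentions a unique id and an email field.
create table test (
    id int not null primary key,
    email varchar(255) not null
);

-- The nine sample rows shown in the table above.
insert into test (id, email) values
    (1, 'aaa'), (2, 'bbb'), (3, 'ccc'),
    (4, 'bbb'), (5, 'ddd'), (6, 'eee'),
    (7, 'aaa'), (8, 'aaa'), (9, 'eee');
```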
So, we need to find all repeated emails and, for each of them, delete every row except the one with the latest id. In this case, aaa, bbb and eee are repeated, so we want to delete IDs 1, 7, 2 and 6.
To accomplish this, first we need to find all the repeated emails:
select email
from test
group by email
having count(*) > 1;
EMAIL
--------------------
aaa
bbb
eee
Then, from this dataset, we need to find the latest id for each one of these repeated emails:
select max(id) as lastId, email
from test
where email in (
select email
from test
group by email
having count(*) > 1
)
group by email;
LASTID EMAIL
---------------------- --------------------
8 aaa
4 bbb
9 eee
Finally, we can delete every row whose id is smaller than the LASTID of its email. So the solution is:
delete test
from test
inner join (
select max(id) as lastId, email
from test
where email in (
select email
from test
group by email
having count(*) > 1
)
group by email
) duplic on duplic.email = test.email
where test.id < duplic.lastId;
I don't have MySQL installed on this machine right now, so this is untested, but it should work.
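Before running the delete, the affected rows can be previewed by turning the same join into a select. The sketch below reuses the derived table from the statement above unchanged; only the select list and the order by are additions.

```sql
-- Preview only: same join and filter as the delete, but nothing is modified.
select test.id, test.email, duplic.lastId
from test
inner join (
    select max(id) as lastId, email
    from test
    where email in (
        select email
        from test
        group by email
        having count(*) > 1
    )
    group by email
) duplic on duplic.email = test.email
where test.id < duplic.lastId
order by test.email, test.id;
```

With the sample data this should list exactly the rows the delete is meant to remove, each next to the id that will be kept for its email.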
Update
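The above delete works, but I found a more optimized version. The where email in (...) filter is redundant, because having count(*) > 1 can be applied directly to the query that computes max(id):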
delete test
from test
inner join (
select max(id) as lastId, email
from test
group by email
having count(*) > 1
) duplic on duplic.email = test.email
where test.id < duplic.lastId;
You can see that it deletes the oldest duplicates, i.e. 1, 7, 2, 6:
select * from test;
+----+-------+
| id | email |
+----+-------+
| 3 | ccc |
| 4 | bbb |
| 5 | ddd |
| 8 | aaa |
| 9 | eee |
+----+-------+
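A quick way to confirm the cleanup is to rerun the duplicate check from the first step. The copies alias is only illustrative; the rest is the query already used above.

```sql
-- Should return an empty result set once every email appears only once.
select email, count(*) as copies
from test
group by email
having count(*) > 1;
```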
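Another version is the delete provided by Rene Limon. MySQL does not let a DELETE select from its own target table in a subquery (error 1093, "You can't specify target table 'test' for update in FROM clause"), so the subquery is wrapped in a derived table here:
delete from test
where id not in (
select lastId
from (
select max(id) as lastId
from test
group by email
) latest
);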