Remove all but one row with duplicate values


Problem description

I have a table with three columns: KEY, VALUE and LAST_UPDATED.

Some rows share the same VALUE. For each VALUE, I want to delete every row except the most recently updated one.

So if the table contained these rows:

1, "A", 2013-11-08
2, "B", 2013-10-30
3, "A", 2013-11-07
4, "A", 2013-11-01
5, "B", 2013-11-01

then I would want to keep only these rows:

1, "A", 2013-11-08
5, "B", 2013-11-01

How can you do this in SQL? I imagine DELETE FROM table WHERE key IN (SELECT key FROM table GROUP BY value HAVING count(*)>1) would delete a single random(?) row from each group of duplicate values, but how can I make it remove all but the most recently updated row?

Recommended answer

You can do this with a left join:

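-- Multi-table DELETE form, as accepted by MySQL and SQL Server.
-- "table" and "key" are placeholders; both are reserved words, so
-- substitute or quote your real identifiers.
DELETE t
FROM table t
LEFT JOIN table t2 ON t.value = t2.value
AND t2.last_updated > t.last_updated
-- Keep only rows for which a newer row with the same value was found.
WHERE t2.key IS NOT NULL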

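This means that for every row, the join looks for another row with the same VALUE and a more recent LAST_UPDATED; if one exists, the row is deleted. Because the WHERE clause keeps only rows that found a match, the left join behaves like an inner join here. A plain greater-than comparison is reliable when LAST_UPDATED is a native date or timestamp type; if it is stored in some other form, you might have to use a date-difference function instead. Note that rows tied on the most recent LAST_UPDATED for a VALUE are all kept, since none of them is strictly newer than the others.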
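To check this logic against the sample rows from the question, here is a minimal sketch using Python's built-in sqlite3 module. It rests on a few assumptions: the table name items is hypothetical, the dates are stored as ISO-8601 text (which sorts correctly under >), and, because SQLite does not accept the DELETE ... JOIN form shown above, the same condition is written as a correlated EXISTS subquery rather than a join.

```python
import sqlite3

# Sample rows from the question: (key, value, last_updated).
ROWS = [
    (1, "A", "2013-11-08"),
    (2, "B", "2013-10-30"),
    (3, "A", "2013-11-07"),
    (4, "A", "2013-11-01"),
    (5, "B", "2013-11-01"),
]


def dedupe_keep_latest(rows):
    """Delete every row for which a strictly newer row with the same value exists."""
    conn = sqlite3.connect(":memory:")
    # "items" is a hypothetical table name; "key" is quoted because it is a keyword.
    conn.execute(
        'CREATE TABLE items ("key" INTEGER PRIMARY KEY, value TEXT, last_updated TEXT)'
    )
    conn.executemany("INSERT INTO items VALUES (?, ?, ?)", rows)
    # EXISTS form of the join condition: same value, more recent last_updated.
    conn.execute(
        """
        DELETE FROM items
        WHERE EXISTS (
            SELECT 1
            FROM items AS newer
            WHERE newer.value = items.value
              AND newer.last_updated > items.last_updated
        )
        """
    )
    remaining = conn.execute(
        'SELECT "key", value, last_updated FROM items ORDER BY "key"'
    ).fetchall()
    conn.close()
    return remaining


remaining = dedupe_keep_latest(ROWS)
print(remaining)  # [(1, 'A', '2013-11-08'), (5, 'B', '2013-11-01')]
```

Calling dedupe_keep_latest with two rows that share both value and last_updated returns both rows, which illustrates the tie behaviour of a strict greater-than comparison.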

I would expect a left join to perform far better in this case than creating and joining to an inline table, but if performance is an issue it may be best to try both ways and pick the one that performs best most consistently.
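For comparison, the inline-table alternative might look roughly like the sketch below. This is an illustration, not the answerer's own query: the table name items is hypothetical, SQLite (via Python's sqlite3) stands in for the real database, and the derived table m holding the latest LAST_UPDATED per VALUE is one possible reading of "inline table". It says nothing about relative speed; that would need measuring on the real data and engine.

```python
import sqlite3

# Sample rows from the question: (key, value, last_updated).
ROWS = [
    (1, "A", "2013-11-08"),
    (2, "B", "2013-10-30"),
    (3, "A", "2013-11-07"),
    (4, "A", "2013-11-01"),
    (5, "B", "2013-11-01"),
]


def dedupe_via_derived_table(rows):
    """Delete rows older than the latest last_updated recorded for their value."""
    conn = sqlite3.connect(":memory:")
    # "items" is a hypothetical table name; "key" is quoted because it is a keyword.
    conn.execute(
        'CREATE TABLE items ("key" INTEGER PRIMARY KEY, value TEXT, last_updated TEXT)'
    )
    conn.executemany("INSERT INTO items VALUES (?, ?, ?)", rows)
    conn.execute(
        """
        DELETE FROM items
        WHERE "key" IN (
            SELECT i."key"
            FROM items AS i
            JOIN (
                -- Inline (derived) table: latest update per value.
                SELECT value, MAX(last_updated) AS latest
                FROM items
                GROUP BY value
            ) AS m ON m.value = i.value
            WHERE i.last_updated < m.latest
        )
        """
    )
    kept = conn.execute(
        'SELECT "key", value, last_updated FROM items ORDER BY "key"'
    ).fetchall()
    conn.close()
    return kept


kept = dedupe_via_derived_table(ROWS)
print(kept)  # [(1, 'A', '2013-11-08'), (5, 'B', '2013-11-01')]
```

Both forms leave the same rows on the sample data, so either can be timed against the other on a realistic copy of the table.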
