SQL Query - Delete duplicates if more than 3 dups?


Problem description

Does anyone have an elegant sql statement to delete duplicate records from a table, but only if there are more than x number of duplicates? So it allows up to 2 or 3 duplicates, but that's it?

Currently I have a delete statement that does the following:

delete table
from table t
left outer join (
 select max(id) as rowid, dupcol1, dupcol2
 from table
 group by dupcol1, dupcol2
) as keeprows on t.id=keeprows.rowid
where keeprows.rowid is null

This works great. But now what I'd like to do is delete those rows only if they have more than, say, 2 duplicates.

Thanks

Solution

with cte as (
  select row_number() over (partition by dupcol1, dupcol2 order by ID) as rn
     from table)
delete from cte
   where rn > 2; -- or >3 etc

The query manufactures a 'row number' for each record, partitioned by (dupcol1, dupcol2) and ordered by ID. In effect this row number counts the 'duplicates' that share the same dupcol1 and dupcol2 and assigns them the numbers 1, 2, 3, ..., N, ordered by ID. If you want to keep just 2 'duplicates', you need to delete the rows that were assigned the numbers 3, 4, ..., N, and that is the part taken care of by DELETE ... WHERE rn > 2.
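To make the numbering concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The table name `items` and the sample rows are made up for illustration. SQLite does not support deleting through a CTE the way the statement above does (that form appears to be SQL Server syntax), so the sketch swaps in `DELETE ... WHERE id IN (subquery)` with the same row_number() expression. It also assumes a SQLite build with window-function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, dupcol1 TEXT, dupcol2 TEXT)"
)
# Hypothetical sample data: four copies of (a, x), two of (b, y), one of (c, z).
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [
        (1, "a", "x"), (2, "a", "x"), (3, "a", "x"), (4, "a", "x"),
        (5, "b", "y"), (6, "b", "y"),
        (7, "c", "z"),
    ],
)

# Same numbering as the answer: restart at 1 for every (dupcol1, dupcol2) group.
NUMBERED = """
    SELECT id,
           row_number() OVER (PARTITION BY dupcol1, dupcol2 ORDER BY id) AS rn
    FROM items
"""

numbering = conn.execute(f"SELECT id, rn FROM ({NUMBERED}) ORDER BY id").fetchall()
print(numbering)
# [(1, 1), (2, 2), (3, 3), (4, 4), (5, 1), (6, 2), (7, 1)]

# Delete every row numbered 3 or higher, i.e. keep at most 2 rows per group.
conn.execute(
    f"DELETE FROM items WHERE id IN (SELECT id FROM ({NUMBERED}) WHERE rn > 2)"
)
remaining = [row[0] for row in conn.execute("SELECT id FROM items ORDER BY id")]
print(remaining)
# [1, 2, 5, 6, 7]
```

Only the group with four copies loses rows; groups with one or two copies are untouched because none of their rows gets a number above 2.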

Using this method you can change the ORDER BY to suit your preferred order (e.g. ORDER BY ID DESC), so that the latest row has rn = 1, the next to latest has rn = 2, and so on. The rest stays the same: the DELETE will remove only the oldest rows, as they have the highest row numbers.
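The effect of the sort direction can be sketched the same way. This is an illustrative Python/sqlite3 example, not the answer's own code: the helper names `make_table` and `trim_duplicates`, the table name `items`, and the sample rows are all hypothetical, and the delete again uses an `IN (subquery)` form because SQLite cannot delete through a CTE. It assumes SQLite 3.25 or later for window functions.

```python
import sqlite3


def make_table():
    """Build a throwaway table: four copies of (a, x) and one (b, y)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, dupcol1 TEXT, dupcol2 TEXT)"
    )
    conn.executemany(
        "INSERT INTO items VALUES (?, ?, ?)",
        [(1, "a", "x"), (2, "a", "x"), (3, "a", "x"), (4, "a", "x"), (5, "b", "y")],
    )
    return conn


def trim_duplicates(conn, keep, newest_first):
    """Delete rows whose row number within their duplicate group exceeds `keep`."""
    # The direction is chosen from two fixed strings, never from user input.
    direction = "DESC" if newest_first else "ASC"
    conn.execute(
        f"""
        DELETE FROM items WHERE id IN (
            SELECT id FROM (
                SELECT id,
                       row_number() OVER (
                           PARTITION BY dupcol1, dupcol2 ORDER BY id {direction}
                       ) AS rn
                FROM items
            )
            WHERE rn > ?
        )
        """,
        (keep,),
    )
    return [row[0] for row in conn.execute("SELECT id FROM items ORDER BY id")]


keep_oldest = trim_duplicates(make_table(), keep=2, newest_first=False)
keep_newest = trim_duplicates(make_table(), keep=2, newest_first=True)
print(keep_oldest)  # [1, 2, 5]
print(keep_newest)  # [3, 4, 5]
```

With ascending order the two lowest IDs in each group survive; with descending order the two highest do. The threshold and the partition columns are unchanged in both cases.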

Compared with the approach in this closely related question, using CTEs and row_number() becomes the simpler option as the condition becomes more complex. Performance may still be a problem if no suitable index exists to support the partitioning and ordering.
