SQL Query - Delete duplicates if more than 3 dups?
Does anyone have an elegant SQL statement to delete duplicate records from a table, but only if there are more than x duplicates? That is, allow up to 2 or 3 duplicates, but no more?
Currently I have a delete statement that does the following:
    delete t
    from table t
    left outer join (
        -- keep only the row with the highest id in each duplicate group
        select max(id) as rowid, dupcol1, dupcol2
        from table
        group by dupcol1, dupcol2
    ) as keeprows on t.id = keeprows.rowid
    where keeprows.rowid is null
This works great. But now what I'd like to do is only delete those rows if they have more than say 2 duplicates.
Thanks
    with cte as (
        select row_number() over (partition by dupcol1, dupcol2 order by ID) as rn
        from table)
    delete from cte
    where rn > 2; -- or > 3, etc.
The query manufactures a 'row number' for each record, partitioned by (dupcol1, dupcol2) and ordered by ID. In effect, this row number counts the 'duplicates' that share the same dupcol1 and dupcol2 and assigns them the numbers 1, 2, 3, ..., N in ID order. If you want to keep just 2 'duplicates', you need to delete those that were assigned the numbers 3, 4, ..., N, and that is the part taken care of by the DELETE ... WHERE rn > 2;
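To make the numbering concrete, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. The table name demo and the sample rows are made up for illustration. SQLite, unlike the SQL Server syntax in the answer, cannot delete directly through a CTE, so this sketch selects the ids from the CTE and deletes them from the base table instead. It also assumes a SQLite build with window functions (3.25 or later).

```python
import sqlite3

# Hypothetical demo table; "demo" and the sample values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo (id INTEGER PRIMARY KEY, dupcol1 TEXT, dupcol2 TEXT)"
)
rows = [("a", "x")] * 4 + [("b", "y")] * 2 + [("c", "z")]
conn.executemany("INSERT INTO demo (dupcol1, dupcol2) VALUES (?, ?)", rows)

# Number the rows inside each (dupcol1, dupcol2) group, lowest id first.
numbered = conn.execute(
    """
    SELECT id, dupcol1, dupcol2,
           row_number() OVER (PARTITION BY dupcol1, dupcol2 ORDER BY id) AS rn
    FROM demo
    ORDER BY id
    """
).fetchall()

# Assumption: SQLite cannot DELETE FROM a CTE, so delete by id instead.
conn.execute(
    """
    WITH cte AS (
        SELECT id,
               row_number() OVER (PARTITION BY dupcol1, dupcol2 ORDER BY id) AS rn
        FROM demo
    )
    DELETE FROM demo WHERE id IN (SELECT id FROM cte WHERE rn > 2)
    """
)
remaining_ids = [r[0] for r in conn.execute("SELECT id FROM demo ORDER BY id")]

print(numbered)       # each tuple ends with its row number within the group
print(remaining_ids)  # ids that survive when at most 2 rows per group are kept
```

With this sample data, the group with four copies loses its third and fourth rows, while the groups with two copies and one copy are left untouched.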
Using this method you can change the ORDER BY to suit your preferred order (e.g. ORDER BY ID DESC), so that the LATEST row has rn=1, the next to latest has rn=2, and so on. The rest stays the same: the DELETE will remove only the oldest rows, as they have the highest row numbers.
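The effect of flipping the sort order can be sketched the same way. The helper names make_demo and trim_duplicates, the table name demo, and the sample rows are hypothetical, and the SQL is again adapted to SQLite (deleting by id rather than through the CTE), assuming a build with window function support.

```python
import sqlite3


def make_demo():
    """Build a hypothetical in-memory table with 4, 2 and 1 copies per group."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE demo (id INTEGER PRIMARY KEY, dupcol1 TEXT, dupcol2 TEXT)"
    )
    rows = [("a", "x")] * 4 + [("b", "y")] * 2 + [("c", "z")]
    conn.executemany("INSERT INTO demo (dupcol1, dupcol2) VALUES (?, ?)", rows)
    return conn


def trim_duplicates(conn, keep, newest_first):
    """Keep at most `keep` rows per (dupcol1, dupcol2) group; return surviving ids."""
    # The direction comes from two fixed literals, never from user input.
    direction = "DESC" if newest_first else "ASC"
    conn.execute(
        f"""
        WITH cte AS (
            SELECT id,
                   row_number() OVER (
                       PARTITION BY dupcol1, dupcol2 ORDER BY id {direction}
                   ) AS rn
            FROM demo
        )
        DELETE FROM demo WHERE id IN (SELECT id FROM cte WHERE rn > ?)
        """,
        (keep,),
    )
    return [row[0] for row in conn.execute("SELECT id FROM demo ORDER BY id")]


kept_newest = trim_duplicates(make_demo(), keep=2, newest_first=True)
kept_oldest = trim_duplicates(make_demo(), keep=2, newest_first=False)
print(kept_newest)  # survivors when the highest ids get rn = 1, 2
print(kept_oldest)  # survivors when the lowest ids get rn = 1, 2
```

Only the ORDER BY direction changes between the two calls; the rn > keep filter is identical, which is the point the answer makes.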
Unlike this closely related question, as the condition becomes more complex, using CTEs and row_number() becomes simpler. Performance may still be a problem if no suitable index exists.