What's the best way to dedupe a table?
Question
I've seen a couple of solutions for this, but I'm wondering what the best and most efficient way is to de-dupe a table. You can use code (SQL, etc.) to illustrate your point, but I'm just looking for basic algorithms. I assumed there would already be a question about this on SO, but I wasn't able to find one, so if it already exists just give me a heads up.
(Just to clarify - I'm referring to getting rid of duplicates in a table that has an incremental automatic PK and has some rows that are duplicates in everything but the PK field.)
Recommended answer
Use the analytic function ROW_NUMBER to number the rows within each group of duplicates, then delete every row numbered above 1:
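-- SQL Server syntax: number the rows within each group of identical (col1, col2) values.
WITH CTE (col1, col2, dupcnt)
AS
(
    SELECT col1, col2,
           -- Ordering by a partition column is arbitrary, so which duplicate gets
           -- dupcnt = 1 is undefined; order by the PK column to keep the oldest row.
           ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY col1) AS dupcnt
    FROM Youtable
)
-- Deleting through the CTE removes the underlying table rows,
-- leaving one row per group.
DELETE
FROM CTE
WHERE dupcnt > 1
GO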
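For engines where deleting through a CTE is not available, the same idea (keep one row per group of duplicates) can be written as a plain GROUP BY. The sketch below is a different formulation from the ROW_NUMBER answer above, not a restatement of it. It assumes a hypothetical table your_table with an auto-increment key named id and two data columns col1 and col2, and it uses Python's built-in sqlite3 module only so that the example is self-contained and runnable.

```python
import sqlite3


def dedupe_keep_lowest_id(conn):
    """Delete duplicates, keeping the lowest id in each (col1, col2) group.

    Table and column names here are illustrative assumptions.
    """
    cur = conn.execute(
        """
        DELETE FROM your_table
        WHERE id NOT IN (
            SELECT MIN(id)            -- the one survivor per group
            FROM your_table
            GROUP BY col1, col2       -- every non-PK column goes here
        )
        """
    )
    conn.commit()
    return cur.rowcount  # number of rows deleted


def demo():
    # Build a small in-memory table containing duplicates.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE your_table ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, col1 TEXT, col2 TEXT)"
    )
    conn.executemany(
        "INSERT INTO your_table (col1, col2) VALUES (?, ?)",
        [("a", "x"), ("a", "x"), ("b", "y"), ("a", "x"), ("b", "z")],
    )
    deleted = dedupe_keep_lowest_id(conn)
    rows = conn.execute(
        "SELECT id, col1, col2 FROM your_table ORDER BY id"
    ).fetchall()
    conn.close()
    return deleted, rows


if __name__ == "__main__":
    print(demo())
```

Keeping MIN(id) makes the surviving row deterministic (the earliest inserted one), which the ORDER BY col1 in the answer's query does not guarantee. On a large table, an index on the grouped columns will likely help, and it is worth running the inner SELECT on its own first to check which rows would survive.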