T-SQL: Deleting all duplicate rows but keeping one


Question

I have a table with a very large number of rows. Duplicates are not allowed, but due to a problem with how the rows were created I know there are some duplicates in this table. I need to eliminate the extra rows from the perspective of the key columns. Some other columns may have slightly different data, but I do not care about that. However, I still need to keep one row from each set of duplicates. SELECT DISTINCT won't work because it operates on all columns, and I need to suppress duplicates based on the key columns only.

How can I efficiently delete the extra rows while still keeping one of them?

Recommended answer

You didn't say what version you were using, but in SQL Server 2005 and above, you can use a common table expression (CTE) with the OVER clause. It goes a little something like this:

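WITH cte AS (
  SELECT [foo], [bar],
         -- number the rows within each (foo, bar) group; row 1 is the one kept
         ROW_NUMBER() OVER (PARTITION BY [foo], [bar] ORDER BY [baz]) AS [rn]
  FROM [TABLE]  -- placeholder: substitute your own table name
)
-- deleting through the CTE removes the underlying table rows
DELETE FROM cte WHERE [rn] > 1;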

Play around with it and see what you get.
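One way to play around with it safely is to run it against a throwaway temp table first. The sketch below is only an illustration: the table `#demo`, its `note` column, and the sample data are all made up, and the multi-row VALUES insert assumes SQL Server 2008 or later. It previews the rows that would be removed before actually deleting them.

```sql
-- Hypothetical demo table; names and data are illustrative only.
CREATE TABLE #demo (foo INT, bar INT, baz DATETIME, note VARCHAR(20));

-- Multi-row VALUES needs SQL Server 2008+; use separate INSERTs on 2005.
INSERT INTO #demo (foo, bar, baz, note) VALUES
  (1, 1, '20200101', 'original'),
  (1, 1, '20200201', 'duplicate'),
  (2, 5, '20200115', 'unique'),
  (3, 7, '20200301', 'original'),
  (3, 7, '20200302', 'duplicate'),
  (3, 7, '20200303', 'duplicate');

-- Preview: the rows with rn > 1 are the ones the DELETE would remove.
WITH cte AS (
  SELECT foo, bar, baz, note,
         ROW_NUMBER() OVER (PARTITION BY foo, bar ORDER BY baz) AS rn
  FROM #demo
)
SELECT foo, bar, baz, note, rn FROM cte WHERE rn > 1;

-- Same CTE, this time deleting through it.
WITH cte AS (
  SELECT foo, bar,
         ROW_NUMBER() OVER (PARTITION BY foo, bar ORDER BY baz) AS rn
  FROM #demo
)
DELETE FROM cte WHERE rn > 1;

-- One row per (foo, bar) pair should remain: the earliest baz in each group.
SELECT foo, bar, baz, note FROM #demo ORDER BY foo, bar;

DROP TABLE #demo;
```

Running the SELECT form first costs one extra scan, but it lets you confirm the grouping and ordering before anything is removed.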

(In an attempt to be helpful, someone edited the ORDER BY clause within the CTE. To be clear, you can order by anything you want here; it needn't be one of the columns returned by the CTE. In fact, a common use case is that "foo, bar" is the group identifier and "baz" is some sort of timestamp. To keep the latest row, you'd use ORDER BY baz DESC.)
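As a rough sketch of that "keep the latest" case: the temp table `#events` and its `id` column below are hypothetical, added only for illustration. The `id` in the ORDER BY is an assumed tie-breaker, since when two rows in a group share the same `baz` value, which one gets row number 1 is otherwise not deterministic.

```sql
-- Hypothetical table: (foo, bar) identifies the group, baz is a timestamp,
-- id is an assumed unique column used only to break ties.
CREATE TABLE #events (id INT IDENTITY(1,1), foo INT, bar INT, baz DATETIME);

INSERT INTO #events (foo, bar, baz) VALUES (1, 1, '20200101');
INSERT INTO #events (foo, bar, baz) VALUES (1, 1, '20200301');
INSERT INTO #events (foo, bar, baz) VALUES (1, 1, '20200201');
INSERT INTO #events (foo, bar, baz) VALUES (2, 9, '20200110');

-- DESC puts the newest row first in each group, so rn = 1 is the latest.
WITH cte AS (
  SELECT foo, bar,
         ROW_NUMBER() OVER (PARTITION BY foo, bar
                            ORDER BY baz DESC, id DESC) AS rn
  FROM #events
)
DELETE FROM cte WHERE rn > 1;

-- Remaining rows: the latest baz for each (foo, bar) group.
SELECT foo, bar, baz FROM #events ORDER BY foo, bar;

DROP TABLE #events;
```

Note that `baz` and `id` appear only in the ORDER BY, not in the CTE's select list, which matches the point above that the ordering column need not be returned by the CTE.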
