删除表中的所有重复行(如果行重复一次,则应删除这两行) [英] delete all duplicate rows in a table (if a row is duplicated once, both rows should be deleted)

查看:167
本文介绍了删除表中的所有重复行(如果行重复一次,则应删除这两行)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的情况是:



我有两个表table1和table2。

两者都是相同的,除了table2没有主键或约束。



我需要将数据从table2复制到table1。



table1不接受重复的行。



table1中的主键基于5列col1,col2,col3,col4,col5。



table2有一堆重复的行。



我可以使用以下方法获取table2中的重复行:

My scenario is this:

I have two tables table1 and table2.
Both of them are identical except that table2 has no primary key or constraints.

I need to copy data from table2 to table1.

table1 does not accept duplicate rows.

The primary key in table1 is based on 5 columns col1, col2, col3, col4, col5.

table2 has a bunch of duplicated rows.

I am able to get the duplicated rows in table2 using:

SELECT count(*) as 'Occurance'col1,col2,col3,col4, col5
FROM table2
GROUP BY col1,col2,col3,col4, col5
HAVING count(*) > 1





我可以使用下面的查询删除一个副本:



And I am able to delete one duplicate using the query below:

WITH cte
AS (SELECT col1,col2,col3,col4, col5,
row_number() OVER(PARTITION BY col1,col2,col3,col4, col5 ORDER BY col1) AS [rn]
  FROM table2
)
DELETE cte WHERE [rn] <> 1





但我想要的是删除所有重复项找到了。



要说清楚有两行:

col1 = 1,col2 = 2,col3 = 3,col4 = 4 ,col5 = 5,col6 = TEXAS

col1 = 1,col2 = 2,col3 = 3,col4 = 4,col5 = 5,col6 =纽约



所以上面发布的第一个查询会给我一个结果:

col1 = 1,col2 = 2,col3 = 3,col4 = 4,col5 = 5,Occurance = 2



第二个查询将删除以上任何一个(即TEXAS或NEW YORK)。

我想删除TEXAS和纽约。



But what I want is to delete all the duplicates which are found.

To make it clear there are two rows:
col1 = 1, col2 = 2, col3 = 3, col4 = 4, col5 = 5, col6 = TEXAS
col1 = 1, col2 = 2, col3 = 3, col4 = 4, col5 = 5, col6 = NEW YORK

So the first query posted above will give me a result as:
col1 = 1, col2 = 2, col3 = 3, col4 = 4, col5 = 5, Occurance = 2

The second query will delete any one of the above (i.e. either TEXAS or NEW YORK).
I want to delete both TEXAS and NEW YORK.

推荐答案

我不确定为什么在将行复制到Table1之前需要删除行。



我建议从表2中获取DISTINCT行,然后将它们插入表1中

I am not sure why you need to delete rows before you copy them to Table1.

What I would suggest is get the DISTINCT rows from Table2 and then insert them in Table1
INSERT INTO Table1
SELECT DISTINCT Column1, Column2, Column3 Column4, Column5 FROM Table2


检查这个 DEMO [ ^ ]

您可以使用 SQL INSERT INTO SELECT Statement [ ^ ] with group by
check this DEMO[^]
you can use SQL INSERT INTO SELECT Statement[^] with group by


如果我理解你的要求。

需要删除table1上的主键约束。

If I understand your requirements correctly.
The primary key constraint on table1 needs to be dropped.
--data setup for testing
declare @table2 table(col1 int, col2 int, col3 int, col4 int, col5 int, col6 varchar(10));
declare @table1 table(col1 int, col2 int, col3 int, col4 int, col5 int, col6 varchar(10));
--values for @table2
insert into @table2 values(1,1,1,1,1,'TEXAS');
insert into @table2 values(1,1,1,1,1,'NEW YORK');
insert into @table2 values(1,1,1,1,1,'IOWA');
insert into @table2 values(1,1,1,1,2,'TEXAS');
insert into @table2 values(1,1,1,1,2,'NEW YORK');
insert into @table2 values(1,1,1,1,2,'IOWA');
insert into @table2 values(1,1,1,2,1,'TEXAS');
insert into @table2 values(1,1,1,2,1,'NEW YORK');
insert into @table2 values(1,1,1,2,1,'IOWA');
insert into @table2 values(1,1,2,1,1,'TEXAS');
insert into @table2 values(1,1,2,1,1,'NEW YORK');
insert into @table2 values(1,1,2,1,1,'IOWA');
insert into @table2 values(1,2,1,1,1,'TEXAS');
insert into @table2 values(1,2,1,1,1,'NEW YORK');
insert into @table2 values(1,2,1,1,1,'IOWA');
insert into @table2 values(2,1,1,1,1,'TEXAS');
insert into @table2 values(2,1,1,1,1,'NEW YORK');
insert into @table2 values(2,1,1,1,1,'IOWA');
--values for @table1
insert into @table1 values(1,1,1,1,2,'TEXAS');
insert into @table1 values(1,2,1,1,1,'TEXAS');




--insert into @table1 from @table2 but not exist in @table1
insert into @table1
select b.col1, b.col2, b.col3, b.col4, b.col5, b.col6
from @table2 b
where not exists(
    select a.col1, a.col2, a.col3, a.col4, a.col5, a.col6
    from @table1 a
    where a.col1 = b.col1
    and a.col2 = b.col2
    and a.col3 = b.col3
    and a.col4 = b.col4
    and a.col5 = b.col5
    and a.col6 = b.col6
);



这个可以在插入之前和之后运行


This one can be run before and after the insert

--remove from @table2 where already in @table1
with ToRemove as(
    select b.col1, b.col2, b.col3, b.col4, b.col5, b.col6
    from @table2 b
    where exists(
        select a.col1, a.col2, a.col3, a.col4, a.col5, a.col6
        from @table1 a
        where a.col1 = b.col1
        and a.col2 = b.col2
        and a.col3 = b.col3
        and a.col4 = b.col4
        and a.col5 = b.col5
        and a.col6 = b.col6
    )
)
delete from ToRemove;



我不知道CTE可以这种方式用于删除:)感谢那一个。



我建议你研究一下数据库中的一对多关系,也许会改变数据的存储方式。

读一下< a href =http://www.databaseprimer.com/pages/relationship_1tox/> http://www.databaseprimer.com/pages/relationship_1tox/ [ ^ ]


这篇关于删除表中的所有重复行(如果行重复一次,则应删除这两行)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆