选择重复记录 [英] Select duplicate records

查看:98
本文介绍了选择重复记录的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述





我的查询如下



声明@tbl表(id int identity (1,1),sname varchar(10),标记int)

插入@tbl值('A',10),('B',10),('A',10 ),('B',40),('C',10),('D',15),('A',9)

选择row_number()结束(sname分区) ,标记顺序标记)sno,sname,来自@tbl的标记



我的结果是:

sno sname marks

1 A 9

1 A 10

2 A 10

1 B 10

1 B 40

1 C 10

1 D 15



但我只想要第二和第三行,因为这些是重复记录。我想要所有重复的记录而不管重复。



请建议



谢谢



我的尝试:



重复记录消除

解决方案

鉴于 sno 总是为1(因为你要按 sname 进行分区 AND 标记)您可以将此简化为

 选择  distinct   1   as  sno,sname,标记来自  @ tbl  

关键点在于它是您引入的实际上导致问题的row_number()。



但我怀疑你想要 sno 是有意义的(如果我错了,请纠正我。



如果你想要它显示的数量然后使用

 选择 row_number()  分区  sname  订单  by  sname,marks) as  sno,sname,marks 
FROM 选择 distinct 1 as sno,sname,标记来自 @tbl )A

会给你

  sno sname mark  
1 A 9
2 A 10
1 B 10
2 B 40
1 C 10
1 D 15

如果你想给每一行编号,那么完全删除分区

 选择 row_number() over  order   by  sname,marks) as  sno,sname,marks 
FROM select distinct sname,标记来自 @ tbl )A

< pre lang =text> sno sname mark
1 A 9
2 A 10
3 B 10
4 B 40
5 C 10
6 D 15





------------------ OP OPMENT后编辑---------------



要获得重复的行,有两种方法可以尝试:



1.要获取sname和标记以及重复次数,请尝试使用

 选择 sname,marks,max(sno)作为 numberOfDups 来自 
(< span class =code-keyword>选择 row_number()< span class =code-keyword> over ( partition by sname,marks order by sname,marks) as sno,sname ,标记
FROM @ tbl )A
其中 sno> 1
group by sname ,标记





2.要获得具有重复项的完整项目列表,请尝试以下操作:

; < span class =code-keyword> with  cte  as  

select row_number() over partition by sname,mark order by sname,marks) as sno,sname,标记
来自 @ tbl

选择 *
来自 cte
inner join 选择 sname,marks,max (sno) as numberOfDup 来自 cte 其中 sno> 1 group by sname,marks)B cte.sname = B.sname AND cte.marks = B.marks

关键是在第二次查看CTE的子查询上的INNER JOIN。这不仅仅是一个简单的给我行sno> 1 - 连接确保虽然我们没有看到非重复的行,但我们所有有重复的行 - 即A 10,其中sno - 1 AND 2,而不仅仅是2



------------ ------在另一个OP评论之后编辑---------------



介绍必须采用的其他专栏帐户 - 连接需要很长时间才能开启条件



测试数据:

 声明  @ tbl   table (id  int   identity  1  1 ) ,sname  varchar  10 ),标记 int ,其他 int 
ins ert 进入 @ tbl ' A' 10 1 ),(' B' 10 1 ),(' < span class =code-string> A', 10 1 ),( ' B' 40 1 ),(' C' 10 2 ),(' D' 15 2 ),(' A' 9 1 ),(' A' 10 2

我的原始解决方案将返回

 1 A  10 2  3 
2 A 10 1 3
3 A 10 1 3

考虑到其他专栏。



我们可以使用校验和 [ ^ ]组合所有相关的列,例如

;   cte1  as  

< span class =code-keyword>选择校验和(sname,marks,other)< span class =code-keyword> as sno,sname,marks,other from @ tbl

选择 sname,标记,其他
来自 cte1
内部 join select 校验和(sname,marks,other) as sno 来自 @tbl group by 校验和(sname,marks,other) count(*)> 1 cte1.sno = A.sno

结果

 A 10 1 
A 10 1


声明@tbl表(id int identity(1,1) ,sname varchar(10),标记int)
插入@tbl值('A',10),('B',10),('A',10),('B',40) ,('C',10),('D',15),('A',9)
select * from(
select row_number()over(sname分区,标记按标记排序) )Sno,sname,来自@tbl的标记),其中sno = 1







声明@tbl表(id int identity(1,1),sname varchar(10),标记int)
插入@tbl值('A',10),('B',10) ,('A',10),('B',40),('C',10),('D',15),('A',9)

; with cte as

select row_number()over(sname分区,标记按标记排序)sno,sname,来自@tbl的标记)
select * from cte where sno = 1


Hi,

I have a query below

declare @tbl table (id int identity(1,1),sname varchar(10),marks int)
insert into @tbl values('A',10),('B',10),('A',10),('B',40),('C',10),('D',15),('A',9)
select row_number()over(partition by sname,marks order by marks)sno,sname,marks from @tbl

my result is:
sno sname marks
1 A 9
1 A 10
2 A 10
1 B 10
1 B 40
1 C 10
1 D 15

But I want to only second and third rows as these are duplicate records. i want all duplicate records irrespective of repeatation.

Please suggest

Thanks

What I have tried:

duplicates records elimination

解决方案

Given that sno is always going to be 1 (because you are partitioning by sname AND marks) you could simplify this to

select distinct 1 as sno,sname, marks from @tbl

The key point being that it is your introduction of the row_number() that is actually causing the issue.

But I suspect you want sno to be something meaningful (correct me if I'm wrong)

If you want it to show the number of the mark per name then use

select row_number() over (partition by sname order by sname,marks) as sno, sname, marks
FROM (select distinct 1 as sno,sname, marks from @tbl) A

which will give you

sno	sname	mark
1	A	9
2	A	10
1	B	10
2	B	40
1	C	10
1	D	15

If you want to give each row a number then remove the partition altogether

select row_number() over (order by sname,marks) as sno, sname, marks
FROM (select distinct sname, marks from @tbl) A

sno	sname	mark
1	A	9
2	A	10
3	B	10
4	B	40
5	C	10
6	D	15



------------------ EDIT AFTER OP COMMENT ---------------

To only get the lines that are duplicated there are two ways you could try:

1. To get the sname and marks and the number of duplications try this

select sname, marks, max(sno) as numberOfDups from 
(select row_number() over (partition by sname, marks order by sname,marks) as sno, sname, marks
FROM  @tbl) A
where sno > 1
group by sname, marks



2. To get the full list of items that have duplicates try this:

;with cte as
(
	select row_number() over (partition by sname, marks order by sname,marks) as sno, sname, marks
	from @tbl
)
select * 
from cte 
inner join (select sname, marks, max(sno) as numberOfDup from cte where sno > 1 group by sname, marks) B on cte.sname = B.sname AND cte.marks = B.marks

The key is the INNER JOIN on a sub-query looking at the CTE a second time. It's not just a simple "give me the rows where sno > 1" - the join ensures that while we don't see the non-duplicated rows we do see all the rows where there has been a duplicate - i.e. A 10 where sno - 1 AND 2, not just 2

------------------ EDIT AFTER ANOTHER OP COMMENT ---------------

Introducing other columns which must be taken into account - the joins take "much time for the on conditions

Test data:

declare @tbl table (id int identity(1,1),sname varchar(10),marks int, other int)
insert into @tbl values('A',10,1),('B',10,1),('A',10,1),('B',40,1),('C',10,2),('D',15,2),('A',9,1), ('A',10,2)

My original solution would return

1	A	10	2	3
2	A	10	1	3
3	A	10	1	3

Which is incorrect when taking into account the other column.

We can use Checksum [^] to "combine" all of the columns that are relevent e.g.

;with cte1 as 
(
	select checksum(sname, marks, other) as sno, sname, marks, other from @tbl
)
select sname, marks, other
from cte1
inner join (select checksum(sname, marks, other) as sno from @tbl group by checksum(sname, marks, other) having count(*) > 1) A on cte1.sno = A.sno

results

A	10	1
A	10	1


declare @tbl table (id int identity(1,1),sname varchar(10),marks int)
insert into @tbl values('A',10),('B',10),('A',10),('B',40),('C',10),('D',15),('A',9)
select * from (
select row_number()over(partition by sname,marks order by marks)sno,sname,marks from @tbl) as a where sno=1




declare @tbl table (id int identity(1,1),sname varchar(10),marks int)
insert into @tbl values('A',10),('B',10),('A',10),('B',40),('C',10),('D',15),('A',9)

;with cte as 
(
select row_number()over(partition by sname,marks order by marks)sno,sname,marks from @tbl)
select * from cte where sno=1


这篇关于选择重复记录的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆