如何在SQL Server中选择顺序重复项 [英] How to select sequential duplicates in SQL Server

查看:81
本文介绍了如何在SQL Server中选择顺序重复项的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想从SQL Server表中选择重复的条目,但仅当id连续时.

I would like to select duplicate entries from a SQL Server table, but only if the id is consecutive.

我一直在尝试根据自己的需要修改此答案,但我无法使其正常工作.

I have been trying to twist this answer to my needs, but I can't get it to work.

上面的答案是针对Oracle的,但是我看到SQL Server也具有leadlag函数.

The above answer is for Oracle, but I see that SQL Server also has lead and lag functions.

此外,我认为上面的答案在重复项旁边放置*,但是我只想选择重复项.

Also, I think that the answer above puts a * next to duplicates, but I only want to select the duplicates.

select 
    id, companyName, 
    case 
       when companyName in (prev, next) 
          then '*' 
    end match, 
    prev, 
    next 
from 
    (select
         id,
         companyName,
         lag(companyName, 1) over (order by id) prev,
         lead(companyName, 1) over (order by id) next
     from
         companies)
order by
    id;

示例:

因此,从该数据集中:

id      companyName
-------------------    
1       dogs ltd
2       cats ltd
3       pigs ltd
4       pigs ltd
5       cats ltd
6       cats ltd
7       dogs ltd
8       pigs ltd

我要选择:

id      companyName
-------------------    
3       pigs ltd
4       pigs ltd
5       cats ltd
6       cats ltd

更新

我时不时地对我得到的答案的数量和质量感到惊讶.这是那些时代之一.我没有判断一个答案比另一个答案更好的专业知识,所以我选择了SqlZim,因为这是我看到的第一个可行答案.但是很高兴看到不同的方法.尤其是仅在一个小时前,我想知道这是否可能?".

Update

Every now and again I am taken aback by the quantity and quality of answers I get on SO. This is one of those times. I don't have the level of expertise to judge one answer as being better than another, so I've gone for SqlZim as this was the first working answer I saw. But it's great to see the different approaches. Especially when only an hour ago I was wondering "is this even possible?".

推荐答案

这是一个间隙和孤岛样式的问题,但是我们在最里面的子查询中使用了idrow_number(),而不是使用两个row_numbers().后跟count() over()以获得每个grp的计数,最后返回带有cnt > 1的计数.

This is a gaps and islands style problem, but instead of using two row_numbers(), we use the id and row_number() in the innermost subquery. Followed by count() over() to get the count per grp, and finally return those with a cnt > 1.

select id, companyname 
from (
  select 
      id
    , companyName
    , grp
    , cnt = count(*) over (partition by companyname, grp)
  from (
    select *
      , grp = id - row_number() over (partition by companyname order by id)
    from
      companies
    ) islands
  ) d
where cnt  > 1
order by id

rextester演示: http://rextester.com/ACP73683

rextester demo: http://rextester.com/ACP73683

返回:

+----+-------------+
| id | companyname |
+----+-------------+
|  3 | pigs ltd    |
|  4 | pigs ltd    |
|  5 | cats ltd    |
|  6 | cats ltd    |
+----+-------------+

这篇关于如何在SQL Server中选择顺序重复项的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆