MS SQL 2008: Delete "duplicates" to select most recent results using multiple columns with no PK


Question

I have a table that looks like the one below, and I need to remove duplicates so that only the most recent results remain. These are not standard duplicates: there is no primary key, and no single column in which the same value repeats and can simply be counted. The table lists registered players with the dates they joined and left a team. If EndDate is NULL, the player is still playing for that team.

PlayerID | RegID | RegDate  | EndDate  | Team  | LastUpdate
1        | 1     | 10/12/13 | 10/16/13 | Red   | 10/16/13
1        | 2     | 10/17/13 | null     | Blue  | 10/23/13
1        | 3     | 10/17/13 | null     | Green | 10/23/13

What is a duplicate? A player has duplicates when more than one of their records has a NULL EndDate. In that case we want to retrieve only the NULL-EndDate record with the most recent LastUpdate, and if several records share the same LastUpdate value, the one with the highest RegistrationID (RegID in the table above).

For the sample data, the result should contain rows 1 and 3: row 1 because its EndDate is not NULL, and row 3 because its LastUpdate is the same as row 2's but its RegID is higher.

I have tried using a CTE with PARTITION BY, ordering by LastUpdate DESC and RegDate DESC, but I am not getting the right results.

Could this be done using a CTE, and if so, how? Or should it be done by creating another table, and if so, how?

Thank you very much for any help you can provide. Take care!

Recommended answer

You can do this with row_number():

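-- PlayerRegistrations is a hypothetical name; the original used the placeholder "table",
-- which is a reserved word. registrationid appears as RegID in the sample data.
select r.*
from (select t.*,
             row_number() over (partition by PlayerId, EndDate
                                order by lastupdate desc, registrationid desc
                               ) as seqnum
      from PlayerRegistrations t
     ) r
where EndDate is not NULL or seqnum = 1;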

This enumerates the rows within each group. Here a group is defined by the (PlayerId, EndDate) combination, and because partitioning treats NULLs as equal, all of a player's NULL-EndDate rows land in one group. Within a group, row 1 is the one with the latest lastupdate date, with ties broken by the highest registration id. The outer where keeps every record that either has a non-NULL EndDate or is first in its group.
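To check this logic against the sample rows from the question, here is a minimal sketch that runs the same query shape in an in-memory SQLite database through Python's standard sqlite3 module. SQLite is only a stand-in for SQL Server here (it needs SQLite 3.25 or later for window functions), and the table name Registrations, the ISO-formatted dates, and the function name are assumptions made for the example.

```python
import sqlite3

# Sample rows from the question: (PlayerID, RegID, RegDate, EndDate, Team, LastUpdate).
# Dates are written as ISO strings so that text ordering matches date ordering.
ROWS = [
    (1, 1, "2013-10-12", "2013-10-16", "Red", "2013-10-16"),
    (1, 2, "2013-10-17", None, "Blue", "2013-10-23"),
    (1, 3, "2013-10-17", None, "Green", "2013-10-23"),
]

# Same shape as the answer's query; "Registrations" is a hypothetical table name.
QUERY = """
SELECT PlayerID, RegID, Team
FROM (SELECT r.*,
             ROW_NUMBER() OVER (PARTITION BY PlayerID, EndDate
                                ORDER BY LastUpdate DESC, RegID DESC) AS seqnum
      FROM Registrations AS r) AS ranked
WHERE EndDate IS NOT NULL OR seqnum = 1
ORDER BY RegID
"""


def latest_registrations(rows):
    """Load rows into an in-memory table and return the rows the query keeps."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Registrations (PlayerID INTEGER, RegID INTEGER, "
        "RegDate TEXT, EndDate TEXT, Team TEXT, LastUpdate TEXT)"
    )
    conn.executemany("INSERT INTO Registrations VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in latest_registrations(ROWS):
        print(row)
```

Rows 2 and 3 share the partition (PlayerID 1, NULL EndDate) and the same LastUpdate, so the RegID tie-breaker decides which of them receives seqnum = 1.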

Your question is a little ambiguous about whether you just want to return these rows or actually delete the others. Fortunately, SQL Server has updatable CTEs, so you can use very similar logic to delete the records from the table:

-- PlayerRegistrations is a hypothetical name; the original used the placeholder "table".
with todelete as (
      select t.*,
             row_number() over (partition by PlayerId, EndDate
                                order by lastupdate desc, registrationid desc
                               ) as seqnum
      from PlayerRegistrations t
     )
delete from todelete
    where EndDate is NULL and seqnum > 1;
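The delete condition can also be tried out on the question's sample rows. The sketch below is not the updatable-CTE statement itself: SQLite, used here through Python's sqlite3 module as a stand-in for SQL Server, cannot delete through a CTE, so the same row_number() filter is applied via SQLite's implicit rowid instead. The table name Registrations, the helper names, and the ISO date strings are assumptions for the example, and SQLite 3.25 or later is required.

```python
import sqlite3


def build_sample():
    """Create an in-memory table holding the three sample rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Registrations (PlayerID INTEGER, RegID INTEGER, "
        "RegDate TEXT, EndDate TEXT, Team TEXT, LastUpdate TEXT)"
    )
    conn.executemany(
        "INSERT INTO Registrations VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, 1, "2013-10-12", "2013-10-16", "Red", "2013-10-16"),
            (1, 2, "2013-10-17", None, "Blue", "2013-10-23"),
            (1, 3, "2013-10-17", None, "Green", "2013-10-23"),
        ],
    )
    return conn


def delete_stale_open_rows(conn):
    """Delete every NULL-EndDate row except the newest per player; return the count."""
    cur = conn.execute(
        """
        DELETE FROM Registrations
        WHERE rowid IN (
            SELECT rid
            FROM (SELECT rowid AS rid, EndDate,
                         ROW_NUMBER() OVER (PARTITION BY PlayerID, EndDate
                                            ORDER BY LastUpdate DESC, RegID DESC) AS seqnum
                  FROM Registrations)
            WHERE EndDate IS NULL AND seqnum > 1)
        """
    )
    return cur.rowcount


if __name__ == "__main__":
    conn = build_sample()
    print("deleted:", delete_stale_open_rows(conn))
    print(conn.execute("SELECT RegID, Team FROM Registrations ORDER BY RegID").fetchall())
```

Running the delete a second time removes nothing, because each player then has at most one open registration left.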
