将潜在的重复项分成不同的行 [英] Separate Potential Duplicates Into Different Rows

查看:23
本文介绍了将潜在的重复项分成不同的行的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试根据 SSN、姓氏和 DOB 中的最后 4 个来识别我的数据库中的潜在重复客户.我编写的存储过程确实识别了潜在的重复项,但它在一行中列出了它们 - 我试图出于报告原因将它们拆分为单独的行.

I am trying to identify potential duplicate customers in my database based on the last 4 of the SSN, last name and DOB. The stored procedure I have written does identify potential duplicates but it lists them in one row - I am trying to split into separate rows for reporting reasons.

我的 T-SQL 看起来像:

My T-SQL looks like:

DECLARE
@StartDate DATE = '1/1/2017',
@EndDate DATE = '3/1/2017';

SELECT  DENSE_RANK() OVER (ORDER BY c.socialSecurityNumber) AS [SSNRanking] ,
    ROW_NUMBER() OVER (PARTITION BY c.socialSecurityNumber ORDER BY c.socialSecurityNumber) AS [RowNumb] ,
    c.socialSecurityNumber AS [SSN], 
    c.id AS [CustomerID]  , 
    c.firstName AS [FirstName] ,    
    c.lastName AS [lastName] ,
    c.birthDate [birthdate] ,
    c.createDate AS [CreateDate] ,
    c2.socialSecurityNumber AS [DupSSN] ,
    c2.id AS [DupCustomerID] ,
    c2.firstName AS [DupFirstName] ,
    c2.lastName AS [DupLastName] ,
    c2.birthDate AS [DupBirthDate] ,
    c2.createDate AS [DupCreateDate]
FROM    dbo.Customers AS [c]
    INNER JOIN dbo.Customers AS [c2] ON ( SUBSTRING(c.socialSecurityNumber,6,4) = SUBSTRING(c2.socialSecurityNumber,6,4) AND c.birthDate = c2.birthDate AND c.lastName = c2.lastName AND c.id <> c2.id )
    LEFT JOIN dbo.CustomerAddresses AS [CA] ON c.id = CA.customerID             
    LEFT OUTER JOIN dbo.Common_Orders AS [co] ON co.customerID = c.id
WHERE
    c.customerStatusTypeID <> 'M'
    AND C2.customerStatusTypeID <> 'M'
    AND c.mergedTo IS NULL
    AND c2.mergedTo IS NULL
    AND CAST(co.orderDate AS DATE) >= @StartDate
    AND CAST(co.orderDate AS DATE) <= @EndDate
    AND c.id = 1234439
GROUP BY c.socialSecurityNumber ,
    c.id ,
    c.firstName ,
    c.lastName ,
    c.birthDate ,       
    c.createDate ,
    c2.socialSecurityNumber ,
    c2.id ,
    c2.firstName ,        
    c2.lastName ,
    c2.birthDate ,
    c2.createDate
ORDER BY CAST(c.socialSecurityNumber AS INT) ASC;

我的数据集看起来像:

SSNRanking  RowNumb  SSN        CustomerID  FirstName  lastName  birthdate  CreateDate  DupSSN     DupCustomerID  DupFirstName  DupLastName  DupBirthDate  DupCreateDate
1           1        000009915  1234439     GREG       GARRETT  1900-01-01  2014-02-25  000009915  1166084        ADAM          GARRETT      1900-01-01    2013-08-29

在这个特定的例子中,我有两个用户的 SSN 的最后 4 个相同,姓氏和 DOB 相同 - 但名字不同.

In this particular instance, I have two users with the same last 4 of the SSN, same last name and same DOB - but different first names.

我怎样才能让这两条记录出现在不同的行上?理想情况下,我希望看到:

How can I make these two records appear on separate lines? Ideally, I would like to see:

SSNRanking  RowNumb  SSN        CustomerID  FirstName  lastName  birthdate   CreateDate
1           1        000009915  1234439     GREG       GARRETT   1900-01-01  2014-02-25
1           2        000009915  1166084     ADAM       GARRETT   1900-01-01  2013-08-29 

但我不确定在加入同一张桌子时如何实现这一点.建议?

But I am not sure how I can accomplish this when joining to the same table. Suggestions?

我正在链接到一个脚本,该脚本创建两个有问题的表并插入示例数据.希望这是可以接受的:SQL 脚本

I'm linking to a script that creates the two tables in question and inserts sample data. Hopefully that is acceptable: SQL Script

推荐答案

这称为逆透视".您可以使用 UNPIVOT 运算符,但我更喜欢使用 CROSS APPLY ... VALUES.

This is called "unpivot". You can use UNPIVOT operator, but I prefer to use CROSS APPLY ... VALUES.

我会将您的查询包装到 CTE 中而不查看详细信息,并使用 CROSS APPLY 将每一行拆分为两行.

I'll wrap your query into CTE without looking at it in details and split each row into two using CROSS APPLY.

DECLARE
@StartDate DATE = '1/1/2017',
@EndDate DATE = '3/1/2017';

WITH
CTE
AS
(
    SELECT  
        DENSE_RANK() OVER (ORDER BY c.socialSecurityNumber) AS [SSNRanking] ,
        ROW_NUMBER() OVER (PARTITION BY c.socialSecurityNumber ORDER BY c.socialSecurityNumber) AS [RowNumb] ,
        c.socialSecurityNumber AS [SSN], 
        c.id AS [CustomerID]  , 
        c.firstName AS [FirstName] ,    
        c.lastName AS [lastName] ,
        c.birthDate [birthdate] ,
        c.createDate AS [CreateDate] ,
        c2.socialSecurityNumber AS [DupSSN] ,
        c2.id AS [DupCustomerID] ,
        c2.firstName AS [DupFirstName] ,
        c2.lastName AS [DupLastName] ,
        c2.birthDate AS [DupBirthDate] ,
        c2.createDate AS [DupCreateDate]
    FROM    
        dbo.Customers AS [c]
        INNER JOIN dbo.Customers AS [c2] ON ( SUBSTRING(c.socialSecurityNumber,6,4) = SUBSTRING(c2.socialSecurityNumber,6,4) AND c.birthDate = c2.birthDate AND c.lastName = c2.lastName AND c.id <> c2.id )
        LEFT JOIN dbo.CustomerAddresses AS [CA] ON c.id = CA.customerID             
        LEFT JOIN dbo.Common_Orders AS [co] ON co.customerID = c.id
    WHERE
        c.customerStatusTypeID <> 'M'
        AND C2.customerStatusTypeID <> 'M'
        AND c.mergedTo IS NULL
        AND c2.mergedTo IS NULL
        AND CAST(co.orderDate AS DATE) >= @StartDate
        AND CAST(co.orderDate AS DATE) <= @EndDate
        AND c.id = 1234439
    GROUP BY
        c.socialSecurityNumber ,
        c.id ,
        c.firstName ,
        c.lastName ,
        c.birthDate ,       
        c.createDate ,
        c2.socialSecurityNumber ,
        c2.id ,
        c2.firstName ,        
        c2.lastName ,
        c2.birthDate ,
        c2.createDate
)
SELECT
    CA.SSNRanking
    ,CA.RowNumb
    ,CA.SSN
    ,CA.CustomerID
    ,CA.FirstName
    ,CA.lastName
    ,CA.birthdate
    ,CA.CreateDate
FROM
    CTE
    CROSS APPLY
    (
        VALUES
        (CTE.SSNRanking, CTE.RowNumb, CTE.SSN, CTE.CustomerID, CTE.FirstName, CTE.lastName, CTE.birthdate, CTE.CreateDate),
        (CTE.SSNRanking, CTE.RowNumb, CTE.DupSSN, CTE.DupCustomerID, CTE.DupFirstName, CTE.DuplastName, CTE.Dupbirthdate, CTE.DupCreateDate)
    ) AS CA(SSNRanking, RowNumb, SSN, CustomerID, FirstName, lastName, birthdate, CreateDate)
ORDER BY CAST(CA.SSN AS INT) ASC;

顺便说一句,

ROW_NUMBER() OVER (PARTITION BY ColumnA ORDER BY ColumnA)

当您按同一列分区和排序时没有意义.我不确定你想在那里实现什么.

doesn't make sense when you partition and order by the same column. I'm not sure what you want to achieve there.

这篇关于将潜在的重复项分成不同的行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆