如何在 SQL Server 2005 中选择最接近的匹配项? [英] How can I choose the closest match in SQL Server 2005?

查看:29
本文介绍了如何在 SQL Server 2005 中选择最接近的匹配项?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在 SQL Server 2005 中,我有一个成功销售的输入表,以及各种包含已知客户信息及其详细信息的表.对于每一行销售,我需要匹配 0 或 1 个已知客户.

In SQL Server 2005, I have a table of input coming in of successful sales, and a variety of tables with information on known customers, and their details. For each row of sales, I need to match 0 or 1 known customers.

我们从销售表中获得以下信息:
服务标识,地址,邮政编码,电子邮件地址,家庭电话,名,姓氏

We have the following information coming in from the sales table:
ServiceId, Address, ZipCode, EmailAddress, HomePhone, FirstName, LastName

客户信息包括所有这些,以及LastTransaction"日期.

The customers information includes all of this, as well as a 'LastTransaction' date.

这些字段中的任何一个都可以映射回 0 个或多个客户.每当销售表中的 ServiceId、Address+ZipCode、EmailAddress 或 HomePhone 与客户完全匹配时,我们都将匹配视为匹配.

Any of these fields can map back to 0 or more customers. We count a match as being any time that a ServiceId, Address+ZipCode, EmailAddress, or HomePhone in the sales table exactly matches a customer.

问题是我们有许多客户的信息,有时同一家庭中有多个客户.这意味着我们可能在同一个房子里有 John Doe、Jane Doe、Jim Doe 和 Bob Doe.它们都会在 Address+ZipCode 和 HomePhone 上匹配——而且可能不止一个匹配 ServiceId.

The problem is that we have information on many customers, sometimes multiple in the same household. This means that we might have John Doe, Jane Doe, Jim Doe, and Bob Doe in the same house. They would all match on on Address+ZipCode, and HomePhone--and possibly more than one of them would match on ServiceId, as well.

我需要某种方式在交易中优雅地跟踪客户的最佳"匹配.如果一个匹配 6 个字段,而其他只匹配 5 个,则该客户应保留为与该记录的匹配项.多次匹配5个,再没有匹配的情况下,保留最近的LastTransaction日期.

I need some way to elegantly keep track of, in a transaction, the 'best' match of a customer. If one matches 6 fields, and the others only match 5, that customer should be kept as a match to that record. In the case of multiple matching 5, and none matching more, the most recent LastTransaction date should be kept.

任何想法将不胜感激.

更新:更清楚一点,我正在寻找一种好方法来验证数据行中完全匹配的数量,并根据该信息选择要关联的行.如果姓氏为Doe",则必须与客户姓氏完全匹配,才能算作匹配参数,而不是非常接近的匹配.

Update: To be a little more clear, I am looking for a good way to verify the number of exact matches in the row of data, and choose which rows to associate based on that information. If the last name is 'Doe', it must exactly match the customer last name, to count as a matching parameter, rather than be a very close match.

推荐答案

对于 SQL Server 2005 及更高版本尝试:

for SQL Server 2005 and up try:

;WITH SalesScore AS (
SELECT
    s.PK_ID as S_PK
        ,c.PK_ID AS c_PK
        ,CASE 
             WHEN c.PK_ID IS NULL THEN 0
             ELSE CASE WHEN s.ServiceId=c.ServiceId THEN 1 ELSE 0 END
                  +CASE WHEN (s.Address=c.Address AND s.Zip=c.Zip) THEN 1 ELSE 0 END
                  +CASE WHEN s.EmailAddress=c.EmailAddress THEN 1 ELSE 0 END
                  +CASE WHEN s.HomePhone=c.HomePhone THEN 1 ELSE 0 END
         END AS Score
    FROM Sales s
        LEFT OUTER JOIN Customers c ON s.ServiceId=c.ServiceId
                                       OR (s.Address=c.Address AND s.Zip=c.Zip)
                                       OR s.EmailAddress=c.EmailAddress
                                       OR s.HomePhone=c.HomePhone 
)
SELECT 
    s.*,c.*
    FROM (SELECT
              S_PK,MAX(Score) AS Score
              FROM SalesScore 
              GROUP BY S_PK
         ) dt
        INNER JOIN Sales          s ON dt.s_PK=s.PK_ID 
        INNER JOIN SalesScore    ss ON dt.s_PK=s.PK_ID AND dt.Score=ss.Score
        LEFT OUTER JOIN Customers c ON ss.c_PK=c.PK_ID

编辑我讨厌在没有给出 shema 的情况下编写这么多实际代码,因为我实际上无法运行它并确保它有效.然而,要回答如何使用最后交易日期处理关系的问题,这里是上述代码的较新版本:

EDIT I hate to write so much actual code when there was no shema given, because I can't actually run this and be sure it works. However to answer the question of the how to handle ties using the last transaction date, here is a newer version of the above code:

;WITH SalesScore AS (
SELECT
    s.PK_ID as S_PK
        ,c.PK_ID AS c_PK
        ,CASE 
             WHEN c.PK_ID IS NULL THEN 0
             ELSE CASE WHEN s.ServiceId=c.ServiceId THEN 1 ELSE 0 END
                  +CASE WHEN (s.Address=c.Address AND s.Zip=c.Zip) THEN 1 ELSE 0 END
                  +CASE WHEN s.EmailAddress=c.EmailAddress THEN 1 ELSE 0 END
                  +CASE WHEN s.HomePhone=c.HomePhone THEN 1 ELSE 0 END
         END AS Score
    FROM Sales s
        LEFT OUTER JOIN Customers c ON s.ServiceId=c.ServiceId
                                       OR (s.Address=c.Address AND s.Zip=c.Zip)
                                       OR s.EmailAddress=c.EmailAddress
                                       OR s.HomePhone=c.HomePhone 
)
SELECT
    *
    FROM (SELECT 
              s.*,c.*,row_number() over(partition by s.PK_ID order by s.PK_ID ASC,c.LastTransaction DESC) AS RankValue
              FROM (SELECT
                        S_PK,MAX(Score) AS Score
                        FROM SalesScore 
                        GROUP BY S_PK
                   ) dt
                  INNER JOIN Sales          s ON dt.s_PK=s.PK_ID 
                  INNER JOIN SalesScore    ss ON dt.s_PK=s.PK_ID AND dt.Score=ss.Score
                  LEFT OUTER JOIN Customers c ON ss.c_PK=c.PK_ID
         ) dt2
    WHERE dt2.RankValue=1

这篇关于如何在 SQL Server 2005 中选择最接近的匹配项?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆