SQL Server中的单词匹配 [英] Word matching in SQL Server

查看:102
本文介绍了SQL Server中的单词匹配的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要在两个数据库表中的数据之间提供建议的匹配.基本要求是; -对于所讨论的两列之间的最大匹配单词数(不考虑顺序),建议使用匹配".

I have a requirement to provide a suggested match between data in two database tables. The basic requirement is; - A "match" should be suggested for the highest number of matched words (irrespective of order) between the two columns in question.

例如,给定数据;

   Table A                           Table B
1,'What other text in here'      5,'Other text in here'
2,'What am I doing here'         6,'I am doing what here'
3,'I need to find another job'   7,'Purple unicorns'
4,'Other text in here'           8,'What are you doing in here'

Ideally, my desired matches would look as follows;
1 -> 8   (3 words matched)
2 -> 6   (5 words matched)
3 -> Nothing
4 -> 5   (4 words matched)

我找到了字数统计功能看起来很有希望,但是我想不起来如何在SQL语句中使用它,这将给我我想要的匹配.另外,链接函数不是我所需要的,因为它使用了charindex,我认为它会在一个单词中搜索一个单词(即"in"将匹配"bin").

I've found word count functions that look promising, but I can't think of how to use it in a SQL statement, that will give me my desired match. Also, the linked function isn't quite what I need as it uses charindex which I think searches for a word within a word (ie. 'in' will match 'bin').

有人可以帮我吗?

谢谢.

推荐答案

我在下面使用sys.dm_fts_parser将句子拆分为单词.有大量的TSQL拆分如果您不在SQL Server 2008上,或者由于某种原因而不合适,则可以使用.

I've used sys.dm_fts_parser below to split the sentences into words. There are plenty of TSQL split functions around if you are not on SQL Server 2008 or find this isn't suitable for some reason.

每个A.id只能与以前未使用过的B.id配对的要求,反之亦然,这不是我想到的一种有效的基于集合的解决方案.

The requirement that each A.id can only be paired with a B.id that hadn't been used previously and vice-versa isn't one I could think of an efficient set based solution for.

;WITH A(Id, sentence) As
(
  SELECT 1,'What other text in here'    UNION ALL
  SELECT 2,'What am I doing here'       UNION ALL 
  SELECT 3,'I need to find another job' UNION ALL 
  SELECT 4,'Other text in here'         
),
B(Id, sentence) As
(
 SELECT  5,'Other text in here'          UNION ALL
 SELECT  6,'I am doing what here'        UNION ALL
 SELECT  7,'Purple unicorns'             UNION ALL
 SELECT  8,'What are you doing in here'
),  A_Split
     AS (SELECT Id AS A_Id,
                display_term,
                COUNT(*) OVER (PARTITION BY Id) AS A_Cnt
         FROM   A
                CROSS APPLY 
                   sys.dm_fts_parser('"' + REPLACE(sentence, '"', '""')+'"',1033, 0,0)),

     B_Split
     AS (SELECT Id AS B_Id,
                display_term,
                COUNT(*) OVER (PARTITION BY Id) AS B_Cnt
         FROM   B
                CROSS APPLY 
                   sys.dm_fts_parser('"' + REPLACE(sentence, '"', '""')+'"',1033, 0,0)),
     Joined
     As (SELECT A_Id,
                B_Id,
                B_Cnt,
                Cnt = COUNT(*),
                CAST(COUNT(*) as FLOAT)/B_Cnt AS PctMatchBToA,
                CAST(COUNT(*) as FLOAT)/A_Cnt AS PctMatchAToB
         from   A_Split A
                JOIN B_Split B
                  ON A.display_term = B.display_term
         GROUP  BY A_Id,
                   B_Id,
                   B_Cnt,
                   A_Cnt)
SELECT IDENTITY(int, 1, 1) as id, *
INTO   #IntermediateResults
FROM   Joined
ORDER  BY PctMatchBToA DESC,
          PctMatchAToB DESC

DECLARE @A_Id INT,
        @B_Id INT,
        @Cnt  INT

DECLARE @Results TABLE (
  A_Id INT,
  B_Id INT,
  Cnt  INT)

SELECT TOP(1) @A_Id = A_Id,
              @B_Id = B_Id,
              @Cnt = Cnt
FROM   #IntermediateResults
ORDER  BY id

WHILE ( @@ROWCOUNT > 0 )
  BEGIN

      INSERT INTO @Results
      SELECT @A_Id,
             @B_Id,
             @Cnt

      DELETE FROM #IntermediateResults
      WHERE  A_Id = @A_Id
              OR B_Id = @B_Id

      SELECT TOP(1) @A_Id = A_Id,
                    @B_Id = B_Id,
                    @Cnt = Cnt
      FROM   #IntermediateResults
      ORDER  BY id
  END

DROP TABLE #IntermediateResults

SELECT *
FROM   @Results
ORDER  BY A_Id  

返回

A_Id        B_Id        Cnt
----------- ----------- -----------
1           8           3
2           6           5
4           5           4

这篇关于SQL Server中的单词匹配的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆