How to do hit-highlighting of results from a SQL Server full-text query

Question

We have a web application that uses SQL Server 2008 as the database. Our users are able to do full-text searches on particular columns in the database. SQL Server's full-text functionality does not seem to provide support for hit highlighting. Do we need to build this ourselves or is there perhaps some library or knowledge around on how to do this?

BTW the application is written in C# so a .Net solution would be ideal but not necessary as we could translate.

Solution

This expands on Ishmael's idea. It is not a final solution, but I think it is a good way to start.

Firstly we need to get the list of words that have been retrieved with the full-text engine:

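-- @SearchString holds the user's search term; the value here is only an example.
declare @SearchString nvarchar(100) = N'run'
declare @SearchPattern nvarchar(1000) = N'FORMSOF(INFLECTIONAL, "' + @SearchString + N'")'
declare @SearchWords table (Word nvarchar(100), Expansion_type int)

-- Ask the full-text parser (LCID 1033 = English, system stoplist, accent-insensitive)
-- which word forms it would match for this pattern.
insert into @SearchWords (Word, Expansion_type)
select distinct display_term, expansion_type
from sys.dm_fts_parser(@SearchPattern, 1033, 0, 0)
where special_term = 'Exact Match'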

There is already quite a lot one can expand on here. For example, the search pattern is quite basic, and there are probably better ways to filter out the words you don't need. But at least it gives you a list of stem words etc. that would be matched by full-text search.
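
The loop further down reads from a @PrelimResults table variable that is never populated in this answer. As a minimal sketch of one way to fill it, assuming a hypothetical table dbo.Documents with an int [UID] column and a full-text indexed [Text] column (none of these names come from the original question), the same search pattern can be passed to CONTAINS:

```sql
-- Sketch only: dbo.Documents, [UID] and [Text] are hypothetical names.
-- Assumes a full-text index already exists on dbo.Documents([Text]).
declare @SearchString nvarchar(100) = N'run'   -- example search term
declare @SearchPattern nvarchar(1000) = N'FORMSOF(INFLECTIONAL, "' + @SearchString + N'")'

declare @PrelimResults table ([UID] int, [Text] nvarchar(max))

-- Collect the rows that the full-text engine matches for the pattern.
insert into @PrelimResults ([UID], [Text])
select d.[UID], d.[Text]
from dbo.Documents as d
where CONTAINS(d.[Text], @SearchPattern)
```

Using the same @SearchPattern for both CONTAINS and sys.dm_fts_parser keeps the list of words to highlight in step with what the query actually matched.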

After you get the results you need, you can use RegEx to parse through the result set (or preferably only a subset to speed it up, although I haven't yet figured out a good way to do so). For this I simply use two while loops and a bunch of temporary tables and variables:

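-- Assumes @PrelimResults ([UID] int, [Text] nvarchar(max)) already holds the full-text matches,
-- @FirstSearchWord is set (see note 2), and dbo.RegExIndexOf / dbo.RegExReplace exist (see note 3).
declare @FinalResults table ([UID] int, [Text] nvarchar(max))
declare @TempSearchWords table (Word nvarchar(100))
declare @CurrID int, @Text nvarchar(max), @CurrWord nvarchar(100), @TextLength int, @IndexOfDot int

while (select COUNT(*) from @PrelimResults) > 0
begin
    select top 1 @CurrID = [UID], @Text = [Text] from @PrelimResults

    -- Start the summary just after the last period before the first search word; keep 300 characters.
    set @TextLength = LEN(@Text)
    set @IndexOfDot = CHARINDEX('.', REVERSE(@Text), @TextLength - dbo.RegExIndexOf(@Text, '\b' + @FirstSearchWord + '\b') + 1)
    set @Text = SUBSTRING(@Text, case @IndexOfDot when 0 then 1 else @TextLength - @IndexOfDot + 3 end, 300)

    -- Refill the working word list for every row, because the inner loop empties it.
    insert into @TempSearchWords (Word) select Word from @SearchWords

    while (select COUNT(*) from @TempSearchWords) > 0
    begin
        select top 1 @CurrWord = Word from @TempSearchWords
        -- Wrap matches of the current word in <b> tags.
        set @Text = dbo.RegExReplace(@Text, '\b' + @CurrWord + '\b',  '<b>' + SUBSTRING(@Text, dbo.RegExIndexOf(@Text, '\b' + @CurrWord + '\b'), LEN(@CurrWord) + 1) + '</b>')
        delete from @TempSearchWords where Word = @CurrWord
    end

    -- Store the highlighted summary rather than the original row.
    insert into @FinalResults ([UID], [Text]) values (@CurrID, @Text)
    delete from @PrelimResults where [UID] = @CurrID
end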

Several notes:
1. Nested while loops probably aren't the most efficient way of doing it, but nothing else comes to mind. If I were to use cursors, wouldn't it essentially be the same thing?
2. @FirstSearchWord here refers to the first instance in the text of one of the original search words, so essentially the text you are replacing is only going to be in the summary. Again, it's quite a basic method; some sort of text cluster finding algorithm would probably be handy.
3. To get RegEx in the first place, you need CLR user-defined functions.
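
For note 3, the T-SQL side of wiring up the CLR functions might look roughly like the sketch below. It assumes you have already compiled a .NET assembly containing static methods named RegExIndexOf and RegExReplace. The assembly name, file path, class name and parameter types are all hypothetical, chosen only to line up with the calls in the loop above.

```sql
-- Sketch only: the assembly name, file path, and class name below are hypothetical.
-- CLR integration is disabled by default and must be enabled once per instance.
exec sp_configure 'clr enabled', 1
reconfigure
go

-- Load the compiled .NET assembly that contains the regex helpers.
create assembly RegExHelpers
from 'C:\path\to\RegExHelpers.dll'
with permission_set = safe
go

-- Expose the static methods as T-SQL scalar functions.
create function dbo.RegExIndexOf (@input nvarchar(max), @pattern nvarchar(4000))
returns int
as external name RegExHelpers.[RegExFunctions].RegExIndexOf
go

create function dbo.RegExReplace (@input nvarchar(max), @pattern nvarchar(4000), @replacement nvarchar(max))
returns nvarchar(max)
as external name RegExHelpers.[RegExFunctions].RegExReplace
go
```

The go lines are batch separators understood by SSMS and sqlcmd. They are needed here because each create function statement has to be the first statement in its batch.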
