How do you use T-SQL Full-Text Search to get results like Google?

Problem description

I have a database with fields that can contain long phrases of words. I wanted the ability to quickly search for a keyword or phrase in these columns, but when searching a phrase, I want to be able to search the phrase like Google would, returning all rows that contain all of the specified words, but in no particular order or "nearness" to each other. Ranking the results by relevance is unnecessary at this point.

After reading about SQL Server's Full-Text Search, I thought it would be just what I needed: a searchable index based on each word in a text-based column. My end goal is to safely accept user input and turn it into a query that leverages the speed of Full-Text Search, while maintaining ease-of-use for the users.

I see the FREETEXT function can take an entire phrase, break it up into "useful" words (ignoring words like 'and', 'or', 'the', etc), and then return a list of matching rows very quickly, even with a complex search term. But when you try to use it, you may notice that instead of an AND search for each of the terms, it seems to only do an OR search. Maybe there's a way to change its behavior, but I haven't found anything useful.

Then there's CONTAINS, which can accept a boolean query phrase, but sometimes with odd results.

Take a look at the following queries on this table:

PKID    Name
-----   -----
1       James Kirk
2       James Cameron
3       Kirk Cameron
4       Kirk For Cameron

Queries

Q1: SELECT Name FROM tblName WHERE FREETEXT(Name, 'james')
Q2: SELECT Name FROM tblName WHERE FREETEXT(Name, 'james kirk')
Q3: SELECT Name FROM tblName WHERE FREETEXT(Name, 'kirk for cameron')
Q4: SELECT Name FROM tblName WHERE CONTAINS(Name, 'james')
Q5: SELECT Name FROM tblName WHERE CONTAINS(Name, '"james kirk"')
Q6: SELECT Name FROM tblName WHERE CONTAINS(Name, '"kirk james"')
Q7: SELECT Name FROM tblName WHERE CONTAINS(Name, 'james AND kirk')
Q8: SELECT Name FROM tblName WHERE CONTAINS(Name, 'kirk AND for AND cameron')

Query 1:

SELECT Name FROM tblName WHERE FREETEXT(Name, 'james')

Returns "James Kirk" and "James Cameron". Alright, let's narrow it down...

SELECT Name FROM tblName WHERE FREETEXT(Name, 'james kirk')

Guess what. Now you'll get "James Kirk", "James Cameron", and "Kirk For Cameron". Same thing happens for Query 3, so let's just skip that.

SELECT Name FROM tblName WHERE CONTAINS(Name, 'james')

Same results as Query 1. Okay. Narrow the results maybe...?

SELECT Name FROM tblName WHERE CONTAINS(Name, '"james kirk"')

After discovering that you need to enclose the string in double-quotes if there are spaces, I find that this query works great on this particular dataset for the results I desire! Only "James Kirk" is returned. Wonderful! Or is it...

SELECT Name FROM tblName WHERE CONTAINS(Name, '"kirk james"')

Crap. No. It is matching that exact phrase. Hmmm... After checking the syntax for T-SQL's CONTAINS function, I see that you can throw boolean keywords in there, and it looks like that might be the answer. Let's see...

SELECT Name FROM tblName WHERE CONTAINS(Name, 'james AND kirk')

Neat. I get all three results, as expected. Now I just write a function to cram the word AND between all the words. Done, right? What now...

SELECT Name FROM tblName WHERE CONTAINS(Name, 'kirk AND for AND cameron')

This query knows exactly what it's looking for, except for some reason, there are no results. Why? Well after reading about Stopwords and Stoplists, I will make an educated guess and say that because I'm asking for the intersection of the index results for "kirk", "for", and "cameron", and the word "for" will not have any results (what with it being a stopword and all), then the result of any intersection with that result is also empty. Whether or not it actually functions like that is irrelevant to me, since that is the observable behavior of the CONTAINS function every time I do a boolean search with a stopword in there.
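To make that guess concrete, here is a toy model of it in Python. It is only a sketch of the hypothesis, not a description of how SQL Server's full-text index really works: the tiny stoplist, the in-memory index, and the helper names build_index and and_search are all assumptions made up for illustration.

```python
# Toy model of the stopword hypothesis; NOT SQL Server's actual implementation.
STOPWORDS = {"and", "or", "the", "for"}  # assumed stoplist, for illustration only

ROWS = {
    1: "James Kirk",
    2: "James Cameron",
    3: "Kirk Cameron",
    4: "Kirk For Cameron",
}


def build_index(rows):
    """Map each non-stopword token to the set of row ids that contain it."""
    index = {}
    for pkid, name in rows.items():
        for word in name.lower().split():
            if word not in STOPWORDS:  # stopwords never make it into the index
                index.setdefault(word, set()).add(pkid)
    return index


def and_search(index, terms):
    """Intersect the row sets of all terms, like a naive boolean AND."""
    result = None
    for term in terms:
        postings = index.get(term.lower(), set())  # a stopword yields an empty set
        result = postings if result is None else result & postings
    return sorted(result or [])


INDEX = build_index(ROWS)
print(and_search(INDEX, ["kirk", "for", "cameron"]))  # [] because "for" has no postings
print(and_search(INDEX, ["kirk", "cameron"]))         # [3, 4]
```

In this model a single unindexed term is enough to empty the whole intersection, which matches the empty result observed for the boolean query with "for" in it.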

So I need a new solution.

The NEAR proximity term looks promising. If I can take a user query and put commas between the words, this will... wait, this is the same thing as using boolean AND in CONTAINS queries. But does it ignore stopwords correctly?

SELECT Name FROM tblName WHERE CONTAINS(Name, 'NEAR(kirk, for, cameron)')

Nope. No results. Remove the word "for", and you get all three results again. :(

Recommended answer

I found another question on here that deals with this same topic. In fact, the post detailing the method is even titled "A Google-like Full Text Search". It uses an open-source library called Irony to parse a user-entered search string and turn it into an FTS-compatible query.

Here is the source code for the latest version of the Google-like Full-Text Search.
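For readers who only want the general idea, the following is a rough Python sketch of that kind of conversion. It is not the Irony-based implementation referenced above and does not reproduce its grammar. The stoplist, the regular expression, and the function name to_contains_condition are assumptions for illustration. It supports plain words, "quoted phrases", and a leading minus for exclusion, drops stopwords, and joins what remains with AND.

```python
import re

# Assumed stoplist for illustration; a real one should mirror the database's stoplist.
STOPWORDS = {"a", "an", "and", "for", "of", "or", "the"}

# A quoted phrase or a run of non-space characters, optionally prefixed by '-'.
TOKEN_RE = re.compile(r'(-?)(?:"([^"]*)"|(\S+))')


def to_contains_condition(user_input):
    """Turn Google-style input into a CONTAINS search condition string.

    Returns None when no searchable (non-excluded) term remains.
    """
    include, exclude = [], []
    for minus, phrase, word in TOKEN_RE.findall(user_input):
        text = phrase if phrase else word
        # Keep only word characters and spaces so user quotes cannot escape the term.
        text = " ".join(re.sub(r"[^\w\s]", " ", text).split())
        if not text or text.lower() in STOPWORDS:
            continue  # skip empty terms and single stopwords
        (exclude if minus else include).append('"%s"' % text)
    if not include:
        return None  # a condition made only of negated terms is not usable
    condition = " AND ".join(include)
    for term in exclude:
        condition += " AND NOT " + term
    return condition


print(to_contains_condition("kirk for cameron"))       # "kirk" AND "cameron"
print(to_contains_condition('"james kirk" -cameron'))  # "james kirk" AND NOT "cameron"
```

The resulting string is meant to be passed to the database as a query parameter, for example as the second argument of CONTAINS(Name, @condition), rather than concatenated into the SQL text.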
