Dropping noise words in SQL Server 2005 full text indexing

Question

In a pretty typical scenario, I have a 'Search' text box in my web application whose input is passed directly to a stored procedure. That procedure then uses full-text indexing to search two fields in two tables, which are joined on the appropriate keys.

I am using the CONTAINS predicate to search the fields. Before passing the search string in, I do the following:

SET @ftQuery = '"' + REPLACE(@query,' ', '*" OR "') + '*"'

This turns 'the castle' into "the*" OR "castle*", for example. This is necessary because I want people to be able to search on 'cas' and get results for 'castle'.
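The Python below reproduces the same string construction so you can see what it produces. It is an illustration only: `build_prefix_query` is a hypothetical name and is not part of the original stored procedure. It shows the search condition generated for 'the castle'. It also shows a side effect of replacing single spaces: a doubled space in the input produces an empty `"*"` term.

```python
def build_prefix_query(query: str) -> str:
    # Same construction as the T-SQL:
    #   '"' + REPLACE(@query, ' ', '*" OR "') + '*"'
    # Every space becomes the boundary between two quoted prefix terms.
    return '"' + query.replace(" ", '*" OR "') + '*"'


print(build_prefix_query("the castle"))   # "the*" OR "castle*"
print(build_prefix_query("the  castle"))  # "the*" OR "*" OR "castle*"
```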

WHERE CONTAINS(Building.Name, @ftQuery) OR CONTAINS(Road.Name, @ftQuery)

The problem is that, now that I append a wildcard to the end of each word, noise words (e.g. 'the') also get a wildcard appended and therefore no longer appear to be dropped. This means that a search for 'the castle' will return items containing words such as 'theatre'.

Changing OR to AND was my first thought, but that appears to simply return no matches if a noise word is then used in the query.
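Both symptoms can be reproduced with a deliberately simplified toy model. It is not how SQL Server's word breakers or index actually work. It assumes whitespace word breaking and a tiny hypothetical noise-word list, and treats noise words as never stored in the index. A prefix term such as "the*" then matches any stored word beginning with "the", such as 'theatre'. The noise word 'the' itself is never stored, so with AND even a row named 'The Castle Inn' fails the "the*" term.

```python
# Toy model only: whitespace word breaking and a tiny hypothetical noise list.
NOISE_WORDS = frozenset({"a", "an", "of", "the"})


def indexed_words(text):
    """Words the toy index keeps for `text` (noise words are dropped)."""
    return {w for w in text.lower().split() if w not in NOISE_WORDS}


def prefix_term_matches(term, words):
    """True if some indexed word starts with the prefix of a term like "the*"."""
    prefix = term.strip('"').rstrip("*").lower()
    return any(w.startswith(prefix) for w in words)


rows = ["The Castle Inn", "Royal Theatre"]
terms = ['"the*"', '"castle*"']

for row in rows:
    words = indexed_words(row)
    hits = [prefix_term_matches(t, words) for t in terms]
    print(row, "OR:", any(hits), "AND:", all(hits))
# The Castle Inn OR: True AND: False
# Royal Theatre OR: True AND: False
```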

All I am trying to achieve is to let the user enter multiple space-separated words, in any order, each representing either the whole or a prefix of a word they are searching for. I also want to drop noise words such as 'the' from their input. Otherwise, when they search for 'the castle', they get a big list of items with the result they need somewhere in the middle.

I could go ahead and implement my own noise word removal procedure, but it seems like something that full text indexing ought to be able to handle.
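A do-it-yourself noise-word removal step could look roughly like the sketch below. It assumes the filtering is done before the search string reaches `CONTAINS`, for example in the web application before the stored procedure is called. The same logic could also be written in T-SQL inside the procedure. The function name `build_contains_query` and the `NOISE_WORDS` set are hypothetical. In practice the list should mirror the noise-word file the server uses for the indexed column's language.

```python
# Hypothetical noise-word list for illustration; in practice it should mirror
# the noise-word file the server uses for the indexed column's language.
NOISE_WORDS = frozenset({"a", "an", "and", "of", "the"})


def build_contains_query(user_input, noise_words=NOISE_WORDS):
    """Turn free text into a CONTAINS condition of OR-ed prefix terms,
    dropping noise words first so they never get a wildcard appended."""
    terms = []
    for word in user_input.split():  # split() also collapses repeated spaces
        word = word.replace('"', "")  # a stray quote would break the quoted term
        if not word or word.lower() in noise_words:
            continue
        terms.append(f'"{word}*"')
    return " OR ".join(terms)


print(build_contains_query("the castle"))         # "castle*"
print(build_contains_query("  The Royal  cas "))  # "Royal*" OR "cas*"
print(repr(build_contains_query("the")))          # ''
```

If the input consists only of noise words, the result is an empty string. The caller should skip the full-text search in that case, because an empty search condition would be rejected by `CONTAINS`. Because noise words are removed up front, joining the remaining terms with AND instead of OR should also become workable, if every word is meant to be required.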

Grateful for any help!

Jamie

Solution

Noise words are stripped out before the index is stored, so it is impossible to write a query that searches on a stop word. If you REALLY want to enable this behavior, you need to edit the list of stop words (http://msdn.microsoft.com/en-us/library/ms142551.aspx) and then rebuild your index.
