Noise Words in SQL Server 2005 Full Text Search

Question

I am attempting to use full text search over a series of names in my database. This is my first attempt at using full text search. Currently I take the search string that was entered and put a NEAR condition between each term (e.g., the entered phrase "Kings of Leon" becomes "Kings NEAR of NEAR Leon").
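
The query construction described above might look roughly like this in C#. This is a minimal sketch, not the asker's actual code: `NearQueryBuilder` and `BuildNearQuery` are hypothetical names, and it assumes terms are simply the whitespace-separated words of the input, joined with NEAR exactly as in the example.

```csharp
using System;

// Hypothetical helper illustrating the approach described in the question:
// every whitespace-separated term of the user's input is joined with NEAR.
public static class NearQueryBuilder
{
    public static string BuildNearQuery(string input)
    {
        // A null separator array means "split on any whitespace";
        // RemoveEmptyEntries drops the empty strings produced by repeated spaces.
        string[] terms = input.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

        // "Kings of Leon" -> "Kings NEAR of NEAR Leon"
        return string.Join(" NEAR ", terms);
    }
}
```

For example, `NearQueryBuilder.BuildNearQuery("Kings of Leon")` returns `Kings NEAR of NEAR Leon`, which would then be used as the search condition of a `CONTAINS` predicate. Note that the noise word "of" is passed through untouched, which is the source of the problem below.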

Unfortunately, I have discovered that this tactic produces false negatives: SQL Server drops the word "of" when it builds the index, because "of" is a noise word. Thus, "Kings Leon" matches correctly, but "Kings of Leon" does not.

My co-worker suggests taking all the noise words defined in MSSQL\FTData\noiseENG.txt and putting them in the .NET code, so that the noise words can be stripped out before the full text search is executed.
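
One variation on the co-worker's idea is to have the .NET code read the same noise file instead of hard-coding its contents, so the two lists cannot drift apart. This is a minimal sketch under stated assumptions: `NoiseWordList`, `Load`, and `Parse` are hypothetical names, the file's location differs per installation (so it is passed in as a path), and the sketch only assumes the words are separated by whitespace (one per line qualifies).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: builds a noise-word set from a noise file such as
// noiseENG.txt instead of hard-coding the words in the application.
public static class NoiseWordList
{
    // Reads the file at the given path; the path depends on the SQL Server instance.
    public static HashSet<string> Load(string path)
    {
        return Parse(File.ReadAllText(path));
    }

    // Assumes the words are separated by whitespace (for example, one per line).
    public static HashSet<string> Parse(string fileContents)
    {
        string[] words = fileContents.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

        // Full text matching is case-insensitive, so look words up the same way.
        return new HashSet<string>(words, StringComparer.OrdinalIgnoreCase);
    }
}
```

A query could then be filtered term by term, keeping only the terms for which `Contains(term)` returns false, before the NEAR conditions are built. The application still has to be able to read the file, which may not hold if the database runs on another machine.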

Is this the best solution? Is there not some auto-magic setting I can change in SQL server to do this for me? Or maybe just a better solution that doesn't feel as hacky?

Solution

Full text search works with whatever search criteria you provide. You can remove the noise word from the noise-word file, but doing so risks bloating your index. Robert Cain has a lot of good information about this on his blog:

http://arcanecode.com/2008/05/29/creating-and-customizing-noise-words-in-sql-server-2005-full-text-search/

To save some time, you can look at how the following method strips the noise words out and copy both the code and the word list:

    using System;

    public static class SearchHelper
    {
        public static string PrepSearchString(string sOriginalQuery)
        {
            string strNoiseWords = @" 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0 | $ | ! | @ | # | $ | % | ^ | & | * | ( | ) | - | _ | + | = | [ | ] | { | } | about | after | all | also | an | and | another | any | are | as | at | be | because | been | before | being | between | both | but | by | came | can | come | could | did | do | does | each | else | for | from | get | got | has | had | he | have | her | here | him | himself | his | how | if | in | into | is | it | its | just | like | make | many | me | might | more | most | much | must | my | never | now | of | on | only | or | other | our | out | over | re | said | same | see | should | since | so | some | still | such | take | than | that | the | their | them | then | there | these | they | this | those | through | to | too | under | up | use | very | want | was | way | we | well | were | what | when | where | which | while | who | will | with | would | you | your | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z ";

            // Each entry keeps its surrounding spaces, so " of " matches only the whole word "of".
            string[] arrNoiseWord = strNoiseWords.Split('|');
            // Pad the query so words at its start or end also match, and lower-case it to match the list.
            string query = " " + sOriginalQuery.ToLowerInvariant() + " ";

            foreach (string noiseword in arrNoiseWord)
            {
                // Repeat because String.Replace skips overlapping matches such as " a a ".
                while (query.Contains(noiseword))
                {
                    query = query.Replace(noiseword, " ");
                }
            }

            // Collapse the runs of spaces left behind by the removals.
            return string.Join(" ", query.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
        }
    }

However, I would probably go with a single Regex.Replace for this, which should be faster than looping over every word. I just don't have a quick example to post.
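
A Regex.Replace version of the same idea might look like the sketch below. It has not been benchmarked, so the speed claim remains unverified. `NoiseWordRegex` and `Strip` are hypothetical names, and only a short excerpt of the word list is included; the full list from the method above can be dropped into the array.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: removes noise words with one precompiled regular expression.
public static class NoiseWordRegex
{
    // Excerpt of the noise-word list; extend with the full list as needed.
    private static readonly string[] NoiseWords =
    {
        "about", "after", "all", "also", "an", "and", "of", "the", "a", "$", "&", "-"
    };

    // Matches any noise word bounded by whitespace or the start/end of the string.
    // Regex.Escape makes punctuation entries such as "$" match literally.
    private static readonly Regex NoisePattern = new Regex(
        @"(?<!\S)(?:" + string.Join("|", NoiseWords.Select(w => Regex.Escape(w))) + @")(?!\S)",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant | RegexOptions.Compiled);

    private static readonly Regex Whitespace = new Regex(@"\s+");

    public static string Strip(string query)
    {
        // Remove every noise word in a single pass; the lookarounds do not consume
        // the surrounding spaces, so adjacent noise words are all matched.
        string stripped = NoisePattern.Replace(query, " ");

        // Collapse leftover whitespace and trim the ends.
        return Whitespace.Replace(stripped, " ").Trim();
    }
}
```

With this excerpt, `NoiseWordRegex.Strip("The Kings Of Leon")` returns `Kings Leon`. If every term is a noise word the result is an empty string, so the caller probably wants to skip the full text query in that case rather than send an empty search condition.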
