Stop words in Sitecore

Question

We are using Lucene for text search as part of Sitecore. Is there any way to ignore stop words (such as "a", "an", "the") in Sitecore search?

Answer

By default, Sitecore uses the Lucene standard analyzer, Lucene.Net.Analysis.Standard.StandardAnalyzer. You can see it defined in the /configuration/sitecore/search/analyzer element of the web.config file. One of the StandardAnalyzer constructors accepts an array of strings to treat as stop words; by default, the analyzer uses a hard-coded list of stop words, which includes:

"a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"
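To make the effect of that list concrete, here is a minimal Python sketch of what stop-word filtering in an analyzer amounts to. It is not Lucene.Net code: only the stop-word list is taken from above, and the regex tokenizer is a simplifying assumption (StandardAnalyzer uses its own grammar-based tokenizer followed by lowercasing and stop-word filtering).

```python
import re

# The default English stop-word list quoted above.
DEFAULT_STOP_WORDS = frozenset([
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
])

# Simplified tokenizer (an assumption): runs of ASCII letters and digits.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+")


def analyze(text, stop_words=DEFAULT_STOP_WORDS):
    """Tokenize, lowercase, then drop stop words, in the spirit of
    StandardAnalyzer's tokenizer -> lowercase -> stop-filter chain."""
    tokens = (t.lower() for t in TOKEN_RE.findall(text))
    return [t for t in tokens if t not in stop_words]


if __name__ == "__main__":
    print(analyze("The history of an Empire"))  # prints ['history', 'empire']
```

If the same analyzer is also applied to queries, a search for "the empire" is reduced to the single term "empire", which is why stop words effectively disappear from Sitecore search.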

If you'd like to override this behavior, I think you should inherit from StandardAnalyzer and give your subclass a parameterless constructor that passes stop words from another source to the base constructor, instead of relying on the hard-coded array. You have several options for that source; you could even read the words from a text file. Don't forget to replace the standard analyzer class with yours in web.config.
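The inherit-and-replace approach can be sketched the same way, again in Python rather than Lucene.Net: a base analyzer receives its stop words through the constructor (falling back to a hard-coded default), and a subclass supplies them from a text file instead. The class names StopWordAnalyzer and FileStopWordAnalyzer, the helper load_stop_words, and the one-word-per-line file format with "#" comments are all hypothetical choices for illustration.

```python
import re
import tempfile
from pathlib import Path

# Simplified tokenizer (an assumption), not Lucene's StandardTokenizer.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+")


class StopWordAnalyzer:
    """Hypothetical stand-in for StandardAnalyzer: stop words arrive via
    the constructor, with a hard-coded default when none are given."""

    # Abbreviated default list, for the sketch only.
    DEFAULT_STOP_WORDS = frozenset(["a", "an", "and", "the", "of", "to"])

    def __init__(self, stop_words=None):
        words = self.DEFAULT_STOP_WORDS if stop_words is None else stop_words
        self.stop_words = frozenset(w.lower() for w in words)

    def analyze(self, text):
        tokens = (t.lower() for t in TOKEN_RE.findall(text))
        return [t for t in tokens if t not in self.stop_words]


def load_stop_words(path):
    """Hypothetical file format: one stop word per line; blank lines and
    lines starting with '#' are skipped."""
    words = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            words.append(line)
    return words


class FileStopWordAnalyzer(StopWordAnalyzer):
    """Subclass whose constructor takes stop words from a file instead of
    the hard-coded default and hands them to the base constructor."""

    def __init__(self, path):
        super().__init__(load_stop_words(path))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "stopwords.txt"
        path.write_text("# custom list\nsitecore\nthe\n", encoding="utf-8")
        analyzer = FileStopWordAnalyzer(path)
        print(analyzer.analyze("The Sitecore search index"))  # prints ['search', 'index']
```

In C#, constructors are not inherited or overridden, so the subclass declares its own parameterless constructor and chains to the base one with base(...); a parameterless constructor matters if the configuration creates the analyzer without constructor arguments.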

See the other constructors of the StandardAnalyzer class for more details. .NET Reflector is your friend here.
