Stop words in Sitecore
Question
We are using Lucene for text search as part of Sitecore. Is there any way to ignore stop words (such as a, an, the, ...) in Sitecore search?
Recommended answer
By default, Sitecore uses the Lucene standard analyzer, Lucene.Net.Analysis.Standard.StandardAnalyzer. You can see that it is defined in the /configuration/sitecore/search/analyzer element of the web.config file. One of the constructors of the StandardAnalyzer class accepts an array of strings that it will treat as stop words. By default, it uses a hardcoded list of stop words, which includes:
"a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"
If you'd like to override this behavior, I think you should derive a class from StandardAnalyzer and give it a default constructor that takes the stop words from another source, rather than the hardcoded array, and passes them to the base constructor. You have various options, even reading them from a text file. Don't forget to replace the standard class with yours in web.config.
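A minimal sketch of that approach, assuming the Lucene.Net version bundled with your Sitecore installation still exposes the StandardAnalyzer constructor that takes a string array (later Lucene.Net releases changed the constructor signatures, so check yours first). The namespace MySite.Search, the class name FileStopWordAnalyzer, and the App_Data\stopwords.txt location are hypothetical names chosen for illustration only:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Standard;

namespace MySite.Search // hypothetical namespace
{
    // Behaves like StandardAnalyzer, but reads its stop words from a text file
    // (one word per line) instead of using the hardcoded list.
    public class FileStopWordAnalyzer : StandardAnalyzer
    {
        // Assumption: the string[] constructor is available in your Lucene.Net version.
        public FileStopWordAnalyzer()
            : base(LoadStopWords())
        {
        }

        private static string[] LoadStopWords()
        {
            // Hypothetical location; point this wherever you keep the list.
            string path = Path.Combine(
                AppDomain.CurrentDomain.BaseDirectory, @"App_Data\stopwords.txt");

            List<string> words = new List<string>();
            foreach (string line in File.ReadAllLines(path))
            {
                string word = line.Trim().ToLowerInvariant();

                // Skip blank lines and lines starting with '#' (treated as comments here).
                if (word.Length > 0 && !word.StartsWith("#", StringComparison.Ordinal))
                {
                    words.Add(word);
                }
            }
            return words.ToArray();
        }
    }
}
```

The analyzer element in web.config would then point at the new class. The type and assembly names below are the same hypothetical ones as above, and the exact shape of the element may differ between Sitecore versions:

```xml
<!-- under /configuration/sitecore/search in web.config -->
<analyzer type="MySite.Search.FileStopWordAnalyzer, MySite" />
```

Content indexed with the old analyzer will probably need to be re-indexed before the new stop-word list takes full effect.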
See the other constructors of the StandardAnalyzer class for more details. .NET Reflector is your friend here.