Does Lucene's StandardAnalyzer remove stopwords and perform stemming?
Question
I have tested StandardAnalyzer with IndexWriter and found that it automatically removes stopwords, even though I did not supply a stopword list. This is the code I used:
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_35, analyzer);
Where is the default stopword list? Also, does this analyzer automatically stem words?
Recommended answer
According to the API docs, there is a default set of English stopwords, stored in StandardAnalyzer.STOP_WORDS_SET. It is used when you create the analyzer with the constructor public StandardAnalyzer(Version matchVersion), which is exactly what you do. The set is the same as StopAnalyzer.ENGLISH_STOP_WORDS_SET. To change this behavior, use one of the other constructors to pass the analyzer a different (possibly empty) set of stopwords.
StandardAnalyzer does not stem words. If you need stemming, use, for example, SnowballAnalyzer.
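The two behaviors described above (stopwords dropped by default, no stemming) can be illustrated with a small JDK-only sketch. It does not call Lucene: the class StopwordSketch, its analyze method, and the simple split-on-non-alphanumerics tokenization are hypothetical stand-ins, and the word list is assumed to match the default English stop set of Lucene 3.x, so check it against the version you use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for illustration only; this is not Lucene's tokenizer.
class StopwordSketch {

    // Assumption: mirrors the default English stop set used by Lucene 3.x.
    static final Set<String> DEFAULT_STOP_WORDS = Collections.unmodifiableSet(
        new HashSet<String>(Arrays.asList(
            "a", "an", "and", "are", "as", "at", "be", "but", "by", "for",
            "if", "in", "into", "is", "it", "no", "not", "of", "on", "or",
            "such", "that", "the", "their", "then", "there", "these", "they",
            "this", "to", "was", "will", "with")));

    // Lowercases the text, splits on non-alphanumeric characters, and drops
    // any token found in stopWords. Tokens are never stemmed.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<String>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!raw.isEmpty() && !stopWords.contains(raw)) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "The dogs are running in the park";
        // Default stop set: "the", "are", "in" are removed; "running" is kept as-is.
        System.out.println(analyze(text, DEFAULT_STOP_WORDS));
        // Empty stop set: every token is kept.
        System.out.println(analyze(text, Collections.<String>emptySet()));
    }
}
```

In Lucene 3.5 itself, the counterpart of the second call would be the two-argument constructor mentioned in the answer, for example new StandardAnalyzer(Version.LUCENE_35, Collections.emptySet()), which keeps every token.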