Does Lucene's StandardAnalyzer remove stopwords and perform stemming?

Question

I tested StandardAnalyzer with an IndexWriter and found that it removes stopwords automatically, even though I never supplied a stopword list. This is the code I used:

StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_35, analyzer);

Where is the default stopword list? Also, does this analyzer automatically stem words?

Recommended answer

According to the API docs, StandardAnalyzer has a default set of English stopwords, stored in StandardAnalyzer.STOP_WORDS_SET. That set is used when you create the analyzer with the constructor public StandardAnalyzer(Version matchVersion), which is exactly what you did. It is the same set as StopAnalyzer.ENGLISH_STOP_WORDS_SET. To change this behavior, use one of the other constructors to pass the analyzer a different (possibly empty) set of stopwords.

StandardAnalyzer doesn't stem words. If you need stemming, use, for example, SnowballAnalyzer.
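
To make the distinction between the two steps concrete, here is a minimal plain-Java sketch with no Lucene dependency. It is not Lucene's implementation: the class StopVsStem, the methods removeStopWords and toyStem, and the short stopword list are all hypothetical names and data chosen for illustration. It only shows that stopword removal drops whole tokens, that passing an empty set disables the removal, and that stemming is a separate transformation applied to the tokens that remain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class StopVsStem {
    // Illustrative subset of English stopwords (assumption: NOT Lucene's actual list).
    static final Set<String> STOP_WORDS =
        new HashSet<String>(Arrays.asList("a", "an", "and", "is", "of", "the", "to"));

    // Lowercase the text, split on non-letters, and drop any token found in stopWords.
    static List<String> removeStopWords(String text, Set<String> stopWords) {
        List<String> out = new ArrayList<String>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                out.add(token);
            }
        }
        return out;
    }

    // Toy stemmer: strips "ing" or a plural "s". Real stemmers such as
    // Porter or Snowball apply many more rules than this.
    static String toyStem(String token) {
        if (token.length() > 5 && token.endsWith("ing")) {
            return token.substring(0, token.length() - 3);
        }
        if (token.length() > 3 && token.endsWith("s") && !token.endsWith("ss")) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    public static void main(String[] args) {
        String text = "The indexing of the documents";

        // Stopword removal only: tokens keep their surface form.
        List<String> filtered = removeStopWords(text, STOP_WORDS);
        System.out.println(filtered);

        // An empty stopword set keeps every token.
        System.out.println(removeStopWords(text, Collections.<String>emptySet()));

        // Stemming is an extra step on top of the filtered tokens.
        List<String> stemmed = new ArrayList<String>();
        for (String token : filtered) {
            stemmed.add(toyStem(token));
        }
        System.out.println(stemmed);
    }
}
```

In Lucene itself the same separation holds: the stopword set is a constructor argument of the analyzer, while stemming requires choosing an analyzer or filter that performs it.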
