Stemming English words with Lucene
Question
I'm processing some English texts in a Java application, and I need to stem them. For example, from the text "amenities/amenity" I need to get "amenit".
The function looks like this:
String stemTerm(String term) {
    ...
}
I've found the Lucene Analyzer, but it looks way too complicated for what I need. http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/analysis/PorterStemFilter.html
Is there a way to use it to stem words without building an Analyzer? I don't understand all the Analyzer business...
EDIT: I actually need stemming plus lemmatization. Can Lucene do this?
Recommended answer
import org.apache.lucene.analysis.PorterStemmer;
...
String stemTerm(String term) {
    PorterStemmer stemmer = new PorterStemmer();
    return stemmer.stem(term);
}
See here for more details. If stemming is all you want to do, then you should use this instead of Lucene.
EDIT: You should lowercase term before passing it to stem().
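The lowercasing advice can be sketched as a small wrapper. This is only an illustration under stated assumptions: the class name LowercaseStemming and the idea of passing the stemmer in as a UnaryOperator<String> are hypothetical and not part of Lucene. Locale.ROOT is used so the result does not depend on the machine's default locale (for example, Turkish dotless-i rules).

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical helper, not part of Lucene: it wraps whatever stemmer
// is supplied as a function, so it compiles without extra libraries.
final class LowercaseStemming {

    private LowercaseStemming() {
        // static helper only, no instances
    }

    // Lowercases the term with a fixed locale, then hands the
    // normalized term to the supplied stemmer function.
    static String stemTerm(String term, UnaryOperator<String> stemmer) {
        String normalized = term.toLowerCase(Locale.ROOT);
        return stemmer.apply(normalized);
    }
}
```

With the stemmer from the answer, a call would presumably look like LowercaseStemming.stemTerm(term, s -> stemmer.stem(s)), assuming the stemmer class is accessible from your code. Keeping the stemmer as a parameter also makes it easy to swap in a different stemming library later.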