Apache Lucene TokenStream contract violation
Problem description
Using an Apache Lucene TokenStream to remove stopwords causes an error:
TokenStream contract violation: reset()/close() call missing, reset() called multiple times, or subclass does not call super.reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.
I am using the following code:
public static String removeStopWords(String string) throws IOException {
    TokenStream tokenStream = new StandardTokenizer(Version.LUCENE_47, new StringReader(string));
    TokenFilter tokenFilter = new StandardFilter(Version.LUCENE_47, tokenStream);
    TokenStream stopFilter = new StopFilter(Version.LUCENE_47, tokenFilter, StandardAnalyzer.STOP_WORDS_SET);
    StringBuilder stringBuilder = new StringBuilder();
    CharTermAttribute token = tokenStream.getAttribute(CharTermAttribute.class);
    while(stopFilter.incrementToken()) {
        if(stringBuilder.length() > 0 ) {
            stringBuilder.append(" ");
        }
        stringBuilder.append(token.toString());
    }
    stopFilter.end();
    stopFilter.close();
    return stringBuilder.toString();
}
But as you can see, I never call reset() or close().
So why am I getting this error?
Recommended answer
"I never call reset() or close()."
Well, that is your problem. If you read the TokenStream javadoc, you will find the following:
The workflow of the new TokenStream API is as follows:
- Instantiation of TokenStream/TokenFilters which add/get attributes to/from the AttributeSource.
- The consumer calls TokenStream#reset().
- ...
I only had to add one line with reset() to your code and it worked.
    ...
    CharTermAttribute token = tokenStream.getAttribute(CharTermAttribute.class);
    tokenStream.reset(); // I added this
    while(stopFilter.incrementToken()) {
    ...
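The loop consumes stopFilter, but the fix calls reset() on the tokenizer. In Lucene, a TokenFilter wraps another stream and its reset() forwards to the wrapped input. The usual pattern is therefore to call reset(), end() and close() on the outermost stream being consumed (here stopFilter), which reaches the tokenizer through the chain. Calling tokenStream.reset() resets only the tokenizer, and the answer reports that was enough in this case. The sketch below models that delegation in plain Java. `ToyChain`, `ToyStream`, `ToyTokenizer`, `ToyStopFilter` and the small stopword set are hypothetical illustrations, not Lucene classes or Lucene's StandardAnalyzer.STOP_WORDS_SET.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical toy model of a tokenizer/filter chain; NOT the Lucene API.
public class ToyChain {

    interface ToyStream {
        void reset();
        boolean incrementToken();
        String term();
        void end();
        void close();
    }

    static final class ToyTokenizer implements ToyStream {
        private final String[] words;
        private int pos = -1;            // -1 means "reset() not called yet"
        private String current;

        ToyTokenizer(String text) {
            String t = text.trim();
            words = t.isEmpty() ? new String[0] : t.split("\\s+");
        }

        public void reset() { pos = 0; }

        public boolean incrementToken() {
            if (pos < 0) {
                throw new IllegalStateException("contract violation: reset() call missing");
            }
            if (pos >= words.length) return false;
            current = words[pos++];
            return true;
        }

        public String term() { return current; }

        public void end() { current = null; }

        public void close() { pos = -1; }
    }

    // Like a Lucene TokenFilter, this wraps another stream and forwards lifecycle calls to it.
    static final class ToyStopFilter implements ToyStream {
        private final ToyStream input;
        private final Set<String> stopWords;

        ToyStopFilter(ToyStream input, Set<String> stopWords) {
            this.input = input;
            this.stopWords = stopWords;
        }

        public void reset() { input.reset(); }   // delegation reaches the tokenizer

        public boolean incrementToken() {
            while (input.incrementToken()) {     // skip tokens that are stopwords
                if (!stopWords.contains(input.term().toLowerCase())) return true;
            }
            return false;
        }

        public String term() { return input.term(); }

        public void end() { input.end(); }

        public void close() { input.close(); }
    }

    // Hypothetical analogue of removeStopWords: lifecycle calls go to the outermost stream.
    static String removeStopWords(String text) {
        Set<String> stop = new HashSet<>(Arrays.asList("a", "an", "the", "is", "of"));
        ToyStream stream = new ToyStopFilter(new ToyTokenizer(text), stop);
        StringBuilder sb = new StringBuilder();
        stream.reset();
        try {
            while (stream.incrementToken()) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(stream.term());
            }
            stream.end();
        } finally {
            stream.close();
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeStopWords("The quick brown fox is a friend of the dog"));
        // prints quick brown fox friend dog
    }
}
```

Because every filter forwards reset() to its input, one call at the top of the chain prepares every stage, which is why code that consumes a filter chain normally resets the filter rather than the tokenizer.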