Apache Lucene TokenStream contract violation

Question

Using an Apache Lucene TokenStream to remove stopwords causes an error:

TokenStream contract violation: reset()/close() call missing, reset() called multiple times, or subclass does not call super.reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.

I am using the following code:

public static String removeStopWords(String string) throws IOException {
    TokenStream tokenStream = new StandardTokenizer(Version.LUCENE_47, new StringReader(string));
    TokenFilter tokenFilter = new StandardFilter(Version.LUCENE_47, tokenStream);
    TokenStream stopFilter = new StopFilter(Version.LUCENE_47, tokenFilter, StandardAnalyzer.STOP_WORDS_SET);
    StringBuilder stringBuilder = new StringBuilder();

    CharTermAttribute token = tokenStream.getAttribute(CharTermAttribute.class);

    while(stopFilter.incrementToken()) {
        if(stringBuilder.length() > 0 ) {
            stringBuilder.append(" ");
        }

        stringBuilder.append(token.toString());
    }

    stopFilter.end();
    stopFilter.close();

    return stringBuilder.toString();
}

But as you can see, I never call reset() or close().

So why am I getting this error?

Answer

"I never call reset() or close()."

Well, that is your problem. If you read the TokenStream javadoc, you will find the following:

The workflow of the new TokenStream API is as follows:

  1. Instantiation of TokenStream/TokenFilters which add/get attributes to/from the AttributeSource.
  2. The consumer calls TokenStream#reset()
  3. ...

I only had to add one line with reset() to your code and it worked.

...    
CharTermAttribute token = tokenStream.getAttribute(CharTermAttribute.class);
tokenStream.reset();   // I added this 
while(stopFilter.incrementToken()) {
...
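To see the whole consumer workflow in one place, here is a minimal, self-contained Java sketch. It is a hypothetical toy model, not Lucene's API: MiniTokenStream, WhitespaceTokenizer, StopWordFilter, removeStopWords(String, Set) and the small STOP_WORDS set are all illustrative inventions. The toy tokenizer refuses to emit tokens until reset() has been called, loosely mirroring the error in the question. The consumer follows the same reset() → incrementToken() loop → end() → close() order as the question's code. It calls reset() on the outermost stream, and the filter forwards that call to its input, much as Lucene's TokenFilter delegates reset() to the stream it wraps.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical toy model of the TokenStream consumer workflow; these are NOT Lucene classes.
class TokenStreamContractDemo {

    // Small illustrative stop-word set (an assumption; Lucene's English set is larger).
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "is", "of", "the"));

    // Minimal stand-in for TokenStream: term() plays the role of CharTermAttribute.
    abstract static class MiniTokenStream implements AutoCloseable {
        abstract void reset();
        abstract boolean incrementToken();
        abstract String term();
        void end() {}
        @Override public void close() {}
    }

    // Whitespace tokenizer that refuses to emit tokens until reset() has been called.
    static final class WhitespaceTokenizer extends MiniTokenStream {
        private final String text;
        private final Deque<String> pending = new ArrayDeque<>();
        private String current;
        private boolean wasReset;

        WhitespaceTokenizer(String text) { this.text = text; }

        @Override void reset() {
            pending.clear();
            for (String word : text.trim().split("\\s+")) {
                if (!word.isEmpty()) pending.add(word);
            }
            current = null;
            wasReset = true;
        }

        @Override boolean incrementToken() {
            if (!wasReset) {
                throw new IllegalStateException("TokenStream contract violation: reset() call missing");
            }
            current = pending.poll(); // null once the input is exhausted
            return current != null;
        }

        @Override String term() { return current; }
    }

    // Drops stop words; forwards reset()/end()/close() to the wrapped stream.
    static final class StopWordFilter extends MiniTokenStream {
        private final MiniTokenStream input;
        private final Set<String> stopWords;

        StopWordFilter(MiniTokenStream input, Set<String> stopWords) {
            this.input = input;
            this.stopWords = stopWords;
        }

        @Override void reset() { input.reset(); }

        @Override boolean incrementToken() {
            // Skip tokens until a non-stop word is found or the input runs out.
            while (input.incrementToken()) {
                if (!stopWords.contains(input.term().toLowerCase(Locale.ROOT))) return true;
            }
            return false;
        }

        @Override String term() { return input.term(); }
        @Override void end() { input.end(); }
        @Override public void close() { input.close(); }
    }

    // Consumer: reset() -> incrementToken() until false -> end() -> close().
    static String removeStopWords(String text, Set<String> stopWords) {
        StringBuilder sb = new StringBuilder();
        try (MiniTokenStream stream = new StopWordFilter(new WhitespaceTokenizer(text), stopWords)) {
            stream.reset(); // reset the outermost stream; the filter forwards it to the tokenizer
            while (stream.incrementToken()) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(stream.term());
            }
            stream.end();
        } // try-with-resources calls close() here
        return sb.toString();
    }
}
```

Deleting the stream.reset() line from the toy removeStopWords makes the first incrementToken() call throw, which is the toy equivalent of the exception in the question.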
