Having trouble rereading a Lucene TokenStream


Problem description

I am using Lucene 4.6, and am apparently unclear on how to reuse a TokenStream, because I get the exception:

java.lang.IllegalStateException: TokenStream contract violation: reset()/close() call missing, reset() called multiple times, or subclass does not call super.reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.

at the start of the second pass. I've read the Javadoc, but I'm still missing something. Here is a simple example that throws the above exception:

import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;
import org.junit.Test;

@Test
public void list() throws Exception {
  String text = "here are some words";
  TokenStream ts = new StandardTokenizer(Version.LUCENE_46, new StringReader(text));
  listTokens(ts);
  listTokens(ts);
}

public static void listTokens(TokenStream ts) throws Exception {
  CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
  try {
    ts.reset();
    while (ts.incrementToken()) {
      System.out.println("token text: " + termAtt.toString());
    }
    ts.end();
  }
  finally {
    ts.close();
  }
}

I've tried not calling TokenStream.end() or TokenStream.close(), thinking maybe they should only be called at the very end, but I get the same exception.

Can anyone offer a suggestion?

Answer

The exception lists, as one possible cause, calling reset() multiple times, which is exactly what you are doing. This is explicitly disallowed in the implementation of Tokenizer. Since the java.io.Reader API does not guarantee that every subclass supports the reset() operation, a Tokenizer cannot assume that the Reader passed in can be rewound.

You can simply construct a new TokenStream, or, I believe, call Tokenizer.setReader(Reader) with a fresh Reader (in which case you certainly must close() the stream first).
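To see why the second pass fails and how re-attaching a fresh Reader fixes it, here is a minimal self-contained sketch of the single-use contract. Note that SimpleTokenizer and its methods are illustrative stand-ins written for this example, not Lucene classes; real code would use StandardTokenizer and Tokenizer.setReader(Reader) as described above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in (NOT a Lucene class): a stream that, like a Lucene
// Tokenizer, may be consumed once per Reader. Calling reset() a second time
// without attaching a fresh Reader violates the contract.
class SimpleTokenizer {
    private Reader input;
    private boolean consumed = false;
    private List<String> tokens;
    private int pos;
    private String current;

    SimpleTokenizer(Reader input) { this.input = input; }

    // Analogue of Tokenizer.setReader(Reader): attach a fresh Reader so the
    // stream can be consumed again.
    void setReader(Reader reader) {
        this.input = reader;
        this.consumed = false;
    }

    void reset() throws IOException {
        if (consumed) {
            throw new IllegalStateException(
                "TokenStream contract violation: reset() called twice without a fresh Reader");
        }
        // Read everything and split on whitespace (a stand-in for real tokenizing).
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = input.read()) != -1) sb.append((char) c);
        tokens = new ArrayList<String>();
        for (String t : sb.toString().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        pos = 0;
        consumed = true;
    }

    boolean incrementToken() {
        if (pos < tokens.size()) { current = tokens.get(pos++); return true; }
        return false;
    }

    String term() { return current; }
    void end() { /* no end-of-stream attributes in this sketch */ }
    void close() throws IOException { input.close(); }
}

public class Demo {
    // Same reset/incrementToken/end/close workflow as listTokens in the question.
    static List<String> listTokens(SimpleTokenizer ts) throws IOException {
        List<String> out = new ArrayList<String>();
        try {
            ts.reset();
            while (ts.incrementToken()) out.add(ts.term());
            ts.end();
        } finally {
            ts.close();
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        String text = "here are some words";
        SimpleTokenizer ts = new SimpleTokenizer(new StringReader(text));
        System.out.println(listTokens(ts));   // first pass succeeds
        ts.setReader(new StringReader(text)); // fresh Reader; close() already ran
        System.out.println(listTokens(ts));   // second pass succeeds too
    }
}
```

Without the setReader(...) line between the two passes, the second reset() throws IllegalStateException, which is the same failure mode as the Lucene example in the question.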

