Having trouble rereading a Lucene TokenStream
Question
I am using Lucene 4.6, and am apparently unclear on how to reuse a TokenStream, because I get the exception:
java.lang.IllegalStateException: TokenStream contract violation: reset()/close() call missing, reset() called multiple times, or subclass does not call super.reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.
at the start of the second pass. I've read the Javadoc, but I'm still missing something. Here is a simple example that throws the above exception:
@Test
public void list() throws Exception {
    String text = "here are some words";
    TokenStream ts = new StandardTokenizer(Version.LUCENE_46, new StringReader(text));
    listTokens(ts);
    listTokens(ts);
}

public static void listTokens(TokenStream ts) throws Exception {
    CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
    try {
        ts.reset();
        while (ts.incrementToken()) {
            System.out.println("token text: " + termAtt.toString());
        }
        ts.end();
    } finally {
        ts.close();
    }
}
I've tried not calling TokenStream.end() or TokenStream.close(), thinking maybe they should only be called at the very end, but I get the same exception.
Can anyone offer a suggestion?
Answer
The Exception lists, as a possible issue, calling reset() multiple times, which you are doing. This is explicitly not allowed in the implementation of Tokenizer. Since the java.io.Reader API does not guarantee that all subclasses support the reset() operation, the Tokenizer can't assume that the Reader passed in can be reset, after all.
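The point about java.io.Reader is easy to check in plain Java, without Lucene: reset() is an optional operation, and Reader's default implementation simply throws IOException. A small sketch (ReaderResetDemo and supportsReset are illustrative names, not Lucene API):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

public class ReaderResetDemo {

    // True if reset() works on this Reader after a read; false if it throws.
    public static boolean supportsReset(Reader r) {
        try {
            r.read();
            r.reset();
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // StringReader resets to the start (or the last mark), so this prints true.
        System.out.println(supportsReset(new StringReader("here are some words")));

        // InputStreamReader inherits Reader's default reset(), which always
        // throws IOException, so this prints false.
        Reader isr = new InputStreamReader(
                new ByteArrayInputStream("here are some words".getBytes(StandardCharsets.UTF_8)),
                StandardCharsets.UTF_8);
        System.out.println(supportsReset(isr));
    }
}
```

This is exactly why the Tokenizer contract can't lean on the wrapped Reader being rewindable: it depends entirely on which Reader subclass you happened to pass in.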
You may simply construct a new TokenStream, or I believe you could call Tokenizer.setReader(Reader) (in which case you certainly must close() it first).
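A minimal sketch of both suggestions, assuming Lucene 4.6 on the classpath (RereadSketch is an illustrative class name; listTokens() is the same consuming helper as in the question):

```java
import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class RereadSketch {

    public static void twoPasses(String text) throws Exception {
        // Option 1: construct a fresh tokenizer for each pass.
        listTokens(new StandardTokenizer(Version.LUCENE_46, new StringReader(text)));
        listTokens(new StandardTokenizer(Version.LUCENE_46, new StringReader(text)));

        // Option 2: reuse one tokenizer. listTokens() ends by calling close(),
        // so handing it a fresh Reader makes the next reset() legal again.
        Tokenizer tok = new StandardTokenizer(Version.LUCENE_46, new StringReader(text));
        listTokens(tok);
        tok.setReader(new StringReader(text));
        listTokens(tok);
    }

    // The consuming workflow from the question: reset(), incrementToken(), end(), close().
    public static void listTokens(TokenStream ts) throws Exception {
        CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
        try {
            ts.reset();
            while (ts.incrementToken()) {
                System.out.println("token text: " + termAtt.toString());
            }
            ts.end();
        } finally {
            ts.close();
        }
    }
}
```

The key point in option 2 is the ordering: the tokenizer must be fully consumed and close()d before setReader() is called, otherwise you trip the same contract check.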