How to get the matching spans of a Span Term Query in Lucene 5?

Question

In Lucene, to get the words around a term, it is advised to use Span Queries. There is a good walkthrough at http://lucidworks.com/blog/accessing-words-around-a-positional-match-in-lucene/

The spans are supposed to be accessed using the getSpans() method.

SpanTermQuery fleeceQ = new SpanTermQuery(new Term("content", "fleece"));
Spans spans = fleeceQ.getSpans(searcher.getIndexReader());

Then, in Lucene 4, the API changed and the getSpans() method became more complex. Finally, in the latest Lucene release (5.3.0), this method was removed (apparently moved to the SpanWeight class).

So, what is the current way of accessing the spans matched by a span term query?

Recommended answer

Here is how to do it.

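// reader is an open IndexReader; is is an IndexSearcher built on it.
// Wrap the reader so that all segments are exposed as a single leaf.
LeafReader pseudoAtomicReader = SlowCompositeReaderWrapper.wrap(reader);
Term term = new Term("field", "fox");
SpanTermQuery spanTermQuery = new SpanTermQuery(term);
// false: scores are not needed, only the matching positions.
SpanWeight spanWeight = spanTermQuery.createWeight(is, false);
// Postings is the enum nested in SpanWeight.
Spans spans = spanWeight.getSpans(pseudoAtomicReader.getContext(), SpanWeight.Postings.POSITIONS);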

Support for iterating over the spans via span.next() is also gone in version 5.3 of Lucene. To iterate over the spans, you can do the following:

int docId;
// Outer loop: advance to each document that contains a match.
while ((docId = spans.nextDoc()) != Spans.NO_MORE_DOCS) {
  System.out.println("doc_id=" + docId);
  Document doc = reader.document(docId);
  System.out.println(doc.getField("field"));
  // Inner loop: visit every span in this document, not only the first one.
  while (spans.nextStartPosition() != Spans.NO_MORE_POSITIONS) {
    System.out.println(spans.startPosition() + " - " + spans.endPosition());
  }
}
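
The start and end values reported by the spans are token positions, with the end position being one past the last matched token. Once you have them, picking out the words around the match is a matter of slicing the document's token sequence. The following is only a minimal sketch in plain Java, not Lucene API: the SpanWindow class, the window() helper, and the hard-coded token list are hypothetical, and it assumes the tokens of the field have already been recovered by some other means (for example from term vectors or by re-analyzing the stored text).

```java
import java.util.Arrays;
import java.util.List;

public class SpanWindow {
    // Hypothetical helper: returns the tokens from (start - width) up to (end + width),
    // clamped to the bounds of the token list.
    // start is inclusive and end is exclusive, as with span start/end positions.
    static List<String> window(List<String> tokens, int start, int end, int width) {
        int from = Math.max(0, start - width);
        int to = Math.min(tokens.size(), end + width);
        return tokens.subList(from, to);
    }

    public static void main(String[] args) {
        // Assumed token sequence of one document field.
        List<String> tokens = Arrays.asList(
            "the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog");
        // Assumed span for the term "fox": start position 3, end position 4.
        System.out.println(window(tokens, 3, 4, 2));
    }
}
```

With a width of 2, the window covers two tokens on each side of the match; near the beginning or end of the field it is simply cut short.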
