How to get all terms for a Lucene field in Lucene 4

Question

I'm trying to update my code from Lucene 3.4 to 4.1. I've figured out all the changes except one. I have code that needs to iterate over all term values of one field. In Lucene 3.1 there was an IndexReader#terms() method returning a TermEnum that I could iterate over. This seems to have changed in Lucene 4.1, and even after several hours of searching the documentation I can't figure out how to do it. Can someone point me in the right direction?

Thanks.

Answer

Refer to the Lucene 4 migration guide:

How you obtain the enums has changed. The primary entry point is the Fields class. If you know your reader is a single-segment reader (in Lucene 4.x, an AtomicReader), do this:

// reader is an AtomicReader, i.e. a single segment
Fields fields = reader.fields();
if (fields != null) {
  ...
}

If the reader might be multi-segment, you must do this:

Fields fields = MultiFields.getFields(reader);
if (fields != null) {
  ...
}

The returned Fields may be null (e.g., if the reader has no fields).
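Here is a minimal sketch that wraps the choice between the two calls above in a helper. It assumes the Lucene 4.x API, where a single-segment reader is an AtomicReader. The class FieldsHelper and method fieldsOf are hypothetical names. MultiFields.getFields already returns a lone segment's own Fields, so the helper mainly makes the decision explicit.

```java
import java.io.IOException;

import org.apache.lucene.index.AtomicReader;
import org.apache.lucene.index.Fields;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;

/** Hypothetical helper: obtains Fields for any reader (Lucene 4.x API assumed). */
final class FieldsHelper {
  private FieldsHelper() {}

  /**
   * Returns the reader's Fields, or null if it has none (e.g. an empty index),
   * so callers still have to null-check the result.
   */
  static Fields fieldsOf(IndexReader reader) throws IOException {
    if (reader instanceof AtomicReader) {
      // Single segment: read its Fields directly, nothing to merge.
      return ((AtomicReader) reader).fields();
    }
    // Possibly multi-segment: MultiFields merges the per-segment views on the fly.
    return MultiFields.getFields(reader);
  }
}
```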

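Note that the MultiFields approach entails a performance hit on MultiReaders, because it must merge terms/docs/positions on the fly. It's generally better to get the sequential readers instead and step through them yourself, if you can (this is how Lucene drives searches). The guide points to oal.util.ReaderUtil for this (oal = org.apache.lucene). In Lucene 4.x releases, IndexReader.leaves() returns these per-segment readers as a list of AtomicReaderContext.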

If you pass a SegmentReader to MultiFields.getFields, it simply returns reader.fields(), so there is no performance hit in that case.
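Here is a sketch of the per-segment approach. It assumes Lucene 4.x, where IndexReader.leaves() returns one AtomicReaderContext per segment. PerSegmentTerms and distinctTerms are hypothetical names. Each segment has its own terms dictionary, so the same term can appear in several segments. This sketch therefore deduplicates into a TreeSet<String>, which sorts by Java String order rather than Lucene's UTF-8 byte order.

```java
import java.io.IOException;
import java.util.SortedSet;
import java.util.TreeSet;

import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.index.Fields;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

/** Hypothetical helper: enumerates a field's terms segment by segment (Lucene 4.x API assumed). */
final class PerSegmentTerms {
  private PerSegmentTerms() {}

  /** Collects the distinct terms of {@code field} across all segments of {@code reader}. */
  static SortedSet<String> distinctTerms(IndexReader reader, String field) throws IOException {
    SortedSet<String> result = new TreeSet<String>();
    // leaves() yields one context per segment (just the reader itself if it is atomic).
    for (AtomicReaderContext leaf : reader.leaves()) {
      Fields fields = leaf.reader().fields();
      if (fields == null) {
        continue; // this segment has no postings at all
      }
      Terms terms = fields.terms(field);
      if (terms == null) {
        continue; // the field does not occur in this segment
      }
      TermsEnum termsEnum = terms.iterator(null); // null: no enum to reuse
      BytesRef term;
      while ((term = termsEnum.next()) != null) {
        // The same term may occur in several segments; the set drops duplicates.
        result.add(term.utf8ToString());
      }
    }
    return result;
  }
}
```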

Once you have a non-null Fields you can do this:

Terms terms = fields.terms("field");
if (terms != null) {
  ...
}

The returned Terms may be null (e.g., if the field does not exist).

Once you have a non-null Terms, you can get an enum like this:

// Lucene 4.x: the argument is a TermsEnum to reuse; pass null to get a new one
TermsEnum termsEnum = terms.iterator(null);

The returned TermsEnum will not be null.

You can then step through the TermsEnum by calling next() until it returns null.
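Here is a minimal sketch that combines these steps into a Lucene 4 counterpart to iterating the old TermEnum. It assumes the Lucene 4.1 API. FieldTerms and allTerms are hypothetical names.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.Fields;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

/** Hypothetical helper: lists all terms of one field (Lucene 4.x API assumed). */
final class FieldTerms {
  private FieldTerms() {}

  /** Returns every distinct term of {@code field}, in sorted term order. */
  static List<String> allTerms(IndexReader reader, String field) throws IOException {
    List<String> result = new ArrayList<String>();
    Fields fields = MultiFields.getFields(reader); // single- or multi-segment readers
    if (fields == null) {
      return result; // the reader has no fields
    }
    Terms terms = fields.terms(field);
    if (terms == null) {
      return result; // the field does not exist
    }
    TermsEnum termsEnum = terms.iterator(null); // null: no enum to reuse
    BytesRef term;
    while ((term = termsEnum.next()) != null) { // next() returns null when exhausted
      // The enum may reuse the returned BytesRef, so copy it out as a String now.
      result.add(term.utf8ToString());
    }
    return result;
  }
}
```

Keep in mind that the terms dictionary can still contain terms that occur only in deleted documents until their segments are merged away. If that matters, check each term's postings against the live documents (for example via MultiFields.getLiveDocs).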
