Extracting all fields from a Lucene 8 index

Views: 41 Published: 2021/5/30 21:45:38 Tags: java lucene

Question

Given an index created with Lucene 8, but without knowledge of the fields used, how can I programmatically extract all the fields? (I'm aware that the Luke browser can be used interactively; thanks to @andrewjames, see "Examples for using latest version of Lucene".) The scenario is that, during a development phase, I have to read indexes without prescribed schemas. I'm using

IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get(index)));
IndexSearcher searcher = new IndexSearcher(reader);

The reader has methods such as:

reader.getDocCount(field);

but this requires knowing the fields in advance.

I understand that documents in the index may be indexed with different fields; I'm quite prepared to iterate over all documents and extract the fields on a regular basis (these indexes are not huge).

I'm using Lucene 8.5.*, so posts and tutorials based on earlier Lucene versions may not work.

Recommended answer

You can access basic field info as follows:

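import java.io.IOException;
import java.nio.file.Paths;
import java.util.List;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.index.MultiBits;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Bits;

public class IndexDataExplorer {

    private static final String INDEX_PATH = "/path/to/index/directory";

    public static void doSearch() throws IOException {
        // try-with-resources closes the directory and the reader when done
        try (FSDirectory dir = FSDirectory.open(Paths.get(INDEX_PATH));
             IndexReader reader = DirectoryReader.open(dir)) {
            // liveDocs is null when the index contains no deletions
            Bits liveDocs = MultiBits.getLiveDocs(reader);
            // doc IDs run from 0 to maxDoc() - 1; numDocs() excludes deleted
            // documents, so looping to numDocs() can stop short
            for (int i = 0; i < reader.maxDoc(); i++) {
                if (liveDocs != null && !liveDocs.get(i)) {
                    continue; // skip deleted documents
                }
                // note: reader.document(i) returns stored fields only
                Document doc = reader.document(i);
                List<IndexableField> fields = doc.getFields();
                for (IndexableField field : fields) {
                    // use these to get field-related data:
                    //field.name();
                    //field.fieldType().toString();
                }
            }
        }
    }
}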
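
The loop above only sees stored fields, because reader.document(i) does not return fields that were indexed without being stored. As a rough sketch, assuming Lucene 8.x on the classpath, the field names can be gathered into a set either by walking the documents or by reading the index's field metadata through FieldInfos.getMergedFieldInfos. The class name FieldLister and its two method names are hypothetical helpers introduced here for illustration, not part of Lucene or of the original answer.

```java
import java.io.IOException;
import java.nio.file.Paths;
import java.util.Set;
import java.util.TreeSet;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.FieldInfo;
import org.apache.lucene.index.FieldInfos;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.index.MultiBits;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Bits;

// Hypothetical helper class (not part of Lucene), assuming Lucene 8.x.
final class FieldLister {

    /** Names of stored fields found by walking every live document. */
    static Set<String> storedFieldNames(IndexReader reader) throws IOException {
        Set<String> names = new TreeSet<>();
        Bits liveDocs = MultiBits.getLiveDocs(reader); // null if no deletions
        for (int i = 0; i < reader.maxDoc(); i++) {
            if (liveDocs != null && !liveDocs.get(i)) {
                continue; // skip deleted documents
            }
            for (IndexableField field : reader.document(i).getFields()) {
                names.add(field.name());
            }
        }
        return names;
    }

    /** Names of all fields recorded in the index metadata, stored or not. */
    static Set<String> allFieldNames(IndexReader reader) {
        Set<String> names = new TreeSet<>();
        // Merges the per-segment field metadata; no document iteration needed.
        for (FieldInfo info : FieldInfos.getMergedFieldInfos(reader)) {
            names.add(info.name);
        }
        return names;
    }

    // Usage: pass the index directory as the first argument.
    public static void main(String[] args) throws IOException {
        try (FSDirectory dir = FSDirectory.open(Paths.get(args[0]));
             IndexReader reader = DirectoryReader.open(dir)) {
            System.out.println("stored: " + storedFieldNames(reader));
            System.out.println("all:    " + allFieldNames(reader));
        }
    }
}
```

The metadata route avoids a full scan, but it reports what the segments record, so it may still list a field that only occurs in deleted documents until those segments are merged. Each FieldInfo also exposes getIndexOptions() and getDocValuesType() if more than the name is needed.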
