How to get DocValue by document ID in Lucene 7+?

Views: 550; posted: 2020/5/4 7:53:11; tags: solr, lucene

Question

I'm adding a DocValue to a document with

doc.add(new BinaryDocValuesField("foo",new BytesRef("bar")));

To retrieve that value for a specific document with ID docId, I call

DocValues.getBinary(reader,"foo").get(docId).utf8ToString();

The get function in BinaryDocValues is supported up to Lucene 6.6, but for Lucene 7.0 and up it does not seem to be available anymore.

So how do I get the DocValue by document ID in Lucene 7+, without having to iterate over BinaryDocValues / DocIdSetIterator, and without having to re-fetch BinaryDocValues and call advanceExact every time?

Answer

Theory

Doc values are Lucene's column-stride field value storage. They were designed to be fast for random access at query time, for faceting and sorting. The issue LUCENE-7407 switched the access pattern from random access to an iterator. Because an iterator API is a much more restrictive access pattern than an arbitrary random-access API, this change gives Lucene more freedom to use aggressive compression and other optimizations:

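- Reduced disk usage when the data is sparse
- Better compression and faster doc values decoding, even in the non-sparse case
- Removal of the special column for missing values (getDocsWithField) and of thread-local codec readers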
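To make the sparse-data point concrete, here is a toy model in plain Java with no Lucene dependency. The class SparseBinaryValues and its layout are hypothetical and are not Lucene's actual codec; the sketch only borrows the advanceExact/binaryValue method names. Because the caller promises to move forward only, the store can keep values just for documents that have one, and "missing" is reported by advanceExact returning false instead of by a per-document slot plus a separate docs-with-field column.

```java
import java.util.Arrays;

/**
 * Hypothetical toy model (not Lucene's codec) of sparse, iterator-style binary doc values.
 * Only documents that have a value are stored; "missing" is simply advanceExact == false,
 * so no per-document slot and no separate docs-with-field structure is needed.
 */
final class SparseBinaryValues {
    private final int[] docIds;      // strictly increasing doc IDs that have a value
    private final byte[][] values;   // values[i] belongs to docIds[i]
    private int index = 0;           // searches resume here: access only moves forward
    private int doc = -1;            // last target passed to advanceExact
    private boolean hasValue = false;

    SparseBinaryValues(int[] docIds, byte[][] values) {
        if (docIds.length != values.length) {
            throw new IllegalArgumentException("docIds and values must have the same length");
        }
        for (int i = 1; i < docIds.length; i++) {
            if (docIds[i] <= docIds[i - 1]) {
                throw new IllegalArgumentException("docIds must be strictly increasing");
            }
        }
        this.docIds = docIds.clone();
        this.values = values.clone();
    }

    /** Positions on target and reports whether it has a value; targets must not decrease. */
    boolean advanceExact(int target) {
        if (target < doc) {
            throw new IllegalStateException("cannot go backwards: " + target + " < " + doc);
        }
        doc = target;
        // Binary search only in the part of the array that has not been passed yet.
        int pos = Arrays.binarySearch(docIds, index, docIds.length, target);
        hasValue = pos >= 0;
        // On a miss, remember the insertion point (the next document that has a value).
        index = hasValue ? pos : -pos - 1;
        return hasValue;
    }

    /** Value of the current document; only valid after advanceExact returned true. */
    byte[] binaryValue() {
        if (!hasValue) {
            throw new IllegalStateException("current document has no value");
        }
        return values[index];
    }

    /** Number of stored entries: proportional to documents with a value, not to maxDoc. */
    int storedEntries() {
        return docIds.length;
    }
}
```

With two values spread over a million documents, the model stores two entries; a random-access layout would need a slot (or at least a bit) for every document.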

You can read about this change in the following blogs:

Doc values as iterators

Sparse versus dense document values with Apache Lucene

In practice, this change caused performance regressions in some cases, for example SOLR-9599. In the main use cases (faceting and sorting), an iterator API works fine when used properly and even enables some optimizations. However, there are many cases where this API is not a good fit, and all of them were dismissed as incorrect usage (the same problem the Java world had with sun.misc.Unsafe).
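Why faceting fits the iterator model: hits within a segment are typically visited in increasing doc-ID order, so one forward pass over the doc values is enough. The sketch below is a hypothetical, stdlib-only model; ForwardOnlyValues and FacetCounter are illustrative names, not Lucene classes. The total work is proportional to the number of hits plus the number of stored entries, and the cursor never rewinds.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical forward-only string doc values: a linear cursor over sorted (doc, value) pairs. */
final class ForwardOnlyValues {
    private final int[] docIds;     // strictly increasing doc IDs that have a value
    private final String[] values;  // values[i] belongs to docIds[i]
    private int cursor = 0;         // first entry not yet passed
    private int lastTarget = -1;

    ForwardOnlyValues(int[] docIds, String[] values) {
        this.docIds = docIds.clone();
        this.values = values.clone();
    }

    /** True if target has a value; targets must be non-decreasing. */
    boolean advanceExact(int target) {
        if (target < lastTarget) {
            throw new IllegalStateException("targets must not decrease");
        }
        lastTarget = target;
        while (cursor < docIds.length && docIds[cursor] < target) {
            cursor++; // only ever moves forward
        }
        return cursor < docIds.length && docIds[cursor] == target;
    }

    /** Value of the current document; only valid after advanceExact returned true. */
    String value() {
        return values[cursor];
    }
}

final class FacetCounter {
    /** Counts values over hits, which must be sorted ascending (a single forward pass). */
    static Map<String, Integer> count(int[] sortedHits, ForwardOnlyValues dv) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int doc : sortedHits) {
            if (dv.advanceExact(doc)) {
                counts.merge(dv.value(), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Feeding the hits in any other order breaks the forward-only contract, which is exactly the kind of usage the iterator API does not support.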

In fact, org.apache.lucene.index.DocValuesIterator#advanceExact is quite fast, and for some implementations its performance and complexity are similar to those of the old random-access get.
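In real Lucene 7+ code the per-document lookup is, in outline: obtain values with DocValues.getBinary(leafReader, "foo"), then if values.advanceExact(docId) returns true, read values.binaryValue().utf8ToString(); here docId must be relative to that segment, and advanceExact may only move forward. For lookups in arbitrary order, one workable pattern is a small facade that reuses a single iterator while the requested doc IDs increase and fetches a fresh one only when a request goes backwards. The sketch below models that pattern with hypothetical names (ForwardIterator, ArrayForwardIterator, RandomAccessLookup) in plain Java; in Lucene, the supplier would wrap the DocValues.getBinary call, and sorting the doc IDs up front avoids re-fetching altogether.

```java
import java.util.function.Supplier;

/** Hypothetical minimal forward-only contract, modeled on advanceExact plus a value accessor. */
interface ForwardIterator {
    boolean advanceExact(int target); // target must be >= the previous target
    String value();                   // valid only after advanceExact returned true
}

/** Hypothetical array-backed implementation used to exercise the facade. */
final class ArrayForwardIterator implements ForwardIterator {
    private final int[] docIds;       // strictly increasing doc IDs that have a value
    private final String[] values;    // values[i] belongs to docIds[i]
    private int cursor = 0;
    private int lastTarget = -1;

    ArrayForwardIterator(int[] docIds, String[] values) {
        this.docIds = docIds;
        this.values = values;
    }

    public boolean advanceExact(int target) {
        if (target < lastTarget) {
            throw new IllegalStateException("targets must not decrease");
        }
        lastTarget = target;
        while (cursor < docIds.length && docIds[cursor] < target) {
            cursor++;
        }
        return cursor < docIds.length && docIds[cursor] == target;
    }

    public String value() {
        return values[cursor];
    }
}

/** Reuses one iterator for forward lookups; asks the supplier for a new one only on a backward jump. */
final class RandomAccessLookup {
    private final Supplier<ForwardIterator> factory;
    private ForwardIterator current;
    private int lastDoc = -1;
    private int reopens = 0;          // how many iterators were created so far

    RandomAccessLookup(Supplier<ForwardIterator> factory) {
        this.factory = factory;
    }

    /** Returns the value for docId, or null if the document has no value. */
    String get(int docId) {
        if (current == null || docId < lastDoc) {
            current = factory.get();  // the analogue of fetching the doc values again
            reopens++;
        }
        lastDoc = docId;
        return current.advanceExact(docId) ? current.value() : null;
    }

    int reopens() {
        return reopens;
    }
}
```

Forward lookups cost one advanceExact each on a shared iterator; only a backward jump pays for creating a new iterator.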