Is Filtering faster than Querying in Lucene?

Question

While reading "Lucene in Action, 2nd Edition" I came across the description of the Filter classes, which can be used for result filtering in Lucene. Lucene has a lot of filters that duplicate Query classes, for example NumericRangeQuery and NumericRangeFilter.

The book says that NumericRangeFilter (NRF) does exactly the same as NumericRangeQuery (NRQ), but without document scoring. Does this mean that if I do not need scoring, or I sort documents by a document field value, I should prefer filtering over querying from a performance point of view?

Recommended answer

I received a great answer from Uwe Schindler; let me repost it here.


If you don't cache filters, queries will be faster, because the ConjunctionScorer in Lucene has optimizations that are currently not used for Filters. Filters are fine if you cache them (e.g. if you always have the same access restrictions for a specific user, applied to all of that user's queries). In that case the Filter is executed only once, cached for all further requests, and then intersected with the query result set.
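
The "execute once, cache, then intersect" idea can be sketched in plain Java with java.util.BitSet. This is only an illustration under stated assumptions, not Lucene code: the class CachedFilterSketch, its methods, and the key "user-42" are hypothetical names, and doc IDs are modelled as bit positions.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical illustration (not Lucene API): one cached doc-ID bit set per filter key. */
public class CachedFilterSketch {
    private final Map<String, BitSet> cache = new HashMap<>();
    private int computations = 0; // how many times a filter was actually executed

    /** Builds a bit set with the given doc IDs set. */
    public static BitSet bits(int... docIds) {
        BitSet b = new BitSet();
        for (int id : docIds) {
            b.set(id);
        }
        return b;
    }

    /** Returns the cached bit set for the key, running the filter only on first use. */
    public BitSet filterBits(String key, Supplier<BitSet> filter) {
        return cache.computeIfAbsent(key, k -> {
            computations++;
            return filter.get();
        });
    }

    /** Intersects the query hits with the cached filter; neither input is modified. */
    public BitSet search(BitSet queryHits, String key, Supplier<BitSet> filter) {
        BitSet result = (BitSet) queryHits.clone();
        result.and(filterBits(key, filter));
        return result;
    }

    public int computations() {
        return computations;
    }

    public static void main(String[] args) {
        CachedFilterSketch sketch = new CachedFilterSketch();
        // Assumed access restriction: the documents this user may see.
        Supplier<BitSet> userAcl = () -> bits(1, 3, 5, 7);

        BitSet first = sketch.search(bits(1, 2, 3), "user-42", userAcl);
        BitSet second = sketch.search(bits(5, 6, 7, 8), "user-42", userAcl);

        System.out.println(first + " " + second
                + " filter executions: " + sketch.computations());
    }
}
```

The second search reuses the bit set computed by the first one, which is the only case in which the answer describes filters as worthwhile.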

If you only want to "filter" ad hoc, e.g. by a variable numeric range such as a bounding box in a geographic search, use queries; queries are in most cases faster. (Range queries and similar queries, called MultiTermQueries, are internally implemented by the same BitSet algorithm as the Filter; in fact they are just Filters wrapped by a Scorer implementation.) But the Scorer that ANDs the query and your "filter" query together (ConjunctionScorer) is generally faster than the code that applies the filter after searching. Some improvement may be possible here, but in general Filters are something in Lucene that is not really needed anymore, so there have already been some approaches to make Filters and Queries the same and, instead, to be able to cache non-scoring queries as well. This would make lots of code easier.
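
To make the difference between ANDing two clauses and filtering after the search more concrete, here is a simplified plain-Java sketch. It is not Lucene's ConjunctionScorer; ConjunctionSketch, leapfrog, advance, and postFilter are hypothetical names, and the doc-ID arrays are assumed to be sorted and free of duplicates. It only shows why a conjunction can skip ahead in both inputs, while post-filtering has to visit every query hit before testing it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Hypothetical illustration (not Lucene code) of conjunction vs. post-filtering. */
public class ConjunctionSketch {

    /** Intersects two sorted doc-ID arrays, letting each side skip ahead to the other. */
    public static List<Integer> leapfrog(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                out.add(a[i]);
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i = advance(a, i, b[j]); // skip all docs in a below b's current doc
            } else {
                j = advance(b, j, a[i]); // skip all docs in b below a's current doc
            }
        }
        return out;
    }

    /** Returns the first index at or after 'from' whose doc ID is >= target. */
    static int advance(int[] docs, int from, int target) {
        int idx = Arrays.binarySearch(docs, from, docs.length, target);
        return idx >= 0 ? idx : -idx - 1;
    }

    /** Post-filtering: every query hit is visited, then checked against the filter bits. */
    public static List<Integer> postFilter(int[] queryHits, BitSet filter) {
        List<Integer> out = new ArrayList<>();
        for (int doc : queryHits) {
            if (filter.get(doc)) {
                out.add(doc);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] queryHits = {1, 4, 9, 12, 20};
        int[] rangeHits = {4, 5, 12, 30};

        BitSet rangeBits = new BitSet();
        for (int doc : rangeHits) {
            rangeBits.set(doc);
        }

        // Both strategies return the same documents; they differ in how much work they do.
        System.out.println(leapfrog(queryHits, rangeHits));
        System.out.println(postFilter(queryHits, rangeBits));
    }
}
```

Whether the conjunction actually wins in a real index depends on the Lucene version and the data; the sketch only mirrors the reasoning given in the answer.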

Filters can bring a huge speed improvement with Lucene 4.0 if they are plugged on top of the IndexReader to filter documents before scoring, but that is not yet implemented (see https://issues.apache.org/jira/browse/LUCENE-3212); I am working on it. We may also make Filters random access (that is easy, as they are bitsets), which could also improve the after-query filtering. But I would then also make Queries partially random access if they can support it (like queries that are based only on FieldCache).

Uwe
