Solr * vs *:* query performance
Question
We're running Solr 3.4 and have a relatively small index of 90,000 documents or so. These documents are split over several logical sources, so each search applies a filter query for a particular source, e.g.:
?q=<query>&fq=source:<source>
where source is a classic string field. We're using edismax and have a default search field, text.
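To reproduce the comparison, it helps to send the two requests side by side with debugQuery turned on. The sketch below only builds the URLs. The endpoint (the default Solr 3.x select handler on localhost), the source value "news", and supplying the default field through qf=text are assumptions for illustration. q, fq, defType, qf and debugQuery are standard Solr request parameters.

```python
from urllib.parse import urlencode

# Assumption: default single-core Solr 3.x select handler on localhost.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_query_url(q: str, source: str, debug: bool = True) -> str:
    """Build a select URL with the per-source filter described in the question."""
    params = [
        ("q", q),
        ("fq", f"source:{source}"),  # filter query restricting to one logical source
        ("defType", "edismax"),      # use the extended dismax query parser
        ("qf", "text"),              # assumption: the default field is set via qf
    ]
    if debug:
        # Ask Solr to include the parsed query (e.g. MatchAllDocsQuery) in the response.
        params.append(("debugQuery", "on"))
    return SOLR_SELECT + "?" + urlencode(params)


# "news" is a hypothetical source value.
wildcard_url = build_query_url("*", "news")
match_all_url = build_query_url("*:*", "news")
print(wildcard_url)
print(match_all_url)
```

urlencode percent-encodes * and : (as %2A and %3A), which Solr decodes back before parsing.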
We are currently seeing q=* take on average 20 times longer to run than q=*:*. The difference is quite noticeable, with *:* taking 100ms and * taking up to 3500ms. A search for a common word in the document set (matching nearly 50% of all documents) returns a result in less than 200ms.
Looking at the queries with debugQuery on, we can see that * is parsed into DisjunctionMaxQuery((text:*)), while *:* is parsed into MatchAllDocsQuery(*:*). This makes sense, but I still don't feel like it accounts for a slowdown of this magnitude (a slowdown of 2000% over something that matches 50% of the documents).
What could be causing this? Is there anything we can tweak?
Recommended answer
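When you pass just *, edismax turns it into a wildcard query on the default field (text:*). To answer it, Lucene has to enumerate every distinct term indexed in text and visit the document list of each one to collect the matches, and on a full-text field that is a lot of work. When you use *:*, you are asking Solr to give you everything and skip any term matching: it becomes a MatchAllDocsQuery, which simply walks the document IDs without touching the term dictionary.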
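The difference in work can be illustrated with a toy inverted index. This is a simplified model, not Lucene's actual implementation (Lucene's rewrite of a wildcard query may build a bitset or a boolean query internally), but in both cases the cost grows with the number of terms and postings in the field, while match-all only grows with the number of documents. All function names here are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Toy inverted index: term -> sorted list of doc IDs containing it (a postings list).
Index = Dict[str, List[int]]


def build_index(docs: List[str]) -> Index:
    """Index whitespace-separated, lowercased terms of each document."""
    index: Dict[str, Set[int]] = {}
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def wildcard_all_terms(index: Index) -> Tuple[Set[int], int]:
    """Model of text:* : enumerate every term and union its postings.

    Returns the matching doc IDs and the number of postings entries visited.
    """
    matches: Set[int] = set()
    visited = 0
    for postings in index.values():  # one step per distinct term in the field
        visited += len(postings)
        matches.update(postings)
    return matches, visited


def match_all(num_docs: int) -> Tuple[Set[int], int]:
    """Model of *:* : walk the doc IDs once, no term dictionary involved."""
    return set(range(num_docs)), num_docs


docs = ["solr is fast", "lucene is fast too", "wildcards visit every term"]
index = build_index(docs)
wild_docs, wild_cost = wildcard_all_terms(index)
all_docs, all_cost = match_all(len(docs))
print(wild_cost, all_cost)  # 11 3
```

With three short documents the gap is only 11 versus 3 steps; on a 90,000-document full-text field the number of distinct terms and postings is far larger. Note also that text:* only matches documents that have at least one term in text, whereas *:* matches every document; in this toy data they coincide.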
Solr/Lucene is optimized to run *:* quickly and efficiently!
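One practical tweak along these lines is to rewrite a bare wildcard on the client before it reaches Solr. The helper below is a hypothetical sketch; its name and the choice to also treat an empty query as "match everything" are assumptions, not Solr behaviour.

```python
# Hypothetical client-side helper; not part of Solr.
MATCH_ALL = "*:*"


def normalize_query(user_query: str) -> str:
    """Replace a bare '*' (or, by assumption, an empty query) with '*:*'."""
    stripped = user_query.strip()
    if stripped in ("", "*"):
        # Lets the query parser produce MatchAllDocsQuery instead of text:*.
        return MATCH_ALL
    return stripped


print(normalize_query(" * "))  # *:*
```

An alternative with (e)dismax is to leave q blank and set q.alt=*:*, which the parser uses when the main query is missing or blank.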