MongoDB 中的“$or"和“$in"查询如何排序? [英] How does sorting work with `$or` and `$in` queries in MongoDB?
问题描述
这是对这个问题的跟进 - 请参阅上下文.
这个问题涉及链接问题的几个特殊情况 - 即在使用 $in
或 $or
运算符时,MongoDB 中的排序如何工作,以及如何确保使用用于排序的索引与内存中的排序.
$in:
例如,假设我们有一个集合,其中文档结构是
{a: XXX, b: XXX}
... 我们在 a
和 b
上有一个复合索引,并希望运行查询
{a: {$in: [4, 6, 2, 1, 3, 10]}, b: {$gt: 1, $lt: 6}}
如果排序在 a
或 b
上,将如何进行?$in
是排序的相等运算符,但在我看来,即使如此,对带有索引的 b
进行排序也是不可能的.我认为,只有先对 $in
值数组进行排序,才能使用索引对 a
进行排序 - 但我不知道 MongoDB 是否这样做.>
$ 或:
由于 $or
查询,IIUC 被作为多个查询处理,并且大概可以使用它们各自的索引进行排序,排序结果是否以某种方式合并或执行 $or
强制对所有结果进行内存排序?如果是前者,这个过程的时间复杂度是多少?
注意:此答案基于 MongoDB 3.2.4.
值得发现 explain()
在 MongoDB 中.查询的 explain()
输出(例如 db.collection.explain().find(...)
) 允许您检查查询中使用了哪个索引,并使用 db.collection.explain('executionStats')
也会告诉你由于内存中的SORT
限制,查询是成功还是失败.
$in
一个 $in
查询可以被认为是一系列等式查询.例如,{a: {$in: [1,3,5]}}
可以被认为是 {a:1}, {a:3}, {a:5}
.MongoDB 会在继续查询之前对 $in
数组进行排序,因此 {$in: [3,5,1]}
与 {$ 没有区别在:[1,3,5]}
.
假设集合的索引为
{a:1, b:1}
按
排序a
db.coll.find({a: {$in: [1,3,5]}}).sort({a:1})
MongoDB 将能够使用
{a:1,b:1}
索引,因为这个查询可以被认为是{a:1}, {a:3}, {a:5}
查询.按{a:1}
排序允许使用 索引前缀,因此MongoDB不需要执行内存排序.同样的情况也适用于查询:
db.coll.find({a: {$in: [1,3,5]} ,b:{$gte:1, $lt:2}}).sort({a:1})
由于
sort({a:1})
还使用索引前缀(在本例中为a
),因此是内存中的SORT
因此不需要阶段.按
排序b
与按
a
排序相比,这是一个更有趣的案例.例如:db.coll.find({a: {$in: [1,3,5]}}).sort({b:1})
此查询的
explain()
输出将有一个名为SORT_MERGE
的阶段.请记住,查询的find()
部分可以被认为是{a:1}, {a:3}, {a:5}
.查询
db.coll.find({a:1}).sort({b:1})
不需要在内存中SORT
由于{a:1,b:1}
索引的性质,阶段:也就是说,MongoDB 可以简单地遍历(排序的)索引并返回按b
排序的文档在满足a
上的相等参数之后.例如,对于每个a
,有许多b
由于索引而已经按b
排序.使用
$in
,整个查询可以认为是:db.coll.find({a:1}).sort({b:1})
db.coll.find({a:3}).sort({b:1})
db.coll.find({a:5}).sort({b:1})
- 获取上面的单个查询结果,并使用
b
的值执行合并.查询不需要内存中的排序阶段,因为单个查询结果已经按b
排序.MongoDB 只需要将(已经排序的)子查询结果合并为一个结果即可.
同理,查询
db.coll.find({a: {$in: [1,3,5]} ,b:{$gte:1, $lt:2}}).sort({b:1})
还使用了一个
SORT_MERGE
阶段并且与上面的查询非常相似.不同的是,各个查询输出的文档基于a range ofb
(而不是everyb
)为每个a
(由于索引{a:1,b:1}
,将按b
排序).因此,查询不需要内存中的排序阶段.
$ 或
对于使用索引的 $or
查询,$or
表达式中的每个子句都必须有一个与之关联的索引.如果满足此要求,则查询可以使用 SORT_MERGE
阶段,就像 $in
查询一样.例如:
db.coll.explain().find({$or:[{a:1},{a:3},{a:5}]}).sort({b:1})
将具有与上面的 $in
示例几乎相同的查询计划、索引使用和 SORT_MERGE
阶段.本质上,查询可以被认为是:
db.coll.find({a:1}).sort({b:1})
db.coll.find({a:3}).sort({b:1})
db.coll.find({a:5}).sort({b:1})
- 获取上面的单个查询结果,并使用
b
的值执行合并.
就像之前的 $in
示例一样.
但是,这个查询:
db.coll.explain().find({$or:[{a:1},{b:1}]}).sort({b:1})
不能使用任何索引(因为我们没有 {b:1}
索引).此查询将导致集合扫描,因此将有一个内存中排序阶段,因为没有使用索引.
然而,如果我们创建索引 {b:1}
,查询将如下进行:
db.coll.find({a:1}).sort({b:1})
db.coll.find({b:1}).sort({b:1})
- 取上面的单个查询结果,并使用
b
的值执行合并(由于索引{a:1,b:1}
和{b:1}
).
和 MongoDB 将组合 {a:1}
和 {b:1}
查询的结果并对结果执行合并.合并过程是线性时间,例如O(n)
.
总而言之,在 $or
查询中,每个词都必须有一个索引,包括 sort()
阶段.否则,MongoDB 将不得不执行内存中排序.
This is a follow-up to this question - see that for context.
This question concerns a couple of special cases of the linked question - namely how sorting in MongoDB works when using $in
or $or
operators, and how to ensure use of an index for sorting vs. an in-memory sort.
$in:
For example, assume we have a collection where the document structure is
{a: XXX, b: XXX}
... and we have a compound index on a
and b
in that order and want to run the query
{a: {$in: [4, 6, 2, 1, 3, 10]}, b: {$gt: 1, $lt: 6}}
How would the sort proceed if it was on a
or b
? $in
is an equality operator of sorts, but it seems to me that a sort on b
with an index is impossible even so. A sort on a
using an index is only possible if the $in
value array is sorted first, I think - but I don't know if MongoDB does this.
$or:
Since $or
queries, IIUC, are processed as multiple queries and can presumably use their respective indexes for sorts, do the sorted results get merged somehow or does $or
force an in-memory sort of all the results? If the former, what is the time complexity of this process?
Note: This answer is based on MongoDB 3.2.4.
It is worthwhile to discover the use of explain()
in MongoDB. The explain()
output of a query (e.g. db.collection.explain().find(...)
) allows you to check which index is used in a query, and using db.collection.explain('executionStats')
will also show you whether the query succeeds or fails due to in-memory SORT
limitation.
$in
A $in
query can be thought of as a series of equality queries. For example, {a: {$in: [1,3,5]}}
could be thought of as {a:1}, {a:3}, {a:5}
. MongoDB will sort the $in
array before proceeding with the query, so that {$in: [3,5,1]}
is no different to {$in: [1,3,5]}
.
Let's assume the collection has an index of
{a:1, b:1}
Sorting by
a
db.coll.find({a: {$in: [1,3,5]}}).sort({a:1})
MongoDB will be able to use the
{a:1,b:1}
index, since this query can be thought of as a union of{a:1}, {a:3}, {a:5}
queries. Sorting by{a:1}
allows the use of index prefix, so MongoDB does not need to perform an in-memory sort.The same situation also applies to the query:
db.coll.find({a: {$in: [1,3,5]} ,b:{$gte:1, $lt:2}}).sort({a:1})
since
sort({a:1})
also uses the index prefix (a
in this case), an in-memorySORT
stage is therefore not required.Sorting by
b
This is a more interesting case compared to sorting by
a
. For example:db.coll.find({a: {$in: [1,3,5]}}).sort({b:1})
The
explain()
output of this query will have a stage calledSORT_MERGE
. Remember that thefind()
portion of the query can be thought of as{a:1}, {a:3}, {a:5}
.The query
db.coll.find({a:1}).sort({b:1})
does not need to have an in-memorySORT
stage due to the nature of the{a:1,b:1}
index: that is, MongoDB can simply walk the (sorted) index and return documents sorted byb
after satisfying the equality parameter ona
. E.g., for eacha
, there are manyb
which are already sorted byb
due to the index.Using
$in
, the overall query can be thought of as:db.coll.find({a:1}).sort({b:1})
db.coll.find({a:3}).sort({b:1})
db.coll.find({a:5}).sort({b:1})
- Take the individual query results above, and perform a merge using the value of
b
. The query does not need an in-memory sort stage because the individual query results are already sorted byb
. MongoDB just need to merge the (already sorted) sub-query results into a single result.
Similarly, the query
db.coll.find({a: {$in: [1,3,5]} ,b:{$gte:1, $lt:2}}).sort({b:1})
also uses a
SORT_MERGE
stage and is very similar to the query above. The difference is that the individual queries output documents based on a range ofb
(instead of everyb
) for eacha
(which will be sorted byb
due to the index{a:1,b:1}
). Hence, the query does not need an in-memory sort stage.
$or
For an $or
query to use an index, every clause in the $or
expression must have an index associated with it. If this requirement is satisfied, it is possible for the query to employ a SORT_MERGE
stage just like an $in
query. For example:
db.coll.explain().find({$or:[{a:1},{a:3},{a:5}]}).sort({b:1})
will have an almost identical query plan, index use, and SORT_MERGE
stage as in the $in
example above. Essentially, the query can be thought as:
db.coll.find({a:1}).sort({b:1})
db.coll.find({a:3}).sort({b:1})
db.coll.find({a:5}).sort({b:1})
- Take the individual query results above, and perform a merge using the value of
b
.
just like the $in
example before.
However, this query:
db.coll.explain().find({$or:[{a:1},{b:1}]}).sort({b:1})
cannot use any index (since we do not have the {b:1}
index). This query will result in a collection scan, and consequently will have an in-memory sort stage since no index is used.
If, however, we create the index {b:1}
, the query will proceed like:
db.coll.find({a:1}).sort({b:1})
db.coll.find({b:1}).sort({b:1})
- Take the individual query results above, and perform a merge using the value of
b
(which is already sorted at both sub-queries, due to the indexes{a:1,b:1}
and{b:1}
).
and MongoDB will combine the results of {a:1}
and {b:1}
queries and perform a merge on the results. The merging process is linear time, e.g. O(n)
.
In conclusion, in a $or
query, every term must have an index, including the sort()
stage. Otherwise, MongoDB will will have to perform an in-memory sort.
这篇关于MongoDB 中的“$or"和“$in"查询如何排序?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!