如何使用索引进行排序在MongoDB中有效? [英] How does sorting with an index work in MongoDB?

查看:103
本文介绍了如何使用索引进行排序在MongoDB中有效?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想知道如何使用索引进行排序实际上在MongoDB中有效。有一个情侣 文章,但它们实际上并没有描述排序进行或时间复杂。到目前为止,搜索SO和一般的互联网并没有发现任何相关的内容。

I'm wondering how sorting with an index actually works in MongoDB. There are a couple articles in the MongoDB documentation, but they don't actually describe how the sort proceeds or the time complexity. Searches of SO and the interweb in general so far haven't turned up anything relevant.

我们假设集合中有 a 个文档, find()子句匹配 b 文档,返回 c 文件的限制, a >> b > > c c 是一些适当大的数字,因此返回的集合无法容纳在内存中 - 例如,假设1M文档。

Let's assume there are a documents in a collection, the find() clause matches b documents, there's a limit of c documents returned, a >> b >> c, and c is some suitably large number such that the returned set cannot fit in memory - let's say 1M documents, for example.

在操作开始时,存在需要进行排序的 b 文档以及用于文档特征的大小 a 的排序树索引将按排序。

At the start of the operation, there exist b documents that needs to be sorted and a sorted tree index of size a for the feature the documents will be sorted by.

我可以想象:

A)按顺序遍历索引,并为每个ObjectID遍历列表 b 文件。返回匹配项,直到达到 c 。这将是O( ab )。

A) traverse the index in order, and for each ObjectID traverse the list of b documents. Return matches until c is reached. This would be O(ab).

B)作为A),但是在 b <中构建ObjectID的哈希集/ em>首先是文件。这是O( a ),但需要O( b )内存。

B) as A), but build a hashset of the ObjectIDs in the b documents first. This is O(a), but takes O(b) memory.

我试过考虑基于遍历 b 文档集进行排序,但似乎没有比O( b log b )更快的结果,这并不比没有索引的排序好。

I've tried to consider sorts based on traversing the set of b documents, but can't seem to come up with anything faster than O(b log b), which is no better than sorting without an index.

我认为(但也许我错了)每种排序都不需要索引扫描,那么如何排序实际上有效吗?

I assume (but maybe I'm wrong) that every sort doesn't require an index scan, so how does the sort actually work?

更新:

Kevin的答案和提供的链接缩小范围问题很多,但我想确认/澄清几点:

Kevin's answer and provided link narrow down the question a lot, but I'd like to confirm/clarify a few points:


  1. 据我了解,你不能使用不同的索引对于查询和排序,如果要避免内存中的排序。当我阅读此页时,看起来好像你可以(或者至少,它没有指定一种方式或另一种方式),但这似乎是不正确的。本质上,文档是排序的,因为它们在查询期间按索引的顺序查找,因此按索引的顺序返回。对吗?

  2. 查询复合索引时,排序索引必须是复合索引中的第一个索引,但查询相等的索引除外。如果不是,则在内存中执行排序。对吗?

  3. 如何使用 $ $或查询?例如,假设查询是

  1. As I understand it, you cannot use different indexes for the query and the sort if you want to avoid an in-memory sort. When I read this page it appeared as though you could (or at least, it didn't specify one way or the other), but that seems to be incorrect. Essentially, the documents are sorted because they're looked up in the order of the index during the query and therefore returned in the order of the index. Right?
  2. When querying on a compound index, the the sorting index must be the first index in the compound index, except for indexes where the query is an equality. If not, the sort is performed in memory. Right?
  3. How does sorting work with $in or $or queries? For example, assume the query is

{a:{$ in:[4,6,2,1,3,10]}, b:{$ gt:1,$ lt:6}}

.. 。并且在该订单中 a b 上有一个复合索引。如果排序在 a b 的情况下排序如何工作? $或甚至更复杂,因为据我所知, $或查询基本上分为多个单独的查询。 $或查询总是内存排序,至少是为了合并单独查询的结果吗?

... and there's a compound index on a and b in that order. How would the sort work in the cases the sort is on a or b? $or is even more complicated since, as I understand it, $or queries are essentially split into multiple separate queries. Are $or queries always an in-memory sort, at least for merging the results of the separate queries?

推荐答案

MongoDB中的索引存储在B树结构中,其中每个索引条目指向磁盘上的特定位置。使用B树结构也意味着MongoDB索引以排序顺序存储,总是按顺序遍历,并且MongoDB通过索引以排序顺序获取一系列文档很便宜。

Indexes in MongoDB are stored in a B-tree structure, where each index entry points to a specific location on-disk. Using a B-tree structure also means that a MongoDB index is stored in a sorted order, always traversed in-order, and is cheap for MongoDB to fetch a series of documents in a sorted order via indexes.

查询中的 SORT 阶段(即内存中排序)限制为32MB内存使用。如果 SORT 阶段超出此限制,则查询将失败。可以通过利用索引的排序特性来回避此限制,以便MongoDB可以返回带有 sort()参数的查询,而无需执行内存中排序。

A SORT stage (i.e. in-memory sort) in a query is limited to 32MB of memory use. A query will fail if the SORT stage exceeds this limit. This limit can be sidestepped by utilizing the sorted nature of indexes, so that MongoDB can return a query with a sort() parameter without performing an in-memory sort.

我们假设查询的形状为:

Let us assume that the query is of the shape:

    db.a.find({b:{$gt:100}, c:{$gt:200}}).sort(...)

收集 a 索引为:

    db.a.createIndex({b:1,c:1})

那里在查询中指定 sort()阶段时有两种可能的情况:

There are two possible scenarios when a sort() stage is specified in the query:

1。 MongoDB无法使用索引的排序特性,必须执行内存 SORT 阶段

1. MongoDB cannot use the sorted nature of the index and must perform an in-memory SORT stage.

如果查询不能使用索引前缀,则结果如此。例如:

This is the outcome if the query cannot use the "index prefix". For example:

    db.a.find({b:{$gt:100}, c:{$gt:200}}).sort({c:1})

在上面的查询中,索引 {b:1,c:1} 可用于:

In the query above, the index {b:1,c:1} can be used to:


  • 匹配文件查询的 {b:{$ gt:100}} 部分 b 大于100。

  • 但是,无法保证返回的文档按照 c 进行排序。

  • Match documents having b greater than 100 for the {b:{$gt:100}} portion of the query.
  • However, there is no guarantee that the returned documents are sorted in terms of c.

因此,MongoDB别无选择,只能执行内存中排序。此查询的 explain()输出将具有 SORT 阶段。此 SORT 阶段将限制为32MB内存使用。

Therefore, MongoDB has no choice but to perform an in-memory sort. The explain() output of this query will have a SORT stage. This SORT stage would be limited to 32MB of memory use.

2。 MongoDB可以使用索引的排序性质

如果查询使用了以下结果:

This is the outcome if the query uses:


  • 对与索引顺序匹配的键进行排序,

  • 指定与索引相同的顺序(即索引 { b:1,c:1} 可用于 sort({b:1,c:1}) sort({b:-1,c:-1})但不是 sort({b:1,c:-1})

  • Sort keys that matches the order of the index, and
  • Specifies the same ordering as the index (i.e. the index {b:1,c:1} can be used for sort({b:1,c:1}) or sort({b:-1,c:-1}) but not sort({b:1,c:-1}))

例如:

    db.a.find({b:{$gt:100}, c:{$gt:200}}).sort({b:1})

在上面的查询中,索引 {b:1,c:1} 可用于:

In the query above, the index {b:1,c:1} can be used to:


  • {匹配 b 大于100的文件b:{$ gt:100}} 查询的一部分。

  • 在这种情况下, MongoDB可以保证返回的文档按术语排序 b

  • Match documents having b greater than 100 for the {b:{$gt:100}} portion of the query.
  • In this case, MongoDB can guarantee that the returned documents are sorted in terms of b.

explain()上面查询的输出将不是拥有 SORT 阶段。此外,带有和不带 sort()的查询的 explain()输出是相同的。本质上,我们免费获得 sort()

The explain() output of the query above will not have a SORT stage. Also, the explain() output of the query with and without sort() are identical. In essence, we are getting the sort() for free.

理解这个主题的一个有价值的资源是< a href =https://emptysqua.re/blog/optimizing-mongodb-compound-indexes/\"rel =noreferrer>优化MongoDB复合索引。请注意,这篇博客文章是在2012年写的。尽管某些术语可能已经过时,但该帖子的技术性仍然相关。

A worthwhile resource to understand this subject is Optimizing MongoDB Compound Indexes. Please note that this blog post was written way back in 2012. Although some of the terminology may be outdated, the technicality of the post is still relevant.

更新关于后续问题


  1. MongoDB使用大多数查询只有一个索引。例如,要避免查询中的内存 SORT 阶段

  1. MongoDB uses only one index for most queries. So for example, to avoid an in-memory SORT stage in the query

db.a.find({a:1}).sort({b:1})

索引必须同时包含 a b 字段;例如需要复合索引,例如 {a:1,b:1} 。你不能有两个单独的索引 {a:1} {b:1} ,并期望 {a:1} 索引用于等于部分, {b:1} 索引用于排序部分。在这种情况下,MongoDB将选择两个索引中的一个。

the index must cover both a and b fields at the same time; e.g. a compound index such as {a:1,b:1} is required. You cannot have two separate indexes {a:1} and {b:1}, and expect the {a:1} index to be used for the equality part, and the {b:1} index to be used for the sort part. In this case, MongoDB will choose one of the two indexes.

因此,对结果进行排序是正确的,因为它们被查找并按照顺序返回指数。

Therefore, it is correct that the results are sorted because they are looked up and returned in the order of the index.

为了避免使用复合索引进行内存中排序,索引的第一部分必须满足相等部分查询,第二部分必须满足查询的排序部分(如上面(1)的解释所示)。

To avoid having an in-memory sort using a compound index, the first part of the index must cater to the equality part of the query, and the second part must cater to the sort part of the query (as shown in the explanation for (1) above).

如果你有这样的查询:

db.a.find({}).sort({a:1})

索引 {a:1,b:1} 可用于排序部分(因为您基本上返回整个集合)。如果您的查询如下所示:

the index {a:1,b:1} can be used for the sort part (since you're basically returning the whole collection). And if your query looks like this:

db.a.find({a:1}).sort({b:1})

相同的索引 {a:1,b:1} 也可以用于查询的两个部分。另外:

the same index {a:1,b:1} can also be used for both parts of the query. Also:

db.a.find({a:1,b:1})

也可以使用相同的索引 {a:1,b:1}

can also use the same index {a:1,b:1}

注意这里的模式: find()后跟 sort()参数遵循索引顺序 {a:1,b:1} 。因此,复合索引必须按相等 - >排序排序。

Notice the pattern here: the find() followed by sort() parameters follow the index order {a:1,b:1}. Therefore a compound index must be ordered by equality -> sort.

这篇关于如何使用索引进行排序在MongoDB中有效?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆