MongoDB Compound Indexes - Does the sort order matter?

Question

I recently dived into MongoDB for a project of mine. I've been reading up on indexes. For a small collection I know it wouldn't matter much, but as the collection grows there will be performance issues without the right indexes and queries.

Let's say I have a collection like this:

{user_id:1,slug:'one-slug'}
{user_id:1,slug:'another-slug'}
{user_id:2,slug:'one-slug'}
{user_id:3,slug:'just-a-slug'}

and I have to search this collection for:

user id == 1 and slug == 'one-slug'

In this collection, slugs will be unique per user id. That is, user id 1 can have only one slug with the value 'one-slug'.

I understand that user_id should be given priority due to its high cardinality, but what about slug? It is unique as well most of the time. I also can't wrap my head around ascending and descending indexes, how they would affect performance in this case, or which order I should be using in this collection.

I've read a bit but I can't wrap my head around it, particularly for my scenario. It would be awesome to hear from others.

Recommended answer

You can think of a MongoDB single-field index as an array, with pointers to document locations. For example, if you have a collection with (note that the sequence is deliberately out-of-order):

[collection]
1: {a:3, b:2}
2: {a:1, b:2}
3: {a:2, b:1}
4: {a:1, b:1}
5: {a:2, b:2}

Single-field index

Now if you do:

db.collection.createIndex({a:1})

The index will look approximately like this:

[index a:1]
1: {a:1} --> 2, 4
2: {a:2} --> 3, 5
3: {a:3} --> 1

Note three important things:

  • It's sorted by a ascending
  • Each entry points to the location where the relevant documents reside
  • The index only records the values of the a field. The b field does not exist in the index at all
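
To make the picture above concrete, here is a minimal sketch in plain JavaScript that rebuilds the [index a:1] listing from the sample collection. It is only a toy model: buildSingleFieldIndex is a made-up helper, and MongoDB's real index is a B-tree rather than a sorted array.

```javascript
// Toy model of a single-field index; not MongoDB's real B-tree structure.
// Document positions 1..5 match the [collection] listing above.
const collection = new Map([
  [1, { a: 3, b: 2 }],
  [2, { a: 1, b: 2 }],
  [3, { a: 2, b: 1 }],
  [4, { a: 1, b: 1 }],
  [5, { a: 2, b: 2 }],
]);

// buildSingleFieldIndex is a hypothetical helper, not a MongoDB API.
function buildSingleFieldIndex(docs, field) {
  const byValue = new Map();
  for (const [position, doc] of docs) {
    const key = doc[field];
    if (!byValue.has(key)) byValue.set(key, []);
    byValue.get(key).push(position);
  }
  // Sort the entries by key ascending, in the spirit of createIndex({a:1}).
  return [...byValue.entries()]
    .sort(([x], [y]) => x - y)
    .map(([key, positions]) => ({ key, positions }));
}

const indexA = buildSingleFieldIndex(collection, "a");
for (const { key, positions } of indexA) {
  // Only the indexed field appears here; b is never stored in the entries.
  console.log(`{a:${key}} --> ${positions.join(", ")}`);
}
```

Each entry carries only the a value and the document positions, which is why nothing about b can be answered from it.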

So if you do a query like:

db.collection.find().sort({a:1})

All it has to do is walk the index from top to bottom, fetching and outputting the documents pointed to by the entries. Notice that you can also walk the index from the bottom, e.g.:

db.collection.find().sort({a:-1})

and the only difference is that you walk the index in reverse.
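
The two walks can be sketched in plain JavaScript as follows. This is an illustration under simplifying assumptions: the index is a hand-copied array matching the [index a:1] listing, and walkIndex is a hypothetical helper, not something MongoDB exposes.

```javascript
// Toy model: documents keyed by position, plus the [index a:1] entries.
const docs = {
  1: { a: 3, b: 2 },
  2: { a: 1, b: 2 },
  3: { a: 2, b: 1 },
  4: { a: 1, b: 1 },
  5: { a: 2, b: 2 },
};
const indexA = [
  { key: 1, positions: [2, 4] },
  { key: 2, positions: [3, 5] },
  { key: 3, positions: [1] },
];

// walkIndex is a hypothetical helper: direction 1 walks top to bottom,
// direction -1 walks bottom to top. No sorting happens at query time.
function walkIndex(index, direction) {
  const entries = direction === 1 ? index : [...index].reverse();
  const out = [];
  for (const entry of entries) {
    const positions =
      direction === 1 ? entry.positions : [...entry.positions].reverse();
    for (const position of positions) out.push(docs[position]);
  }
  return out;
}

const ascending = walkIndex(indexA, 1).map((d) => d.a);   // like sort({a:1})
const descending = walkIndex(indexA, -1).map((d) => d.a); // like sort({a:-1})
console.log(ascending);  // [ 1, 1, 2, 2, 3 ]
console.log(descending); // [ 3, 2, 2, 1, 1 ]
```

The same index serves both sort directions, which is why ascending versus descending makes no practical difference for a single-field index.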

Because b is not in the index at all, you cannot use the index when querying anything about b.

In a compound index, e.g.:

db.collection.createIndex({a:1, b:1})

It means that you want to sort by a first, then by b. The index would look like:

[index a:1, b:1]
1: {a:1, b:1} --> 4
2: {a:1, b:2} --> 2
3: {a:2, b:1} --> 3
4: {a:2, b:2} --> 5
5: {a:3, b:2} --> 1

Note that:

  • The index is sorted by a
  • Within each a you have a sorted b
  • You have 5 index entries vs. only three in the previous single-field example
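
As a rough illustration of that layout, the following plain JavaScript sketch sorts the sample collection by a and then by b, which reproduces the [index a:1, b:1] listing. buildCompoundIndex is a made-up helper and the array is only a stand-in for the real B-tree.

```javascript
// Toy model of the compound index {a:1, b:1}; positions match [collection].
const collection = [
  [1, { a: 3, b: 2 }],
  [2, { a: 1, b: 2 }],
  [3, { a: 2, b: 1 }],
  [4, { a: 1, b: 1 }],
  [5, { a: 2, b: 2 }],
];

// buildCompoundIndex is a hypothetical helper, not a MongoDB API.
function buildCompoundIndex(docs) {
  return docs
    .map(([position, doc]) => ({ a: doc.a, b: doc.b, position }))
    // Compare on a first; only when a ties does b decide the order.
    .sort((x, y) => x.a - y.a || x.b - y.b);
}

const compoundIndex = buildCompoundIndex(collection);
for (const { a, b, position } of compoundIndex) {
  console.log(`{a:${a}, b:${b}} --> ${position}`);
}
```

Every distinct (a, b) pair gets its own entry, so this sample yields five entries instead of three.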

Using this index, you can do a query like:

db.collection.find({a:2}).sort({b:1})

It can easily find where a:2 starts and then walk the index forward. Given that index, you cannot efficiently do:

db.collection.find().sort({b:1})
db.collection.find({b:1})

In both queries, b cannot be found easily because its values are spread all over the index (i.e. not in contiguous entries). However, you can do:

db.collection.find({a:2}).sort({b:-1})

since you can essentially find where the a:2 entries are and walk the b entries backward.
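
Here is a small plain JavaScript sketch of why the a:2 queries work while the b-only queries do not. The index array is copied by hand from the [index a:1, b:1] listing, and blockForA is a hypothetical helper that uses a linear scan where a real B-tree would seek directly to the block.

```javascript
// Toy model of the compound index {a:1, b:1} from the listing above.
const compoundIndex = [
  { a: 1, b: 1, position: 4 },
  { a: 1, b: 2, position: 2 },
  { a: 2, b: 1, position: 3 },
  { a: 2, b: 2, position: 5 },
  { a: 3, b: 2, position: 1 },
];

// blockForA is a hypothetical helper: it returns the contiguous run of
// entries sharing the given a value.
function blockForA(index, a) {
  const start = index.findIndex((e) => e.a === a);
  if (start === -1) return [];
  let end = start;
  while (end < index.length && index[end].a === a) end++;
  return index.slice(start, end);
}

// find({a:2}).sort({b:1}): walk the a=2 block forward.
const forward = blockForA(compoundIndex, 2).map((e) => e.position);
// find({a:2}).sort({b:-1}): walk the same block backward.
const backward = [...blockForA(compoundIndex, 2)].reverse().map((e) => e.position);

// Index slots holding b=1: they are not adjacent, so a query on b alone
// has no single contiguous block to walk.
const slotsWithB1 = compoundIndex.flatMap((e, i) => (e.b === 1 ? [i] : []));

console.log(forward);     // [ 3, 5 ]
console.log(backward);    // [ 5, 3 ]
console.log(slotsWithB1); // [ 0, 2 ]
```

Both sort directions on b come out of the same block, while the b=1 entries sit in separate parts of the index.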

Edit: clarification of @marcospgp's question in the comment:

The possibility of using the index {a:1, b:1} to satisfy find({a:2}).sort({b:-1}) actually makes sense if you see it from a sorted table point of view. For example, the index {a:1, b:1} can be thought of as:

a | b
--|--
1 | 1
1 | 2
2 | 1
2 | 2
2 | 3
3 | 1
3 | 2

find({a:2}).sort({b:1})

The index {a:1, b:1} means: sort by a, then within each a, sort the b values. If you then do a find({a:2}).sort({b:1}), the index knows where all the a=2 entries are. Within this block of a=2, the b values are sorted in ascending order (according to the index spec), so the query find({a:2}).sort({b:1}) can be satisfied by:

a | b
--|--
1 | 1
1 | 2
2 | 1 <-- walk this block forward to satisfy
2 | 2 <-- find({a:2}).sort({b:1})
2 | 3 <--
3 | 1
3 | 2

find({a:2}).sort({b:-1})

Since the index can be walked forward or backward, a similar procedure is followed, with a small twist at the end:

a | b
--|--
1 | 1
1 | 2
2 | 1  <-- walk this block backward to satisfy
2 | 2  <-- find({a:2}).sort({b:-1})
2 | 3  <--
3 | 1
3 | 2

The fact that the index can be walked forward or backward is the key point that enables the query find({a:2}).sort({b:-1}) to use the index {a:1, b:1}.

You can see what the query planner plans by using db.collection.explain().find(....). Basically, if you see a stage of COLLSCAN, no index was used or could be used for the query. See the explain results documentation for details on the command's output.
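
If you want to check for COLLSCAN programmatically, a sketch along these lines may help. It assumes the plan has the nested shape with stage, inputStage and inputStages fields. hasCollScan is a made-up helper, and the two sample plans are hand-written, heavily abbreviated stand-ins for real explain output.

```javascript
// hasCollScan is a hypothetical helper, not part of MongoDB. It walks a
// plan tree and reports whether any stage is a COLLSCAN.
// Assumption: stages nest via inputStage (one child) or inputStages (many).
function hasCollScan(stage) {
  if (!stage) return false;
  if (stage.stage === "COLLSCAN") return true;
  const children = [stage.inputStage, ...(stage.inputStages || [])];
  return children.some((child) => hasCollScan(child));
}

// Hand-written, abbreviated sample plans for illustration only.
const indexedPlan = {
  stage: "FETCH",
  inputStage: { stage: "IXSCAN", indexName: "a_1_b_1" },
};
const unindexedPlan = {
  stage: "SORT",
  inputStage: { stage: "COLLSCAN" },
};

console.log(hasCollScan(indexedPlan));   // false
console.log(hasCollScan(unindexedPlan)); // true
```

In the shell, the argument would presumably be something like db.collection.find({a:2}).sort({b:-1}).explain().queryPlanner.winningPlan, but the exact layout of the output varies between server versions, so check it against the explain results documentation first.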
