Mongo Triple Compound Index


Question

If you have a double compound index { a : 1, b : 1}, it makes sense to me that the index won't be used if you query on b alone (i.e. you cannot "skip" a in your query). The index will however be used if you query on a alone.

However, given a triple compound index { a : 1, b: 1, c: 1} my explain command is showing that the index is used when you query on a and c (i.e. you can "skip" b in your query).
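The prefix intuition in the two paragraphs above can be written down as a small sketch. It is a deliberately simplified model, not MongoDB's query planner, and `leadingPrefix` is a hypothetical helper name: it returns the leading index keys that a query constrains, which is the part of the index that can be used to narrow the scan directly.

```javascript
// Simplified model (an assumption, not MongoDB's planner):
// collect index keys from the left until one is missing from the query.
function leadingPrefix(indexKeys, queryFields) {
  const prefix = [];
  for (const key of indexKeys) {
    if (!queryFields.includes(key)) break; // stop at the first gap
    prefix.push(key);
  }
  return prefix;
}

console.log(leadingPrefix(["a", "b"], ["b"]));           // [] : no usable prefix
console.log(leadingPrefix(["a", "b"], ["a"]));           // ["a"]
console.log(leadingPrefix(["a", "b", "c"], ["a", "c"])); // ["a"] : c lies behind the gap at b
```

In this model the a,c query still has the prefix a, so the index is usable; what the model does not say is how c, which sits behind the skipped b, gets exploited.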

How can Mongo use an abc index on a query for ac, and how effective is the index in this case?

Background:

My use case is that sometimes I want to query on a,b,c and sometimes I want to query on a,c. Now should I create only 1 index on a,b,c or should I create one on a,c and one on a,b,c?

(It doesn't make sense to create an index on a,c,b because c is a multi-key index with good selectivity.)

Solution

Bottom line / tl;dr: Field b of the index can be 'skipped' if a and c are queried for equality or inequality, but not, for instance, when sorting on c.

This is a very good question. Unfortunately, I couldn't find anything that authoritatively answers this in greater detail. I believe the performance of such queries has improved over the last few years, so I wouldn't trust old material on the topic.

The whole thing is quite complicated because it depends on the selectivity of your indexes and on whether you query for equality, inequality and/or sort. So explain() is your only friend, but here are some things I found:

Caveat: What comes now is a mixture of experimental results, reasoning and guessing. I might be stretching Kyle's analogy too far, and I might even be completely wrong (and unlucky, because my test results loosely match my reasoning).

It is clear that the index of A can be used, which, depending on the selectivity of A, is certainly very helpful. 'Skipping' B can be tricky, or not. Let's keep this similar to Kyle's cookbook example:

French
    Beef
        ...
    Chicken
        Coq au Vin
        Roasted Chicken
    Lamb
        ...
    ...

If you now ask me to find some French dish called "Chateaubriand", I can use index A and, because I don't know the ingredient, will have to scan all dishes in A. On the other hand, I do know that the list of dishes in each category is sorted through the index C, so I will only have to look for the strings starting with, say, "Cha" in each ingredient-list. If there are 50 ingredients, I will need 50 lookups instead of just one, but that is a lot better than having to scan every French dish!

In my experiments, the number of lookups was a lot smaller than the number of distinct values in b: it never seemed to exceed 2. However, I tested this only with a single collection, and it probably has to do with the selectivity of the b-index.
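The per-ingredient lookup from the cookbook analogy can be sketched as follows. This is a toy model under stated assumptions, not how MongoDB's index scan is implemented: the index is a sorted array of [cuisine, ingredient, dish] tuples, and `compareTuples`, `seek` and `findSkippingB` are hypothetical helpers. Dishes that do not appear in the text above are made-up sample data.

```javascript
// Compare two tuples on their shared leading fields.
function compareTuples(x, y) {
  const n = Math.min(x.length, y.length);
  for (let i = 0; i < n; i++) {
    if (x[i] < y[i]) return -1;
    if (x[i] > y[i]) return 1;
  }
  return 0;
}

// Binary search: first position whose entry is >= key (or > key if strict).
function seek(entries, key, strict) {
  let lo = 0, hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    const cmp = compareTuples(entries[mid], key);
    if (cmp < 0 || (strict && cmp === 0)) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Equality on a and c while "skipping" b: one seek per distinct b under a.
function findSkippingB(entries, a, c) {
  const matches = [];
  let seeks = 0;
  let pos = seek(entries, [a], false);
  while (pos < entries.length && entries[pos][0] === a) {
    const b = entries[pos][1];
    pos = seek(entries, [a, b, c], false); // jump straight to c inside this b group
    seeks++;
    while (pos < entries.length && compareTuples(entries[pos], [a, b, c]) === 0) {
      matches.push(entries[pos++]);
    }
    pos = seek(entries, [a, b], true); // skip the rest of this b group
  }
  return { matches, seeks };
}

const recipeIndex = [
  ["French", "Beef", "Boeuf Bourguignon"],
  ["French", "Beef", "Chateaubriand"],
  ["French", "Chicken", "Coq au Vin"],
  ["French", "Chicken", "Roasted Chicken"],
  ["French", "Lamb", "Navarin"],
  ["Italian", "Beef", "Carpaccio"],
].sort(compareTuples);

const result = findSkippingB(recipeIndex, "French", "Chateaubriand");
console.log(result.seeks, result.matches); // 3 seeks (one per French ingredient), 1 match
```

The cost grows with the number of distinct b values under the chosen a rather than with the number of dishes, which is the point of the analogy.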

If you asked me to give you an alphabetically sorted list of all French dishes, though, I'd be in trouble. Now the index on C is worthless: I'd have to merge-sort all those per-ingredient index lists, and I would have to scan every element to do so.
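A small sketch of why the sort is the hard case, using the same toy model of the index as a sorted list of [cuisine, ingredient, dish] tuples. `dishesInOrder` is a hypothetical helper, and dishes not named in the text above are made-up sample data.

```javascript
// Toy index on (cuisine, ingredient, dish), listed in index order.
const cookbook = [
  ["French", "Beef", "Boeuf Bourguignon"],
  ["French", "Beef", "Chateaubriand"],
  ["French", "Chicken", "Coq au Vin"],
  ["French", "Chicken", "Roasted Chicken"],
  ["French", "Lamb", "Navarin"],
];

// True if the dish names (third field) appear in ascending order.
function dishesInOrder(entries) {
  return entries.every((e, i) => i === 0 || entries[i - 1][2] <= e[2]);
}

const allFrench = cookbook.filter((e) => e[0] === "French");
const frenchChicken = allFrench.filter((e) => e[1] === "Chicken");

console.log(dishesInOrder(allFrench));     // false: "Roasted Chicken" precedes "Navarin"
console.log(dishesInOrder(frenchChicken)); // true: within one ingredient, dishes are sorted
```

Walking the index for one cuisine yields dishes sorted only within each ingredient, so a sort on the dish name needs an extra merge or sort step unless the ingredient is fixed as well.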

This reasoning is reflected in my tests. Here are some simplified results. The original collection has datetimes, ints and strings, but I wanted to keep things simple, so it's now all ints.

Essentially, there are only two classes of queries: those where nscanned <= 2 * limit, and those that have to scan the entire collection (120k documents). The index is {a, b, c}:

// fast (range query on c while skipping b)
> db.Test.find({"a" : 43, "c" : { $lte : 45454 }});
// slow (sorting)
> db.Test.find({"a" : 43, "c" : { $lte : 45454 }}).sort({ "c" : -1});
> db.Test.find({"a" : 43, "c" : { $lte : 45454 }}).sort({ "b" : -1}); 

// fast (can sort on c if b included in the query)
> db.Test.find({"a" : 43, "b" : 7887, "c" : { $lte : 45454 }}).sort({ "c" : -1});

// fast (older tutorials claim this is slow)
> db.Test.find({"a" : {$gte : 43}, "c" : { $lte : 45454 }});

Your mileage will vary.
