MongoDB 2.6索引设置,使用$ or,$ in进行查询,并带有限制和排序 [英] MongoDB 2.6 Index set up, query using $or, $in, with limit and sort

查看:112
本文介绍了MongoDB 2.6索引设置,使用$ or,$ in进行查询,并带有限制和排序的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个稍微复杂的查询,这对我的应用程序非常重要.

I have a somewhat complex query that is very critical to my application.

$cur = $col->find(
    array (
        '$or' => array(
            array('owner' => $my_id),
            array('owner' => array('$in' => $friends), 'perm.type' => array('$in' => array('P', 'F'))),
            array('owner' => array('$in' => $friends), 'perm.list' => $my_id)
        )
    )
)->limit(10)->skip(0)->sort(array('ca' => -1));

目的是找到前10个帖子,并按其创建时间以降序排列:

The intention is to find the first 10 posts, sorted by their create time in desc order, which are:

a).由我自己制作,或 b).由我的朋友制作,公开类型为"P",朋友为"F",或者 C).由我的朋友制作的,权限列表已专门指定我为查看者.

a). made by myself, or b). made by my friends with permission types of 'P' for public, or'F' for friends, or c). made by my friends which the permission list has specifically designated me as a viewer.

变量$ friends是与我成为朋友的用户ID的数组. perm.type共有4个值,分别是'P','F','S','C'. perm.list是一组有权查看此帖子的用户ID.

The variable $friends is an array of user ids who are friends with me. perm.type has a total of 4 values, which are 'P', 'F', 'S', 'C'. perm.list is an array of user ids who has permission to view this post.

以上查询可按预期方式过滤出正确的结果.但是我遇到了在它们上创建有效索引的问题.

The above query works as intended in filtering out the correct results. But I ran into problem creating effective indexes on them.

我为此查询创建的索引是:

The indexes I have created for this query are:

$col->ensureIndex(array('owner' => 1, 'ca' => -1));
$col->ensureIndex(array('owner' => 1, 'perm.type' => 1, 'ca' => -1));
$col->ensureIndex(array('owner' => 1, 'perm.list' => 1, 'ca' => -1));

第一个索引是为查询条件的第一部分设计的,第二个索引是为第二个条件设计的,第三个索引是为第三个条件设计的,是多键索引.

The first index is designed for the first part of the query criteria, the 2nd index is designed for the 2nd criteria, and the 3rd is design for the 3rd criteria, and is a multikey index.

一个典型的帖子看起来像这样:

A typical post would look like this:

{
    "_id": "...",
    "owner": "001",
    "perm": {
        "type": "P",
        "list": []
    },
    "msg": "Nice dress!",
    "ca": 1390459269
}

另一个例子:

{
    "_id": "...",
    "owner": "007",
    "perm": {
        "type": "C",
        "list": ["001", "005"]
    },
    "msg": "Nice day!",
    "ca": 1390837209
}

我知道MongoDB 2.6版之前存在的局限性,这会阻止在将$ or与sort()结合使用时使用索引.根据 http://jira.mongodb.org/browse/SERVER-1205

I know of the limitation that existed prior to MongoDB version 2.6, which prevents the indexes being used when combining $or with sort(). The issue according to this http://jira.mongodb.org/browse/SERVER-1205 should have been fixed in 2.6.

可以肯定的是,explain()现在可以显示我的索引的使用,而在2.4版本中则没有.但是,当我运行查询时,它现在比不使用任何索引时要慢得多. explain()表明nscanned比预期的要高.经过一番搜索,我发现此问题 https://jira.mongodb.org/browse/SERVER-3310 似乎可以解释我遇到的问题.但是如票证所示,这个问题应该已经在2.5.5中解决,那么是什么导致我的问题呢?

And sure enough, explain() now shows the use of my indexes, where it didn't before in 2.4. But when I ran the query, it is now much slower than when it didn't use any indexes. explain() showed the nscanned is way higher than expected. After some searching, I found this issue https://jira.mongodb.org/browse/SERVER-3310 which seems to explain the problem I am experiencing. But as the ticket stated, this issue should have been fixed in 2.5.5, so what is causing my problem here?

我尝试设置不同的索引,以不同的顺序将它们混合,甚至将它们分开,然后检查新的索引交集功能是否会有所帮助.但是没有一个.

I have tried to set up different indexes, compounding them in different orders, even separating them up, checking to see if the new index intersection feature would help. But none worked.

有人知道我的问题在哪里吗?

Does anyone know what my problem here is?

修改 经过更多的测试,观察和思考,我缩小了范围,实际上是在一个导致问题的查询中同时使用了$ in,limit()和sort().对于每个"$ or"子句,添加顶级"$ or"只会使此问题加倍.我将在下面解释我的逻辑:

Edit After more testing, observing, and thinking, I have narrowed downed the issue, and it is really using $in, limit() and sort() all together in one query that's causing the problem. Adding a top level '$or' just doubles this problem for each '$or' clause. I will explain my logic below:

我将索引细化为以下内容:

I have refined the my indexes to the following:

$col->ensureIndex(array('owner._id' => 1, 'ca' => -1, 'perm.type' => 1));
$col->ensureIndex(array('perm.list' => 1, 'ca' => -1, 'owner._id' => 1))

第一个索引的原因是当我有数百万条记录时,查询应首先从给定的用户ID(朋友)集中查找,以缩小选择范围.然后,它按照记录的倒序顺序遍历它,以检查每个记录是否具有正确的权限类型.该索引的问题在于查询优化器不知道要满足limit(10)条件需要扫描多少条记录.它不知道最近的10条记录最终将来自何处,因此它必须从'$ in'子句中指定的每个ID返回最多10条记录,然后对每个'$ or'重复相同的操作.因此,如果我有两个"$ or"子句,每个子句都有一个"$ in",该子句由100个用户ID组成,则它将必须扫描足够的记录以匹配来自"$ in"子句中每个用户的10条记录.第一个"$ or",然后是第二个"$ or"的"$ in"中每个用户的10条记录,返回2000条记录(这是解释中返回的n,nscanned会更高取决于需要扫描多少条记录才能找到2000条匹配项,并且从这2000条记录中,所有记录均已按时间顺序排序,因此返回前10位.

The reasoning behind the first index is when I have millions of records, the query should start looking from the given set of user ids (friends) first to narrow down the choices. Then it goes through it in reverse chronological order of the records to check if each has the right permission type. The problem with this index is that the query optimizer has no idea of how many records it needs to scan in order to satisfy the limit(10) condition. It has no idea where the 10 most recent records will eventually come from, so it has to return up to 10 records from each id specified in the '$in' clause, then repeat the same thing for each '$or'. So if I have two '$or' clauses, each with an '$in' that consist of 100 user ids, then it will have to scan enough records to match 10 records from each of the users in the '$in' of the first '$or', then and 10 records from each of the users in the '$in' of the 2nd '$or', giving a return of 2000 records (this is the n returned in explain, and nscanned will be much higher depending on how many records it needs to scan to find the 2000 matches), and from this 2000 records, all chronologically ordered already, it takes the top 10 to return.

那么,如果我按以下顺序构建索引:'ca'=> -1,'owner._id'=> 1,'perm.type'=> 1",该怎么办?好吧,我真的不能做到这一点,因为当我有成千上万的用户,拥有数百万条记录时,大多数记录与查看者无关.因此,如果我先从'ca'=> -1开始,它会先扫描许多不相关的记录,然后再命中一个符合条件的记录,即使它发现的每个命中都将直接计入limit(10),并且它将只需要扫描尽可能多的记录即可匹配10个符合条件的记录.但是这种扫描可以是成千上万条记录,甚至更多.最糟糕的是,如果找不到10条记录,则必须遍历整个索引才能找到答案.

So, what if I build the index in the following order: "'ca' => -1, 'owner._id' => 1, 'perm.type' => 1"? Well, I can't really do that, because when I have hundreds of thousands of users, with millions of records, most records will be irrelevant to the viewer. So if I start from 'ca' => -1 first, it will scan a lot of irrelevant records before hitting one that fits the criteria, even though each hit that it founds will count directly against the limit(10), and it will only need to scan as many records as it needs to match 10 records that meet the criteria. But this scan can be tens of thousands of records, or even more. Worst yet, if 10 records can't be found, it will have to go through the entire index to find this out.

第二个索引是查看为我指定的每条记录,以相反的时间顺序浏览它,并检查它们是否来自我的朋友.这是非常简单的,这里的问题实际上是通过结合使用,以及从上面一起使用'$ in',limit()和sort()来实现的.

The 2nd index is to look at each record that is designated for me, go through it in reverse chronological order, and check and see if these come from my friends. This is pretty straight forward, and the problem here really is from the combination of using this, with '$in', limit() and sort() from above, all together in one query.

这时,我正在从应用程序端合并结果中寻找解决方案,但是在应用程序端分解"$ or"很容易,但是如何在应用程序端分解"$ in"条件array('owner'=> array('$ in'=> $ friends),'perm.type'=> array('$ in'=> array('P','F')))) >

At this point, I am looking into solutions from merging results on the application side, but breaking up the '$or' to do on the application side is easy, but how do I break up the '$in' in the criteria array('owner' => array('$in' => $friends), 'perm.type' => array('$in' => array('P', 'F')))?

推荐答案

经过3天的测试和研究,现在可以确定导致查询效率低下的原因.当前版本(2.6.1)的MongoDB仍然无法优化一次使用$ or,$ in,limit()和sort()的查询. https://jira.mongodb.org/browse/SERVER-1205 https://jira.mongodb.org/browse/SERVER-3310 修复程序,每个修复程序仅提高了性能对上面列出的4个操作中有3个的查询.在查询中引入第4个操作时,优化超出了预期.即使指定了limit(10),也会在$ or子句中使用完整索引和文档扫描来观察到此行为.

After 3 days of testing and research, the reason that is causing the inefficient queries is now clear. MongoDB at current version (2.6.1) is still unable to optimize queries that uses $or, $in, limit() and sort() all at once. The https://jira.mongodb.org/browse/SERVER-1205 and https://jira.mongodb.org/browse/SERVER-3310 fixes, each only improved performance on queries having 3 out of the 4 operations listed above. When introducing a 4th operation into the query, the optimization goes out the window. This behavior is observed with full index and document scans within the $or clause, even though limit(10) is specified.

通过单独分解$ or子句并在应用程序端合并结果来解决此问题的尝试,虽然可行,但在尝试实现分页时遇到了主要障碍.

The attempt to solve this problem by breaking up the $or clauses individually and merge results on the application side, while feasible, ran into major obstacles when I attempted to implement pagination.

因此,我当前的解决方案是提出与原始查询等效的查询,而仅使用4个操作中的3个.我决定展平""$ in"运算符,将$ friends数组中的每个元素转换为另一个"$ or"条件,并查询其确切所有者值.因此,现在在我的原始查询中不再有3个"$ or"条件,而是拥有与$ friends数组中的元素一样多的"$ or"条件,以及另外两个原始的"$ or"条件.

My current solution thus, is to come up with an equivalent query to the original query, while only using 3 out of the 4 operations. I decided to 'flatten' the '$in' operator, turn each element within the $friends array into another '$or' condition with an exact owner value to be queried for. So instead of having 3 '$or' conditions in my original query, I now have as many '$or' conditions as I have elements in my $friends array, plus the 2 other original '$or' conditions.

查询现在已优化.当与explain()一起运行时,nscannedObjects和nscanned现在降到了它们假定的值.考虑有关"$ or"说明的文档

The query is now optimized. When ran with explain(), the nscannedObjects, and nscanned is now way down, to values they are suppose to be. Considering the documentation on '$or' stating

在$ or查询中使用索引时,$ or的每个子句将 并行执行.这些子句可以各自使用自己的索引.

When using indexes with $or queries, each clause of an $or will execute in parallel. These clauses can each use their own index.

这实际上可能是可接受的解决方案性能. 我希望这会对遇到我同样问题的人有所帮助.

This may actually be an acceptable solution performance-wise. I hope this will helps anyone who ran into the same problems I did.

这篇关于MongoDB 2.6索引设置,使用$ or,$ in进行查询,并带有限制和排序的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆