Is it possible to sort, group and limit efficiently in Mongo with a pipeline?


Question

Given a users collection with an index on age:

{ name: 'Bob',
  age:   21  }

{ name: 'Cathy',
  age:   21  }

{ name: 'Joe',
  age:   33  }

To get the output:

[ 
  { _id: 21,
    names: ['Bob', 'Cathy'] },
  { _id: 33,
    names: ['Joe'] }
]

Is it possible to sort, group and limit by age?

db.users.aggregate(
   [
      {
        $sort: {
           age: 1
        }
      },
      {
        $group: {
           _id: '$age',
           names: { $push: '$name' }
        }
      },
      {
        $limit: 10
      }
   ]
)

I did some research, but it's not clear whether it is possible to sort first and then group. In my testing, the group loses the sort order, but I don't see why.

If the group preserved the sort order, then the sort and limit could greatly reduce the required processing. It would only need to do enough work to "fill" the limit of 10 groups.

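So,

  1. Does group preserve the sort order? Or do you have to group first and then sort?
  2. Is it possible to sort, group and limit while doing only enough processing to fill the limit? Or does the whole collection need to be processed before limiting?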

Recommended answer

To answer your first question: $group does not preserve the order. There are open change requests that also give some background, but it doesn't look like the product will be changed to preserve the order of the input documents.

Two things can be said in general. First, you generally want to group first and then sort, because sorting fewer elements (which grouping generally produces) is going to be faster than sorting all input documents.

Second, MongoDB is going to make sure to sort as efficiently and as little as possible. The documentation states:

When a $sort immediately precedes a $limit in the pipeline, the $sort operation only maintains the top n results as it progresses, where n is the specified limit, and MongoDB only needs to store n items in memory. This optimization still applies when allowDiskUse is true and the n items exceed the aggregation memory limit.
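The idea behind that quote can be sketched in plain JavaScript. This is only an illustration of a bounded "top n" sort, not MongoDB's actual implementation; the function name `topN` and the sample ages are made up for the example.

```javascript
// Illustrative only: keeps the n smallest items seen so far, in sorted order.
// This is NOT MongoDB's code, just the general idea of a bounded sort.
function topN(items, n, compare) {
  const kept = [];
  for (const item of items) {
    // Walk back from the end to find where `item` belongs in `kept`.
    let i = kept.length;
    while (i > 0 && compare(item, kept[i - 1]) < 0) {
      i--;
    }
    // Items that would land beyond position n - 1 are discarded right away.
    if (i < n) {
      kept.splice(i, 0, item);
      if (kept.length > n) {
        kept.pop(); // drop the item that no longer fits in the top n
      }
    }
  }
  return kept;
}

// Hypothetical sample data.
const ages = [33, 21, 45, 18, 60, 27];
const youngestThree = topN(ages, 3, (a, b) => a - b);
console.log(youngestThree); // [ 18, 21, 27 ]
```

At no point does `kept` hold more than n + 1 items, which is why a $sort followed directly by a $limit needs far less memory than a full sort.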

So this code gets the job done in your case:

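db.users.aggregate([
    // Group first: one document per distinct age
    {
        $group: {
            _id: '$age',
            names: { $push: '$name' }
        }
    },
    // Then sort the (fewer) grouped documents by age
    {
        $sort: { _id: 1 }
    },
    // $sort directly before $limit only keeps the top 10
    {
        $limit: 10
    }
])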
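To make the stage order concrete without a running server, here is a rough in-memory imitation of group, then sort, then limit over the three sample documents from the question. The helper `groupSortLimit` is hypothetical and only mimics the shape of the result; it says nothing about how MongoDB executes the stages.

```javascript
// Sample documents from the question.
const users = [
  { name: 'Bob', age: 21 },
  { name: 'Cathy', age: 21 },
  { name: 'Joe', age: 33 },
];

// Hypothetical helper: imitates $group -> $sort -> $limit in memory.
function groupSortLimit(docs, limit) {
  // Like $group with { _id: '$age', names: { $push: '$name' } }
  const groups = new Map();
  for (const { name, age } of docs) {
    if (!groups.has(age)) {
      groups.set(age, []);
    }
    groups.get(age).push(name);
  }
  // Like $sort: { _id: 1 } followed by $limit
  return [...groups.entries()]
    .map(([age, names]) => ({ _id: age, names }))
    .sort((a, b) => a._id - b._id)
    .slice(0, limit);
}

const result = groupSortLimit(users, 10);
console.log(JSON.stringify(result));
// [{"_id":21,"names":["Bob","Cathy"]},{"_id":33,"names":["Joe"]}]
```

Note that the sort here runs over two groups rather than three documents; with a large collection and few distinct ages, that difference is the reason for grouping before sorting.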

Edit following your comments:

I agree with what you say. Taking your logic a little further, I would go as far as saying: if $group were smart enough to use an index, it shouldn't even require a $sort stage at the start. Unfortunately, it isn't (probably not yet). As things stand today, $group will never use an index, and it won't take shortcuts based on the following stages ($limit in this case). Also see this link, where someone ran some basic tests.

The aggregation framework is still pretty young, so I guess there is a lot of work being done to make the aggregation pipeline smarter and faster.

There are answers here on StackOverflow (e.g. here) where people suggest using an upfront $sort stage in order to "force" MongoDB to use an index somehow. However, this slowed down my tests significantly (1 million records of your sample shape, using different random distributions).
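If you want to check this on your own data, one option is to compare the explain output of both variants. The sketch below is not the test setup used for the numbers above; it assumes the mongo shell (where `db` and `printjson` are available) and a collection named `users` as in the question.

```javascript
// Variant A: group first, then sort and limit.
const groupThenSort = [
  { $group: { _id: '$age', names: { $push: '$name' } } },
  { $sort: { _id: 1 } },
  { $limit: 10 },
];

// Variant B: the same pipeline with an upfront $sort on the indexed field.
const sortFirst = [{ $sort: { age: 1 } }, ...groupThenSort];

// Only runs inside the MongoDB shell, where `db` is defined.
if (typeof db !== 'undefined') {
  for (const pipeline of [groupThenSort, sortFirst]) {
    printjson(db.users.explain('executionStats').aggregate(pipeline));
  }
}
```

The explain documents show, among other things, whether an index scan or a collection scan feeds the pipeline and how long execution took, so you can judge the effect of the extra stage for your own data distribution and server version.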

When it comes to the performance of an aggregation pipeline, $match stages at the start are what really helps the most. If you can limit the total number of records that need to go through the pipeline from the beginning, then that's your best bet - obviously... ;)
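As a sketch of that advice, the pipeline below puts a $match in front of the grouping. The age range 18 to 30 is a made-up filter for illustration; whether such a filter makes sense depends entirely on your use case.

```javascript
// Hypothetical filter: only users aged 18 to 30 enter the pipeline.
// A $match at the very start can use the index on age.
const matchFirstPipeline = [
  { $match: { age: { $gte: 18, $lte: 30 } } },
  { $group: { _id: '$age', names: { $push: '$name' } } },
  { $sort: { _id: 1 } },
  { $limit: 10 },
];

// Only runs inside the MongoDB shell, where `db` is defined.
if (typeof db !== 'undefined') {
  printjson(db.users.aggregate(matchFirstPipeline).toArray());
}
```

With the sample data from the question, only Bob and Cathy would pass the filter, so $group would see two documents instead of three.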
