MongoDB Count() vs. Aggregation

Question

I've used aggregation in MongoDB a lot, and I know its performance benefits for grouped counts and the like. But is there any performance difference between these two ways of counting all documents in a collection?

collection.aggregate([
  {
    $match: {}
  },{
    $group: {
      _id: null, 
      count: {$sum: 1}
    }
}]);

collection.find({}).count()

Update: Second case: Let's say we have this sample data:

{_id: 1, type: 'one', value: true}
{_id: 2, type: 'two', value: false}
{_id: 4, type: 'five', value: false}

Using aggregate():

var _ids = ['id1', 'id2', 'id3'];
var counted = Collections.mail.aggregate([
  {
    '$match': {
      _id: {
        '$in': _ids
      },
      value: false
    }
  }, {
    '$group': {
      _id: "$type",
      count: {
        '$sum': 1
      }
    }
  }
]);

Using count():

var counted = {};
var type = 'two';
for (var i = 0, len = _ids.length; i < len; i++) {
  counted[_ids[i]] = Collections.mail.find({
    _id: _ids[i], value: false, type: type
  }).count();
}

Recommended answer

.count() is faster by far. You can see its implementation by calling

// Note the missing parentheses at the end
db.collection.count

which returns the length of the cursor of the default query (if count() is called with no query document), which in turn is implemented as returning the length of the _id_ index, IIRC.
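In newer MongoDB shells and drivers the same split is exposed as two separate methods: estimatedDocumentCount(), which reads collection metadata, and countDocuments(), which counts the documents matching a filter. A minimal sketch of choosing between them, assuming `collection` is a handle that provides both methods (the helper name `countDocs` is hypothetical, not part of MongoDB):

```javascript
// Hypothetical helper: pick a counting method based on whether a filter is given.
// Assumes `collection` provides estimatedDocumentCount() and countDocuments().
function countDocs(collection, filter) {
  var hasFilter = Boolean(filter) && Object.keys(filter).length > 0;
  if (!hasFilter) {
    // Metadata-based estimate: no documents are scanned.
    return collection.estimatedDocumentCount();
  }
  // Exact count of the documents matching the filter.
  return collection.countDocuments(filter);
}
```

Called as `countDocs(db.mail)` it takes the fast path; `countDocs(db.mail, {value: false})` counts only matching documents.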

An aggregation, however, reads each and every document and processes it. It can only come roughly within the same order of magnitude as .count() when run over only some 100k documents (give or take, depending on your RAM).
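The difference in cost can be illustrated with a toy in-memory model. This is only a sketch of the idea, not MongoDB's internals; `ToyCollection` and its method names are made up for illustration:

```javascript
// Toy model (not MongoDB internals): a collection that maintains a
// running document counter alongside its data.
function ToyCollection() {
  this.docs = [];
  this.size = 0; // updated on every insert
}

ToyCollection.prototype.insert = function (doc) {
  this.docs.push(doc);
  this.size += 1;
};

// Analogous to an unfiltered count(): read the stored counter, O(1).
ToyCollection.prototype.countFromMetadata = function () {
  return this.size;
};

// Analogous to $group with {$sum: 1}: visit every document, O(n).
ToyCollection.prototype.countByScanning = function () {
  var count = 0;
  for (var i = 0; i < this.docs.length; i++) {
    count += 1;
  }
  return count;
};
```

Both methods return the same number; only the amount of work needed to produce it differs, and that gap grows with the size of the collection.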

The function below was applied to a collection with some 12M entries:

function checkSpeed(col,iterations){

  // Get the collection
  var collectionUnderTest = db[col];

  // The collection we are writing our stats to
  var stats = db[col+'STATS']

  // remove old stats
  stats.remove({})

  // Prevent allocation in loop
  var start = new Date().getTime()
  var duration = new Date().getTime()

  print("Counting with count()")
  for (var i = 1; i <= iterations; i++){
    start = new Date().getTime();
    var result = collectionUnderTest.count()
    duration = new Date().getTime() - start
    stats.insert({"type":"count","pass":i,"duration":duration,"count":result})
  }

  print("Counting with aggregation")
  for(var j = 1; j <= iterations; j++){
    start = new Date().getTime()
    // aggregate() returns a cursor; fetch the single result document from it
    var doc = collectionUnderTest.aggregate([{ $group:{_id: null, count:{ $sum: 1 } } }]).toArray()[0]
    duration = new Date().getTime() - start
    stats.insert({"type":"aggregation", "pass":j, "duration": duration,"count": doc ? doc.count : 0})
  }

  var averages = stats.aggregate([
   {$group:{_id:"$type","average":{"$avg":"$duration"}}} 
  ])

  return averages
}
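The timing pattern used in both loops above can be factored into a small helper. A minimal sketch, assuming only plain JavaScript; the name `averageMillis` is hypothetical:

```javascript
// Hypothetical helper: run `fn` `iterations` times and return the average
// wall-clock duration per call in milliseconds.
function averageMillis(fn, iterations) {
  if (iterations <= 0) {
    return 0; // nothing to measure
  }
  var total = 0;
  for (var i = 0; i < iterations; i++) {
    var start = Date.now();
    fn();
    total += Date.now() - start;
  }
  return total / iterations;
}
```

In a shell session, passing a function that calls count() on the collection under test, and then one that runs the aggregation, would give two averages comparable to the ones the stats collection produces.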

It returned:

{ "_id" : "aggregation", "average" : 43828.8 }
{ "_id" : "count", "average" : 0.6 }

The unit is milliseconds.
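To put the two averages in proportion, the ratio can be computed directly from the returned documents. A small sketch using the figures above (the variable names are illustrative):

```javascript
// The two result documents reported above, durations in milliseconds.
var averages = [
  { _id: "aggregation", average: 43828.8 },
  { _id: "count", average: 0.6 }
];

// Index the averages by measurement type.
var byType = {};
averages.forEach(function (row) {
  byType[row._id] = row.average;
});

// How many times slower the aggregation was than count() in this run.
var ratio = byType.aggregation / byType.count;
```

For this run the aggregation took roughly 73,000 times as long as count(), or about 44 seconds against well under a millisecond.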
