MongoDB aggregation - $group by date even if it doesn't exist


Problem Description

I already have a query which looks like this:

{$match:{
      "when":{$gt: new Date(ISODate().getTime() - 1000 * 60 * 60 * 24 * 30)}
}}, 
{$project:{
      "year":{$year:"$when"}, 
      "month":{$month:"$when"}, 
      "day": {$dayOfMonth:"$when"}
}}, 
{$group:{
      _id:{year:"$year", month:"$month", day:"$day"}, 
      "count":{$sum:1}
}},
{$sort:{
    _id: 1
}}

Which gives results like:

{ "_id" : { "year" : 2015, "month" : 10, "day" : 19 }, "count" : 1 }
{ "_id" : { "year" : 2015, "month" : 10, "day" : 21 }, "count" : 2 }

How could I get the result in the same format for each of the last 30 days, even where the count is 0?

Like this:

{ "_id" : { "year" : 2015, "month" : 10, "day" : 01 }, "count" : 1 }
{ "_id" : { "year" : 2015, "month" : 10, "day" : 02 }, "count" : 2 }
{ "_id" : { "year" : 2015, "month" : 10, "day" : 03 }, "count" : 0 }
...
{ "_id" : { "year" : 2015, "month" : 10, "day" : 30 }, "count" : 2 }

Answer

Rather than trying to force the database to return results for data that does not exist, it is better practice to generate the blank data outside of the query and merge the results into it. That way you get your "0" entries where there is no data, and the database only returns what is actually there.

Merging is a basic process of creating a hash table of unique keys and simply replacing any of those values with what is found in the aggregation results. In JavaScript a plain object suits this well, since all keys are unique.

I also prefer to actually return a Date object from the aggregation results, by using date math to "round" the date to the required interval rather than using the date aggregation operators. You can manipulate dates with $subtract, turning the date into a numeric timestamp by subtracting the epoch date, and with the $mod operator to take the remainder and round the value down to the required interval.

In contrast, using $add with a similar epoch date object turns that integer value back into a BSON Date. And of course it is more efficient to feed the modified date directly into the grouping _id in the $group stage, rather than use a separate $project stage.
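The "rounding" here is just modulo arithmetic on the millisecond timestamp. A minimal sketch of the same expressions in plain shell JavaScript, using a hypothetical sample date:

var OneDay = 1000 * 60 * 60 * 24;

// A hypothetical "when" value from a document
var when = new Date("2015-10-19T14:35:22Z");

// { "$subtract": [ "$when", new Date(0) ] } => numeric timestamp in milliseconds
var ts = when.valueOf();

// Subtracting the $mod remainder drops the time elapsed since midnight UTC
var rounded = ts - ( ts % OneDay );

// { "$add": [ rounded, new Date(0) ] } => back to a BSON Date at midnight UTC
printjson(new Date(rounded));   // ISODate("2015-10-19T00:00:00Z")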

As a shell example:

// Window setup: "Today" is now rounded down to midnight, the window is the last 30 days
var sample = 30,
    Days = 30,
    OneDay = ( 1000 * 60 * 60 * 24 ),
    now = Date.now(),
    Today = now - ( now % OneDay ),
    nDaysAgo = Today - ( OneDay * Days ),
    startDate = new Date( nDaysAgo ),
    endDate = new Date( Today + OneDay ),
    store = {};

// Seed the hash table with a 0 count for every day in the window
var thisDay = new Date( nDaysAgo );
while ( thisDay < endDate ) {
    store[thisDay] = 0;
    thisDay = new Date( thisDay.valueOf() + OneDay );
}

// Aggregate per day by "rounding" each "when" value down to midnight with date math
db.datejunk.aggregate([
    { "$match": { "when": { "$gte": startDate } }},
    { "$group": {
        "_id": {
            "$add": [
                { "$subtract": [
                    { "$subtract": [ "$when", new Date(0) ] },
                    { "$mod": [
                        { "$subtract": [ "$when", new Date(0) ] },
                        OneDay
                    ]}
                ]},
                new Date(0)
            ]
        },
        "count": { "$sum": 1 }
    }}
]).forEach(function(result){
    // Overwrite the pre-seeded 0 entry for each day that actually has data
    store[result._id] = result.count;
});

// Print every day in the interval, with real counts or the remaining 0 values
Object.keys(store).forEach(function(k) {
    printjson({ "date": k, "count": store[k] })
});

Which will return all days in the interval, including 0 values where no data exists, like:

{ "date" : "Tue Sep 22 2015 10:00:00 GMT+1000 (AEST)", "count" : 0 }
{ "date" : "Wed Sep 23 2015 10:00:00 GMT+1000 (AEST)", "count" : 1 }
{ "date" : "Thu Sep 24 2015 10:00:00 GMT+1000 (AEST)", "count" : 0 }
{ "date" : "Fri Sep 25 2015 10:00:00 GMT+1000 (AEST)", "count" : 1 }
{ "date" : "Sat Sep 26 2015 10:00:00 GMT+1000 (AEST)", "count" : 1 }
{ "date" : "Sun Sep 27 2015 10:00:00 GMT+1000 (AEST)", "count" : 0 }
{ "date" : "Mon Sep 28 2015 10:00:00 GMT+1000 (AEST)", "count" : 1 }
{ "date" : "Tue Sep 29 2015 10:00:00 GMT+1000 (AEST)", "count" : 1 }
{ "date" : "Wed Sep 30 2015 10:00:00 GMT+1000 (AEST)", "count" : 0 }
{ "date" : "Thu Oct 01 2015 10:00:00 GMT+1000 (AEST)", "count" : 1 }
{ "date" : "Fri Oct 02 2015 10:00:00 GMT+1000 (AEST)", "count" : 2 }
{ "date" : "Sat Oct 03 2015 10:00:00 GMT+1000 (AEST)", "count" : 0 }
{ "date" : "Sun Oct 04 2015 11:00:00 GMT+1100 (AEST)", "count" : 1 }
{ "date" : "Mon Oct 05 2015 11:00:00 GMT+1100 (AEDT)", "count" : 0 }
{ "date" : "Tue Oct 06 2015 11:00:00 GMT+1100 (AEDT)", "count" : 1 }
{ "date" : "Wed Oct 07 2015 11:00:00 GMT+1100 (AEDT)", "count" : 2 }
{ "date" : "Thu Oct 08 2015 11:00:00 GMT+1100 (AEDT)", "count" : 2 }
{ "date" : "Fri Oct 09 2015 11:00:00 GMT+1100 (AEDT)", "count" : 1 }
{ "date" : "Sat Oct 10 2015 11:00:00 GMT+1100 (AEDT)", "count" : 1 }
{ "date" : "Sun Oct 11 2015 11:00:00 GMT+1100 (AEDT)", "count" : 1 }
{ "date" : "Mon Oct 12 2015 11:00:00 GMT+1100 (AEDT)", "count" : 0 }
{ "date" : "Tue Oct 13 2015 11:00:00 GMT+1100 (AEDT)", "count" : 3 }
{ "date" : "Wed Oct 14 2015 11:00:00 GMT+1100 (AEDT)", "count" : 2 }
{ "date" : "Thu Oct 15 2015 11:00:00 GMT+1100 (AEDT)", "count" : 2 }
{ "date" : "Fri Oct 16 2015 11:00:00 GMT+1100 (AEDT)", "count" : 0 }
{ "date" : "Sat Oct 17 2015 11:00:00 GMT+1100 (AEDT)", "count" : 3 }
{ "date" : "Sun Oct 18 2015 11:00:00 GMT+1100 (AEDT)", "count" : 0 }
{ "date" : "Mon Oct 19 2015 11:00:00 GMT+1100 (AEDT)", "count" : 0 }
{ "date" : "Tue Oct 20 2015 11:00:00 GMT+1100 (AEDT)", "count" : 0 }
{ "date" : "Wed Oct 21 2015 11:00:00 GMT+1100 (AEDT)", "count" : 2 }
{ "date" : "Thu Oct 22 2015 11:00:00 GMT+1100 (AEDT)", "count" : 1 }

Note that all "date" values here are actually still BSON Dates; they are simply stringified like that in the output from printjson() as a shell method.
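If the output is needed in the same { "year", "month", "day" } shape as the original question, the Date keys can simply be reshaped on the client as well. A minimal sketch, assuming the UTC fields are the correct ones since the days were rounded against the epoch:

Object.keys(store).forEach(function(k) {
    var d = new Date(k);   // the shell-stringified key parses back to a Date
    printjson({
        "_id": {
            "year": d.getUTCFullYear(),
            "month": d.getUTCMonth() + 1,
            "day": d.getUTCDate()
        },
        "count": store[k]
    });
});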

A bit more concise example can be shown using nodejs, where you can use operations like async.parallel to process the hash construction and the aggregation query at the same time, along with another useful utility in nedb which implements the "hash" using functions familiar from working with a MongoDB collection. It also shows how this can scale for large results if you also change the handling to stream processing of the cursor returned from .aggregate() (a streaming sketch follows the full listing):

var async = require('async'),
    mongodb = require('mongodb'),
    MongoClient = mongodb.MongoClient,
    nedb = require('nedb'),
    DataStore = new nedb();

// Setup vars
var sample = 30,
    Days = 30,
    OneDay = ( 1000 * 60 * 60 * 24 ),
    now = Date.now(),
    Today = now - ( now % OneDay ) ,
    nDaysAgo = Today - ( OneDay * Days ),
    startDate = new Date( nDaysAgo ),
    endDate = new Date( Today + OneDay );

MongoClient.connect('mongodb://localhost/test',function(err,db) {

  var coll = db.collection('datejunk');

  async.series(
    [
      // Clear test collection
      function(callback) {
        coll.remove({},callback)
      },

      // Generate a random sample
      function(callback) {
        var bulk = coll.initializeUnorderedBulkOp();

        while (sample--) {
          bulk.insert({
            "when": new Date(
              Math.floor(
                Math.random()*(Today-nDaysAgo+OneDay)+nDaysAgo
              )
            )
          });
        }
        bulk.execute(callback);
      },

      // Aggregate data and dummy data
      function(callback) {
        console.log("generated");
        async.parallel(
          [
            // Dummy data per day
            function(callback) {
              var thisDay = new Date( nDaysAgo );
              async.whilst(
                function() { return thisDay < endDate },
                function(callback) {
                  DataStore.update(
                    { "date": thisDay },
                    { "$inc": { "count": 0 } },
                    { "upsert": true },
                    function(err) {
                      thisDay = new Date( thisDay.valueOf() + OneDay );
                      callback(err);
                    }
                  );
                },
                callback
              );
            },
            // Aggregate data in collection
            function(callback) {
              coll.aggregate(
                [
                  { "$match": { "when": { "$gte": startDate } } },
                  { "$group": {
                    "_id": {
                      "$add": [
                        { "$subtract": [
                          { "$subtract": [ "$when", new Date(0) ] },
                          { "$mod": [
                            { "$subtract": [ "$when", new Date(0) ] },
                            OneDay
                          ]}
                        ]},
                        new Date(0)
                      ]
                    },
                    "count": { "$sum": 1 }
                  }}
                ],
                function(err,results) {
                  if (err) return callback(err);
                  async.each(results,function(result,callback) {
                    DataStore.update(
                      { "date": result._id },
                      { "$inc": { "count": result.count } },
                      { "upsert": true },
                      callback
                    );
                  },callback);
                }
              );
            }
          ],
          callback
        );
      }
    ],
    // Return result or error
    function(err) {
      if (err) throw err;
      DataStore.find({},{ "_id": 0 })
        .sort({ "date": 1 })
        .exec(function(err,results) {
        if (err) throw err;
        console.log(results);
        db.close();
      });
    }
  );

});
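For genuinely large result sets the same merge can be done without buffering the whole results array, by consuming the aggregation cursor as a stream rather than via the callback. A minimal sketch, assuming the 2.x node driver where .aggregate() called without a callback returns a cursor, and a hypothetical pipeline variable holding the same stages as above:

// Sketch only: `coll`, `DataStore` and `pipeline` are assumed from the listing above
var cursor = coll.aggregate(pipeline, { "cursor": { "batchSize": 100 } });

cursor.on('data', function(result) {
  cursor.pause();                       // wait for the upsert before reading more
  DataStore.update(
    { "date": result._id },
    { "$inc": { "count": result.count } },
    { "upsert": true },
    function(err) {
      if (err) throw err;
      cursor.resume();
    }
  );
});

cursor.on('end', function() {
  // every aggregated day has now been merged over the pre-seeded 0 entries
});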

This is very well suited to data for charts and graphs. The basic procedure is the same for any language implementation, and is ideally done with parallel processing for best performance, so async or threaded environments give you a real bonus, even though for a small sample like this the basic hash table can be generated in memory very quickly if your environment requires sequential operations.

So don't try to force the database to do this. There are certainly examples of SQL queries that do this "merge" on the database server, but it was never really a great idea there either, and it should really be handled with a similar "client" merge process, as forcing it into the query just creates database overhead that really isn't required.

It's all very efficient and practical for the purpose, and of course it does not require processing a separate aggregation query for each day in the period, which would not be efficient at all.

