Mongo aggregation within intervals of time


Question

I have some log data stored in a mongo collection that includes basic information such as a request_id and the time it was added to the collection, for example:

{
    "_id" : ObjectId("55ae6ea558a5d3fe018b4568"),
    "request_id" : "030ac9f1-aa13-41d1-9ced-2966b9a6g5c3",
    "time" : ISODate("2015-07-21T16:00:00.00Z")
}

I was wondering if I could use the aggregation framework to aggregate some statistical data. I would like to get the counts of the objects created within each interval of N minutes for the last X hours.

So the output I need for 10-minute intervals over the last 1 hour should be something like the following:

{ "_id" : 0, "time" : ISODate("2015-07-21T15:00:00.00Z"), "count" : 67 }
{ "_id" : 0, "time" : ISODate("2015-07-21T15:10:00.00Z"), "count" : 113 }
{ "_id" : 0, "time" : ISODate("2015-07-21T15:20:00.00Z"), "count" : 40 }
{ "_id" : 0, "time" : ISODate("2015-07-21T15:30:00.00Z"), "count" : 10 }
{ "_id" : 0, "time" : ISODate("2015-07-21T15:40:00.00Z"), "count" : 32 }
{ "_id" : 0, "time" : ISODate("2015-07-21T15:50:00.00Z"), "count" : 34 }

I would use that to get data for graphs.

Any suggestions are appreciated!

Recommended answer

There are a couple of ways to approach this, depending on which output format best suits your needs. The main point is that with the "aggregation framework" itself you cannot actually return a value "cast" as a date, but you can get values that are easily reconstructed into a Date object when you process the results in your API.

The first approach is to use the "Date Aggregation Operators" available to the aggregation framework:

db.collection.aggregate([
    { "$match": {
        "time": { "$gte": startDate, "$lt": endDate }
    }},
    { "$group": {
        "_id": {
            "year": { "$year": "$time" },
            "dayOfYear": { "$dayOfYear": "$time" },
            "hour": { "$hour": "$time" },
            "minute": {
                "$subtract": [
                    { "$minute": "$time" },
                    { "$mod": [ { "$minute": "$time" }, 10 ] }
                ]
            }
        },
        "count": { "$sum": 1 }
    }}
])

This returns a composite key for _id containing all the values you need to build a "date". Alternatively, if the range always falls within a single "hour", just use the "minute" part and work out the actual date from the startDate of your range selection.
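
As a rough sketch of that reconstruction step in client code, the composite key can be turned back into a Date with plain JavaScript. The name dateFromCompositeKey is hypothetical (it is not part of MongoDB or any driver). The date operators work in UTC by default, so the sketch uses Date.UTC and lets the day-of-year value overflow from January into the correct month:

```javascript
// Hypothetical helper: rebuild a Date from the composite _id produced by
// grouping on $year, $dayOfYear, $hour and the floored $minute.
function dateFromCompositeKey(key) {
    // Month 0 with day N means "the Nth day of the year"; Date.UTC
    // normalizes the overflow into the correct month and day.
    return new Date(Date.UTC(key.year, 0, key.dayOfYear, key.hour, key.minute));
}

var key = { year: 2015, dayOfYear: 202, hour: 15, minute: 10 };
console.log(dateFromCompositeKey(key).toISOString());
// prints 2015-07-21T15:10:00.000Z
```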

Or you can use plain "date math" to get the milliseconds since the "epoch", which can again be fed directly to a Date constructor:

db.collection.aggregate([
    { "$match": {
        "time": { "$gte": startDate, "$lt": endDate }
    }},
    { "$group": {
        "_id": {
            "$subtract": [
               { "$subtract": [ "$time", new Date(0) ] },
               { "$mod": [
                   { "$subtract": [ "$time", new Date(0) ] },
                   1000 * 60 * 10
               ]}
            ]
        },
        "count": { "$sum": 1 }
    }}
])
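
The same arithmetic can be checked outside the database. Subtracting one date from another with $subtract yields the difference in milliseconds, so the pipeline is flooring the epoch milliseconds to a multiple of the interval. Here is a minimal sketch in plain JavaScript; floorToInterval and alignedRange are illustrative names, and snapping the range to interval boundaries is an assumption made to match the question's sample output, not something the pipeline requires:

```javascript
// Floor a Date to the start of its N-minute interval (epoch-based, UTC).
function floorToInterval(date, minutes) {
    var intervalMs = 1000 * 60 * minutes;
    var ms = date.getTime();
    return new Date(ms - (ms % intervalMs));
}

// Hypothetical helper: build startDate/endDate for "the last X hours",
// with endDate snapped down to an interval boundary.
function alignedRange(now, hours, minutes) {
    var endDate = floorToInterval(now, minutes);
    var startDate = new Date(endDate.getTime() - hours * 60 * 60 * 1000);
    return { startDate: startDate, endDate: endDate };
}

var range = alignedRange(new Date("2015-07-21T16:03:00Z"), 1, 10);
console.log(range.startDate.toISOString(), range.endDate.toISOString());
// prints 2015-07-21T15:00:00.000Z 2015-07-21T16:00:00.000Z
```

The resulting startDate and endDate could then be passed to the $match stage as in the pipelines above.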

In all cases, what you do not want to do is use $project before actually applying $group. As a "pipeline stage", $project must "cycle" through all the selected documents and "transform" their content.

This takes time and adds to the total execution time of the query. You can simply apply the expressions directly in $group, as has been shown.

Or, if you are really a "purist" about getting a Date object back without post-processing, you can always use "mapReduce", since its JavaScript functions do allow recasting the value as a date. It is slower than the aggregation framework, though, and of course does not return a cursor:

db.collection.mapReduce(
   function() {
       var date = new Date(
           this.time.valueOf() 
           - ( this.time.valueOf() % ( 1000 * 60 * 10 ) )
       );
       emit(date,1);
   },
   function(key,values) {
       return Array.sum(values);
   },
   { "out": { "inline": 1 } }
)

Your best bet is using aggregation though, as transforming the response is quite easy:

db.collection.aggregate([
    { "$match": {
        "time": { "$gte": startDate, "$lt": endDate }
    }},
    { "$group": {
        // Group on epoch milliseconds floored to the 10-minute interval,
        // so that new Date(doc._id) below receives a number.
        "_id": {
            "$subtract": [
               { "$subtract": [ "$time", new Date(0) ] },
               { "$mod": [
                   { "$subtract": [ "$time", new Date(0) ] },
                   1000 * 60 * 10
               ]}
            ]
        },
        "count": { "$sum": 1 }
    }}
]).forEach(function(doc) {
    doc._id = new Date(doc._id);
    printjson(doc);
})

And then you have your interval grouping output with real Date objects.
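
In application code, the same post-processing might look like the following sketch. It assumes the documents were grouped on the epoch-millisecond key from the "date math" pipeline, that they have already been read into an array, and that the driver hands back the numeric _id as a plain number. The name toGraphPoints is illustrative, and the sample counts are taken from the question. Because $group does not guarantee output order, the sketch also sorts by time:

```javascript
// Turn { _id: <epoch ms>, count: n } documents into graph-friendly
// { time: Date, count: n } points, ordered by time.
function toGraphPoints(results) {
    return results
        .map(function (doc) {
            return { time: new Date(doc._id), count: doc.count };
        })
        .sort(function (a, b) {
            return a.time - b.time;
        });
}

// Sample input, deliberately out of order.
var sample = [
    { _id: Date.UTC(2015, 6, 21, 15, 10), count: 113 },
    { _id: Date.UTC(2015, 6, 21, 15, 0), count: 67 }
];

toGraphPoints(sample).forEach(function (point) {
    console.log(point.time.toISOString(), point.count);
});
// prints:
// 2015-07-21T15:00:00.000Z 67
// 2015-07-21T15:10:00.000Z 113
```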
