Calculate frequency using the MongoDB aggregation framework


Question

I'm trying to calculate the frequency of documents in my database over 10-second intervals.

This is what my database objects look like:

[
  {
     created_at: "2014-03-31T22:30:48.000Z",
     id: 450762158586880000,
     _id: "5339ec9808eb125965f2eae1"
  },
  {
     created_at: "2014-03-31T22:30:48.000Z",
     id: 450762160407597060,
     _id: "5339ec9808eb125965f2eae2"
  },
  {
     created_at: "2014-03-31T22:30:49.000Z",
     id: 450762163482017800,
     _id: "5339ec9908eb125965f2eae3"
  },
  {
     created_at: "2014-03-31T22:30:49.000Z",
     id: 450762166367707140,
     _id: "5339ec9908eb125965f2eae4"
  },
  {
     created_at: "2014-03-31T22:30:50.000Z",
     id: 450762167412064260,
     _id: "5339ec9a08eb125965f2eae5"
  }
]

I have managed to get the count for one given interval, but I would like to get it for every 10 seconds. Ideally my JSON would look like this:

[
  {
     time_from: "2014-03-31T22:30:48.000Z",
     time_to: "2014-03-31T22:30:58.000Z",
     count: 6
  },
  {
     time_from: "2014-03-31T22:30:58.000Z",
     time_to: "2014-03-31T22:31:08.000Z",
     count: 3
  },
  {
     time_from: "2014-03-31T22:31:08.000Z",
     time_to: "2014-03-31T22:31:18.000Z",
     count: 10
  },
  {
     time_from: "2014-03-31T22:31:18.000Z",
     time_to: "2014-03-31T22:31:28.000Z",
     count: 1
  },
  {
     time_from: "2014-03-31T22:31:28.000Z",
     time_to: "2014-03-31T22:31:38.000Z",
     count: 3
  }
]

This is what I have done so far:

exports.findAll = function (req, res) {
    db.collection(collection_name, function (err, collection) {
        collection.find().toArray(function (err, items) {
            collection.find().sort({"_id": 1}).limit(1).toArray(function (err, doc) {
                var interval = 100000; // in milliseconds (note: 100000 ms is 100 s, not 10 s)
                var startTime = doc[0].created_at;
                var endTime = new Date(+startTime + interval);

                collection.aggregate([
                    {$match: {"created_at": {$gte: startTime, $lt: endTime}}},
                    {$group: {"_id": 1, "count":{$sum: 1}}}
                ], function(err, result){
                    console.log(result);
                    res.send(result);
                });
            });
        })
    });
};

This is the result:

[
  {
     _id: 1,
     count: 247
  }
]

Edit:

collection.aggregate([
    { $group: {
        _id: {
            year: { '$year': '$created_at' },
            month: { '$month': '$created_at' },
            day: { '$dayOfMonth': '$created_at' },
            hour: { '$hour': '$created_at' },
            minute: { '$minute': '$created_at' },
            second: { '$second': '$created_at' }
        },
        count: { $sum: 1 }
    } }
], function (err, result) {
    console.log(result);
    res.send(result);
});

which results in:

[
  {
     _id: {
        year: 2014,
        month: 3,
        day: 31,
        hour: 22,
        minute: 37,
        second: 10
     },
     count: 6
  }, ...

That is new progress. Now, how would I display it in 10-second intervals?

Recommended answer

If it is just about getting things within 10-second intervals, you can do a little math and run this through aggregate:

db.collection.aggregate([
    { "$group": {
        "_id": {
             "year": { "$year": "$created_at" },
             "month":{ "$month": "$created_at" },
             "day": { "$dayOfMonth": "$created_at" },
             "hour": { "$hour": "$created_at" },
             "minute": { "$minute": "$created_at" },
             "second": { "$subtract": [
                 { "$second": "$created_at" },
                 { "$mod": [
                     { "$second": "$created_at" },
                     10
                 ]}
             ]}
        },
        "count": { "$sum" : 1 }
    }}
])

So that breaks things down into the 10-second intervals of each minute in which they occur, using a little mod 10 math: the seconds value is rounded down to the nearest multiple of 10 (48 becomes 40, 50 stays 50).
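To get from the grouped `_id` back to the `time_from` / `time_to` shape asked for in the question, the results can be post-processed on the client. The following is only a sketch: `floorToTen` and `toInterval` are hypothetical helper names, and it assumes each result has the `_id` layout produced by the pipeline above. Note that `$month` is 1-based while `Date.UTC` expects a 0-based month.

```javascript
// Same arithmetic as the $subtract / $mod expression: round seconds down to a multiple of 10.
function floorToTen(second) {
    return second - (second % 10);
}

// Hypothetical helper: turn one aggregate result into the shape from the question.
function toInterval(group) {
    var id = group._id;
    // $month returns 1-12, Date.UTC expects 0-11
    var from = Date.UTC(id.year, id.month - 1, id.day, id.hour, id.minute, id.second);
    return {
        time_from: new Date(from).toISOString(),
        time_to: new Date(from + 10000).toISOString(),
        count: group.count
    };
}

var sample = {
    _id: { year: 2014, month: 3, day: 31, hour: 22, minute: 37, second: 10 },
    count: 6
};
var interval = toInterval(sample);
```

Mapping every element of the aggregate result through `toInterval` would give one object per non-empty 10-second bucket.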

I think that is reasonable, and it would be the fastest option since it uses aggregate. If you really need your sequence, as shown, to be a running 10 seconds from an initially matched time, then you can do the process with mapReduce:

First, a mapper:

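var mapper = function () {

    // last_date is a global supplied through the "scope" option (milliseconds, initially 0).
    // This relies on documents being processed in ascending created_at order.
    if ( this.created_at.getTime() > ( last_date + 10000 ) ) {
        if ( last_date == 0 ) {
            // First document: the series starts at its timestamp
            last_date = this.created_at.getTime();
        } else {
            // Document is past the current window: move on by one 10 second step
            last_date += 10000;
        }
    }

    // Key is the current window, value is the document date
    emit(
        {
            start: new Date( last_date ),
            end: new Date( last_date + 10000 )
        },
        this.created_at
    );

}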

So this emits each date under a 10-second window, starting at the first date and advancing the window by 10 seconds each time a document falls outside the current one.

Now you need a reducer:

var reducer = function (key, values) {
    return values.length;
};

Very simple. Just return the length of the array passed in.

Because of the way mapReduce works, any key that has only one value is not passed to the reducer, so its value is still the emitted date rather than a count. Clean this up with finalize:

var finalize = function (key, value) {
    if ( typeof(value) == "object" ) {
        value = 1;
    }
    return value;
};

Then just run it to get the results. Note the "scope" section, which passes a global variable to be used in the mapper:

db.collection.mapReduce(
    mapper,
    reducer,
    { 
        "out": { "inline": 1 }, 
        "scope": { "last_date": 0 }, 
        "finalize": finalize 
    }
)
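One caveat worth keeping in mind: MongoDB may call the reduce function more than once for the same key, feeding the output of an earlier call back in as one of the values. A reducer that returns `values.length` does not survive that. A common alternative, sketched below in plain JavaScript, assumes the mapper is changed to emit `1` instead of `this.created_at` and sums the values. The names `sumReducer` and `lengthReducer` are illustrative only, and the two-pass call simply imitates a re-reduce outside the database.

```javascript
// Assumed variant: the mapper emits 1 per document, and the reducer sums.
var sumReducer = function (key, values) {
    var total = 0;
    for (var i = 0; i < values.length; i++) {
        total += values[i];
    }
    return total;
};

// The length-based reducer, for comparison.
var lengthReducer = function (key, values) {
    return values.length;
};

// Imitate five emitted values for one key being reduced in two passes.
var firstPass = sumReducer("window", [1, 1, 1]);
var total = sumReducer("window", [firstPass, 1, 1]);

// The same two passes with the length-based reducer lose documents.
var lengthTotal = lengthReducer("window", [lengthReducer("window", [1, 1, 1]), 1, 1]);
```

With this variant a key that has a single value already holds the count `1`, so the finalize step would not be needed.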

Each approach is likely to give slightly different results, but that is the point: the aggregate version aligns intervals to the clock, while the mapReduce version runs them from the first matched date. It depends on which one you actually want to use.
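The difference can be seen on a handful of timestamps without touching the database. This is a plain JavaScript sketch, not the server-side code: `alignedCounts` and `runningCounts` are hypothetical names, timestamps are in milliseconds, the input is assumed to be sorted ascending, and windows are treated as half-open (start included, end excluded).

```javascript
// Clock-aligned buckets: start = timestamp rounded down to a 10 s boundary.
function alignedCounts(times) {
    var counts = {};
    times.forEach(function (t) {
        var start = t - (t % 10000);
        counts[start] = (counts[start] || 0) + 1;
    });
    return counts;
}

// Running buckets: 10 s windows counted from the first timestamp.
function runningCounts(times) {
    var counts = {};
    if (times.length === 0) {
        return counts;
    }
    var origin = times[0]; // assumes ascending order
    times.forEach(function (t) {
        var start = origin + Math.floor((t - origin) / 10000) * 10000;
        counts[start] = (counts[start] || 0) + 1;
    });
    return counts;
}

var base = Date.parse("2014-03-31T22:30:00.000Z");
// Documents at :48, :49, :57 and :59 seconds
var times = [base + 48000, base + 49000, base + 57000, base + 59000];
var aligned = alignedCounts(times);
var running = runningCounts(times);
```

Here the aligned version puts two documents in the :40 bucket and two in the :50 bucket, while the running version puts three in the window starting at :48 and one in the window starting at :58.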

Considering your comment, you could "inspect" the output from either statement and "fill in the gaps" programmatically, as it were. I generally prefer that option, but it's not my program and I do not know how large a series you are trying to retrieve from this query.
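Filling the gaps on the client could look roughly like the following. It is a sketch under assumptions: `fillGaps` is a hypothetical helper, each bucket is taken to be an object with a `start` in milliseconds and a `count`, and the input is assumed to be sorted by `start` with starts that are whole steps apart.

```javascript
// Insert zero-count buckets wherever consecutive results are more than one step apart.
function fillGaps(buckets, stepMs) {
    var filled = [];
    for (var i = 0; i < buckets.length; i++) {
        if (i > 0) {
            // Walk from the end of the previous bucket up to the current one
            for (var t = buckets[i - 1].start + stepMs; t < buckets[i].start; t += stepMs) {
                filled.push({ start: t, count: 0 });
            }
        }
        filled.push(buckets[i]);
    }
    return filled;
}

var sparse = [
    { start: 0, count: 2 },
    { start: 30000, count: 1 }
];
var dense = fillGaps(sparse, 10000);
```

The result keeps the original buckets and adds empty ones in between, so the series has one entry per 10-second step.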

On the server side, you can patch up the "mapper" to do something like this:

var mapper = function () {

    if ( this.created_at.getTime() > ( last_date + 10000 ) ) {

        if ( last_date == 0 ) {
            last_date = this.created_at.getTime();
        } else {
            // Patching for empty blocks
            var times = Math.floor( 
                 ( this.created_at.getTime() - last_date ) / 10000
            );

            if ( times > 1 ) {
                for ( var i=1; i < times; i++ ) {
                    last_date += 10000;
                    emit(
                        {
                            start: new Date( last_date ),
                            end: new Date( last_date + 10000 )
                        },
                        0
                    );
                }
            }
            // End patch
            last_date += 10000;
        }
    }

    emit(
        {
            start: new Date( last_date ),
            end: new Date( last_date + 10000 )
        },
        this.created_at
    );

}
