Time series and aggregation framework (Mongo)

Question

I'm trying to synchronise two functions that run in my app. The first one checks, in real time, the number of documents I save to MongoDB in each time block (e.g. every 10 seconds):

var getVolume = function(timeBlock, cb) {
    var triggerTime = Date.now();
    var blockPeriod = triggerTime - timeBlock;

    Document.find({
        time: { $gt: blockPeriod }
    }).count(function(err, count) {
        log('getting volume since ', new Date(blockPeriod), 'result is', count)
        cb(triggerTime, count);
    });
};

Then I have a second function, which I use whenever I want to get data for my graph (front end):

var getHistory = function(timeBlock, end, cb) {

    Document.aggregate(
    {
        $match: {
            time: {
                $gte: new Date(end - 10 * timeBlock),
                $lt: new Date(end)
            }
        }
    },

    // count number of documents based on time block
    // timeBlock is divided by 1000 as we use it as seconds here
    // and the timeBlock parameter is in milliseconds
    {
        $group: {
            _id: {
                year: { $year: "$time" },
                month: { $month: "$time" },
                day: { $dayOfMonth: "$time" },
                hour: { $hour: "$time" },
                minute: { $minute: "$time" },
                second: { $subtract: [
                    { $second: "$time" },
                    { $mod: [
                        { $second: "$time" },
                        timeBlock / 1000
                    ]}
                ]}
            },
            count: { $sum: 1 }
        }
    },

    // changing the name _id to timeParts
    {
        $project: {
            timeParts: "$_id",
            count: 1,
            _id: 0
        }
    },

    // sorting by date, from earliest to latest
    {
        $sort: {
            "time": 1
        }
    }, function(err, result) {
        if (err) {
            cb(err)
        } else {
            log("start", new Date(end - 10 * timeBlock))
            log("end", new Date(end))
            log("timeBlock", timeBlock)
            log(">****", result)
            cb(result)
        }
    })
}

The problem is that I can't get the same values on my graph and in the back-end code (the getVolume function).

I realised that the log from getHistory is not what I would expect (log below):

start Fri Jul 18 2014 11:56:56 GMT+0100 (BST)
end Fri Jul 18 2014 11:58:36 GMT+0100 (BST)
timeBlock 10000
>**** [ { count: 4,
    timeParts: { year: 2014, month: 7, day: 18, hour: 10, minute: 58, second: 30 } },
  { count: 6,
    timeParts: { year: 2014, month: 7, day: 18, hour: 10, minute: 58, second: 20 } },
  { count: 3,
    timeParts: { year: 2014, month: 7, day: 18, hour: 10, minute: 58, second: 10 } },
  { count: 3,
    timeParts: { year: 2014, month: 7, day: 18, hour: 10, minute: 58, second: 0 } },
  { count: 2,
    timeParts: { year: 2014, month: 7, day: 18, hour: 10, minute: 57, second: 50 } } ]

I would expect getHistory to look up data in Mongo in 10-second blocks starting from the start time, Fri Jul 18 2014 11:56:56 GMT+0100 (BST), so the result would look roughly like:

11:56:56 count: 3
11:57:06 count: 0
11:57:16 count: 14
... etc.

TODO: 1. I know my aggregate function should cover the case where the count is 0; at the moment, I guess such time blocks are simply skipped.

Recommended answer

Your error is in how you calculate _id for the $group operator, specifically its second part:

second: { $subtract: [
    { $second: "$time" },
    { $mod: [
        { $second: "$time" },
        timeBlock / 1000
    ]}
]}

So, instead of splitting all your data into 10 chunks of timeBlock milliseconds each, starting from new Date(end - 10 * timeBlock), you're splitting it into 11 chunks aligned to the nearest clock boundary that is a multiple of timeBlock.

To fix it, first calculate delta = end - $time, and then use that instead of the original $time to build your _id.
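
The difference between the two bucketing schemes can be checked without MongoDB. The following is a minimal sketch in plain JavaScript, assuming timestamps in milliseconds; clockBucket and deltaBucket are hypothetical helper names, and the sample timestamps are generated only for illustration (they are not the asker's data).

```javascript
// Original approach: round the timestamp down to a clock boundary
// (equivalent to grouping on second - second % 10 for a 10-second block).
function clockBucket(timeMs, timeBlock) {
    return timeMs - (timeMs % timeBlock);
}

// Suggested approach: measure back from `end`, round that distance down
// to a whole number of blocks, then map it back to a timestamp.
function deltaBucket(timeMs, end, timeBlock) {
    var delta = end - timeMs;
    var rounded = delta - (delta % timeBlock);
    return end - rounded;
}

var timeBlock = 10000;                          // 10 seconds
var end = Date.UTC(2014, 6, 18, 10, 58, 36);    // 10:58:36 UTC, as in the log
var start = end - 10 * timeBlock;               // 10:56:56 UTC

// One synthetic sample per second, strictly inside the window.
var clockKeys = {};
var deltaKeys = {};
for (var t = start + 500; t < end; t += 1000) {
    clockKeys[clockBucket(t, timeBlock)] = true;
    deltaKeys[deltaBucket(t, end, timeBlock)] = true;
}

console.log(Object.keys(clockKeys).length); // 11 clock-aligned buckets
console.log(Object.keys(deltaKeys).length); // 10 buckets aligned to `end`
```

Note that each delta-based bucket is labelled with its upper edge, and that a document whose time is exactly end - 10 * timeBlock has a delta of 10 * timeBlock and therefore lands in an extra bucket of its own; that boundary case may need separate handling.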

Here is an example of what I mean:

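Document.aggregate({
    // keep only documents inside the window [end - 10 * timeBlock, end)
    $match: {
        time: {
            $gte: new Date(end - 10 * timeBlock),
            $lt: new Date(end)
        }
    }
}, {
    // delta: milliseconds between end and the document's time
    $project: {
        time: 1,
        delta: { $subtract: [
            new Date(end),
            "$time"
        ]}
    }
}, {
    // round delta down to a whole number of time blocks
    $project: {
        time: 1,
        delta: { $subtract: [
            "$delta",
            { $mod: [
                "$delta",
                timeBlock
            ]}
        ]}
    }
}, {
    // group by the block boundary (end - rounded delta) and count
    $group: {
        _id: { $subtract: [
            new Date(end),
            "$delta"
        ]},
        count: { $sum: 1 }
    }
}, {
    // rename _id to time
    $project: {
        time: "$_id",
        count: 1,
        _id: 0
    }
}, {
    // sort from earliest to latest
    $sort: {
        time: 1
    }
}, function(err, result) {
    // ...
})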
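
The question's TODO about blocks with a count of 0 is not covered by the pipeline above, since $group only emits blocks that contain at least one document. One possible way to handle it is to fill the gaps in application code after the query returns. The sketch below assumes the result rows have the shape { time, count } produced above and that end is a millisecond timestamp; fillEmptyBuckets is a hypothetical helper, and the sample rows are made up for illustration.

```javascript
// Return one row per block, oldest first, inserting count: 0 where
// the aggregation produced no row for that block.
function fillEmptyBuckets(result, end, timeBlock, blocks) {
    // index the existing rows by their millisecond timestamp
    var counts = {};
    result.forEach(function (row) {
        counts[new Date(row.time).getTime()] = row.count;
    });

    var filled = [];
    for (var k = blocks - 1; k >= 0; k--) {
        var key = end - k * timeBlock;
        filled.push({ time: new Date(key), count: counts[key] || 0 });
    }
    return filled;
}

// Usage with made-up rows: only two of the ten blocks have documents.
var end = Date.UTC(2014, 6, 18, 10, 58, 36);
var sample = [
    { time: new Date(end - 20000), count: 3 },
    { time: new Date(end), count: 4 }
];
var filled = fillEmptyBuckets(sample, end, 10000, 10);
console.log(filled.length); // 10
```

Filling on the client keeps the pipeline simple; the trade-off is that the caller has to pass the same end and timeBlock values that were used in the query.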

I also recommend using raw time values (in milliseconds), because they are much easier to work with and help you avoid mistakes. You can convert time into timeParts after $group using the $project operator.
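
As a rough sketch of that last step, the $project stage that follows $group could be written along these lines. It assumes _id is a date, as produced by the $group stage above, and reuses the date operators from the question; toTimeParts is a hypothetical variable name, and nested computed fields in $project should be checked against the MongoDB version in use.

```javascript
// Hypothetical replacement for the $project stage that follows $group:
// keeps the raw time and also derives the individual date parts.
var toTimeParts = {
    $project: {
        _id: 0,
        count: 1,
        time: "$_id",
        timeParts: {
            year: { $year: "$_id" },
            month: { $month: "$_id" },
            day: { $dayOfMonth: "$_id" },
            hour: { $hour: "$_id" },
            minute: { $minute: "$_id" },
            second: { $second: "$_id" }
        }
    }
};
```

These operators return parts in UTC, which is why the question's log shows hour: 10 for times printed as 11:58 BST.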
