Time series and aggregation framework (Mongo)
Question
I'm trying to synchronise two functions I run in my app. The first one checks, in real time, the count of the documents I save to MongoDB in every time block (e.g. every 10 seconds):
var getVolume = function(timeBlock, cb) {
    var triggerTime = Date.now();
    var blockPeriod = triggerTime - timeBlock;
    Document.find({
        time: { $gt: blockPeriod }
    }).count(function(err, count) {
        log('getting volume since ', new Date(blockPeriod), 'result is', count)
        cb(triggerTime, count);
    });
};
Then I have a second function, which I use whenever I want to get data for my graph (front end):
var getHistory = function(timeBlock, end, cb) {
    Document.aggregate(
        {
            $match: {
                time: {
                    $gte: new Date(end - 10 * timeBlock),
                    $lt: new Date(end)
                }
            }
        },
        // count number of documents based on time block
        // timeBlock is divided by 1000 as we use it as seconds here
        // and the timeBlock parameter is in milliseconds
        {
            $group: {
                _id: {
                    year: { $year: "$time" },
                    month: { $month: "$time" },
                    day: { $dayOfMonth: "$time" },
                    hour: { $hour: "$time" },
                    minute: { $minute: "$time" },
                    second: { $subtract: [
                        { $second: "$time" },
                        { $mod: [
                            { $second: "$time" },
                            timeBlock / 1000
                        ]}
                    ]}
                },
                count: { $sum: 1 }
            }
        },
        // changing the name _id to timeParts
        {
            $project: {
                timeParts: "$_id",
                count: 1,
                _id: 0
            }
        },
        // sorting by date, from earliest to latest
        {
            $sort: {
                "time": 1
            }
        }, function(err, result) {
            if (err) {
                cb(err)
            } else {
                log("start", new Date(end - 10 * timeBlock))
                log("end", new Date(end))
                log("timeBlock", timeBlock)
                log(">****", result)
                cb(result)
            }
        })
}
The problem is that I can't get the same values on my graph and in the back-end code (the getVolume function).
I realised that the log from getHistory is not what I would expect it to be (log below):
start Fri Jul 18 2014 11:56:56 GMT+0100 (BST)
end Fri Jul 18 2014 11:58:36 GMT+0100 (BST)
timeBlock 10000
>**** [ { count: 4,
timeParts: { year: 2014, month: 7, day: 18, hour: 10, minute: 58, second: 30 } },
{ count: 6,
timeParts: { year: 2014, month: 7, day: 18, hour: 10, minute: 58, second: 20 } },
{ count: 3,
timeParts: { year: 2014, month: 7, day: 18, hour: 10, minute: 58, second: 10 } },
{ count: 3,
timeParts: { year: 2014, month: 7, day: 18, hour: 10, minute: 58, second: 0 } },
{ count: 2,
timeParts: { year: 2014, month: 7, day: 18, hour: 10, minute: 57, second: 50 } } ]
So I would expect getHistory to look up data in Mongo in 10-second blocks starting from start (Fri Jul 18 2014 11:56:56 GMT+0100 (BST)), so the result would look roughly like:
11:56:56 count: 3
11:57:06 count: 0
11:57:16 count: 14
... etc.
TODO:
1. I know I should cover in my aggregate function the case when the count is 0; at the moment I guess such a time block is skipped.
Recommended answer
Your error is in how you're calculating _id for the $group operator, specifically its second part:
    second: { $subtract: [
        { $second: "$time" },
        { $mod: [
            { $second: "$time" },
            timeBlock / 1000
        ]}
    ]}
So, instead of splitting all your data into 10 chunks, each timeBlock milliseconds long, starting from new Date(end - 10 * timeBlock), you're splitting it into 11 chunks aligned to clock boundaries that are multiples of timeBlock (seconds :00, :10, :20, ... for a 10-second timeBlock).
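The difference between the two alignments can be checked without MongoDB. The following is a minimal sketch in plain JavaScript, assuming the end time and timeBlock from the log above and one synthetic sample every second; the helper names clockBucket, windowBucket and countBuckets are hypothetical and only serve the illustration.

```javascript
// Plain JavaScript sketch, no MongoDB involved. All helper names are hypothetical.
var timeBlock = 10000;                        // block length in ms, as in the log
var end = Date.UTC(2014, 6, 18, 10, 58, 36);  // 11:58:36 BST expressed in UTC
var start = end - 10 * timeBlock;             // 11:56:56 BST

// Synthetic sample times: one every second, offset by 500 ms so that
// no sample falls exactly on the start or end of the window.
var times = [];
for (var k = 0; k < 100; k++) {
    times.push(start + k * 1000 + 500);
}

// Clock-aligned bucket: mirrors second - (second mod 10) from the question.
function clockBucket(t) {
    return t - (t % timeBlock);
}

// Window-aligned bucket: floor delta = end - time to a multiple of timeBlock.
function windowBucket(t) {
    var delta = end - t;
    return end - (delta - (delta % timeBlock));
}

// Count how many distinct buckets a bucketing function produces.
function countBuckets(sampleTimes, bucketOf) {
    var seen = {};
    sampleTimes.forEach(function (t) { seen[bucketOf(t)] = true; });
    return Object.keys(seen).length;
}

var clockCount = countBuckets(times, clockBucket);
var windowCount = countBuckets(times, windowBucket);
console.log(clockCount);   // 11
console.log(windowCount);  // 10
```

The first and last clock-aligned buckets (:56:50 and :58:30) are only partially covered by the window, which is why their counts cannot match what getVolume reports.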
To fix it, first calculate delta = end - $time and then use it instead of the original $time to build your _id.
Here is an example of what I mean:
Document.aggregate({
    $match: {
        time: {
            $gte: new Date(end - 10 * timeBlock),
            $lt: new Date(end)
        }
    }
}, {
    $project: {
        time: 1,
        delta: { $subtract: [
            new Date(end),
            "$time"
        ]}
    }
}, {
    $project: {
        time: 1,
        delta: { $subtract: [
            "$delta",
            { $mod: [
                "$delta",
                timeBlock
            ]}
        ]}
    }
}, {
    $group: {
        _id: { $subtract: [
            new Date(end),
            "$delta"
        ]},
        count: { $sum: 1 }
    }
}, {
    $project: {
        time: "$_id",
        count: 1,
        _id: 0
    }
}, {
    $sort: {
        time: 1
    }
}, function(err, result) {
    // ...
})
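Note that $group only emits blocks that contain at least one document, so the zero-count case from the question's TODO is still not covered by the pipeline itself. One possible way to handle it is to fill the gaps on the application side. The sketch below assumes result rows of the form { time: Date, count: Number } as produced above, with end given in milliseconds; fillEmptyBlocks and its blocks parameter are hypothetical names, not part of Mongoose or MongoDB.

```javascript
// Hypothetical helper: pads the aggregation result with zero-count blocks.
// result    - rows like { time: Date, count: Number }
// end       - end of the window in milliseconds since the epoch
// timeBlock - block length in milliseconds
// blocks    - number of blocks in the window (10 in the question)
function fillEmptyBlocks(result, end, timeBlock, blocks) {
    // Index the existing counts by block label in milliseconds.
    var counts = {};
    result.forEach(function (row) {
        counts[new Date(row.time).getTime()] = row.count;
    });

    // Walk the expected block labels from oldest to newest.
    var filled = [];
    for (var i = blocks - 1; i >= 0; i--) {
        var t = end - i * timeBlock;
        filled.push({ time: new Date(t), count: counts[t] || 0 });
    }
    return filled;
}
```

Inside the aggregate callback this could be used as cb(null, fillEmptyBlocks(result, end, timeBlock, 10)), assuming the callback follows the usual error-first convention.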
I also recommend using raw time values (in milliseconds): they are much easier to work with and help you avoid mistakes like this one. You can convert time into timeParts after $group using a $project stage.
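As a rough sketch of that conversion, the final $project stage of the pipeline could be swapped for something like the stage below. It assumes that _id coming out of $group is a Date, as it is in the example pipeline; the variable name toTimeParts is only illustrative.

```javascript
// Hypothetical replacement for the last $project stage: splits the Date in _id
// into the same timeParts shape the question used. Assumes _id is a Date.
var toTimeParts = {
    $project: {
        _id: 0,
        count: 1,
        time: "$_id",  // keep the raw Date so a later $sort on time still works
        timeParts: {
            year: { $year: "$_id" },
            month: { $month: "$_id" },
            day: { $dayOfMonth: "$_id" },
            hour: { $hour: "$_id" },
            minute: { $minute: "$_id" },
            second: { $second: "$_id" }
        }
    }
};
```

The date operators work in UTC, which is why the log in the question shows hour: 10 for times printed as 11:xx BST.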