MongoDB总时间序列 [英] MongoDB Aggregate Time Series

查看：77 发布时间：2020/5/11 1:36:44 mongodb mongodb-query aggregation-framework

本文介绍了MongoDB总时间序列的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我正在使用MongoDB来存储时间序列数据，其结构与此处解释的面向文档的设计"类似:

I'm using MongoDB to store time series data using a similar structure to "The Document-Oriented Design" explained here: http://blog.mongodb.org/post/65517193370/schema-design-for-time-series-data-in-mongodb

目标是查询整个系统中一天中最繁忙的10分钟.每个文档使用60个子文档(每分钟1个)存储1个小时的数据.每分钟存储嵌入在"vals"字段中的各种指标.我关心的指标是订单".示例文档如下所示:

The objective is to query for the top 10 busiest minutes of the day on the whole system. Each document stores 1 hour of data using 60 sub-documents (1 for each minute). Each minute stores various metrics embedded in the "vals" field. The metric I care about is "orders". A sample document looks like this:

{
        "_id" : ObjectId("54d023802b1815b6ef7162a4"),
        "user" : "testUser",
        "hour" : ISODate("2015-01-09T13:00:00Z"),
        "vals" : {
                "0" : {
                        "orders" : 11,
                        "anotherMetric": 15
                },
                "1" : {
                        "orders" : 12,
                        "anotherMetric": 20
                },
                .
                .
                .
        }
}

请注意系统中有很多用户.

Note there are many users in the system.

我已经通过对以下组对象进行汇总来使结构(某种程度上)平坦化:

I've managed to flatten the structure (somewhat) by doing an aggregate with the following group object:

group = {
    $group: {
        _id: {
            hour: "$hour"
        },
        0: {$sum: "$vals.0.orders"},
        1: {$sum: "$vals.1.orders"},
        2: {$sum: "$vals.2.orders"},
        .
        .
        .
    }
}

但这只给了我24个文档(每小时1个)，该小时内每分钟的订单数是这样的，

But that just gives me 24 documents (1 for each hour) with the # of orders for each minute during that hour, like so:

{
    "_id" : {
            "hour" : ISODate("2015-01-20T14:00:00Z")
    },
    "0" : 282086,
    "1" : 239358,
    "2" : 289188,
    .
    .
    .
}

现在，我需要以某种方式获取一天中的前10分钟，但我不确定如何做到.我怀疑可以用$ project完成，但是我不确定如何.

Now I need to somehow get the top 10 minutes of the day from this but I'm not sure how. I suspect it can be done with $project, but I'm not sure how.

推荐答案

您可以将其汇总为:

$match特定日期的文件.
在查询之前构造$group和$project对象.

>中的

$group，每小时每小时累积所有文档数组中的分钟.将分钟保留在文档中的某个位置.

$project变量文档为每个文档中所有文档的$setUnion 小时.

$unwind文档.

$sort通过orders

$limit顶部的10文档是我们所需要的.

$match the documents for the specific date.
Construct the $group and $project objects before querying.
$group by the $hour, accumulate all the documents per hour per minute in an array.Keep the minute somewhere within the document.
$project a variable docs as $setUnion of all the documents per hour.
$unwind the documents.
$sort by orders
$limit the top 10 documents which is what we require.

代码:

var inputDate = new ISODate("2015-01-09T13:00:00Z");
var group = {};
var set = [];
for(var i=0;i<=60;i++){
    group[i] = {$push:{"doc":"$vals."+i,
                       "hour":"$_id.hour",
                       "min":{$literal:i}}};
    set.push("$"+i);
}
group["_id"] = {$hour:"$hour"};
var project = {"docs":{$setUnion:set}}

db.t.aggregate([
{$match:{"hour":{$lte:inputDate,$gte:inputDate}}},
{$group:group},
{$project:project},
{$unwind:"$docs"},
{$sort:{"docs.doc.orders":-1}},
{$limit:2},
{$project:{"_id":0,
           "hour":"$_id",
           "doc":"$docs.doc",
           "min":"$docs.min"}}
])

这篇关于MongoDB总时间序列的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

MongoDB总时间序列 [英] MongoDB Aggregate Time Series

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

MongoDB总时间序列 [英] MongoDB Aggregate Time Series

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭