How to normalize/reduce time data in MongoDB?

Question

I'm storing per-minute performance data in MongoDB. Each collection is a type of performance report, and each document is the measurement at that point in time for a port on the array:

{
  "DateTime" : ISODate("2012-09-28T15:51:03.671Z"),
  "array_serial" : "12345",
  "Port Name" : "CL1-A",
  "metric" : 104.2
}

There can be up to 128 different "Port Name" entries per "array_serial".

As the data ages I'd like to be able to average it out over increasing time spans:

  • Up to 1 week: 1 minute
  • 1 week to 1 month: 5 minutes
  • 1 to 3 months: 15 minutes

and so on. Here's how I'm rounding the timestamps so that the data can be reduced:

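var resolution = 5; // How many minutes to average over
// Note: map runs on the server, so `resolution` has to be passed in through the mapReduce `scope` option.
var map = function () {
    var coeff = 1000 * 60 * resolution; // bucket width in milliseconds
    // Math.round snaps to the nearest bucket boundary; Math.floor would round down instead.
    var roundTime = new Date(Math.round(this.DateTime.getTime() / coeff) * coeff);
    emit(roundTime, { value: this.metric, count: 1 });
};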

I'll be summing the values and counts in the reduce function, and getting the average in the finalize function.
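
The reduce and finalize steps described here could look roughly like the following. This is a minimal sketch, not the asker's actual code: it assumes values shaped like the `{ value, count }` objects emitted by the map function above, and the `avg` field name is an assumption made for illustration.

```javascript
// Sketch of the reduce and finalize steps (assumed shapes, see lead-in).
// Each value looks like what map emits: { value: <sum so far>, count: <n> }.
var reduce = function (key, values) {
    var result = { value: 0, count: 0 };
    values.forEach(function (v) {
        result.value += v.value;
        result.count += v.count;
    });
    // Same shape as the inputs, so MongoDB can re-run reduce on partial results.
    return result;
};

// Runs once per key after the last reduce; `avg` is a hypothetical field name.
var finalize = function (key, reduced) {
    reduced.avg = reduced.value / reduced.count;
    return reduced;
};
```

In the mongo shell these would be passed together with the map function, for example `db.metrics.mapReduce(map, reduce, { finalize: finalize, scope: { resolution: 5 }, out: "metrics_5min" })`, where the `metrics` collection name is taken from the answer below and `metrics_5min` is a hypothetical output collection.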

As you can see, this averages the data by time alone and ignores the "Port Name" value, but I need to average the values over time for each "Port Name" on each "array_serial".

So how can I include the port name in the above map function? Should the key for the emit be a compound "array_serial,PortName,DateTime" value that I split later? Or should I use the query function to query for each distinct serial, port and time? Am I storing this data in the database correctly?

Also, as far as I know, this data gets saved out to its own collection. What's the standard practice for replacing the data in the collection with this averaged data?

Is this what you mean, Asya? It's not grouping the documents rounded down to the lower 5 minutes (btw, I changed 'DateTime' to 'datetime'):

    $project: {
        "year" : { $year : "$datetime" },
        "month" : { $month : "$datetime" },
        "day" : { $dayOfMonth : "$datetime" },
        "hour" : { $hour : "$datetime" },
        "minute" : { $mod : [ {$minute : "$datetime"}, 5] },
        array_serial: 1,
        port_name: 1,
        port_number: 1,
        metric: 1
    }

From what I can tell the "$mod" operator will return the remainder of the minute divided by five, correct?
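
That reading can be checked with the same arithmetic in plain JavaScript. This is only an illustration: it assumes the `%` operator as a stand-in for `$mod`, which behaves the same way for the non-negative minute values involved here, and it uses the minute from the sample document's timestamp.

```javascript
// Plain-JavaScript stand-in for the $mod arithmetic on the minute field.
var minute = 51;                      // minute of 15:51:03 in the sample document
var remainder = minute % 5;           // what { $mod: [minute, 5] } yields
var bucketStart = minute - remainder; // minute rounded down to a 5-minute boundary
console.log(remainder, bucketStart);  // prints: 1 50
```

So projecting the remainder alone produces values 0 to 4, which cannot identify a 5-minute bucket; subtracting the remainder from the minute does.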

This would really help me if I could get the aggregation framework to do this operation rather than mapreduce.

Recommended answer

Here is how you could do it in the aggregation framework. I'm using a small simplification: I'm only grouping on year, month and day. In your case you will need to add hour and minute for the finer-grained calculations. You also have a choice about whether to do a weighted average if the point distribution is not uniform in the data sample you get.

project={"$project" : {
        "year" : {
            "$year" : "$DateTime"
        },
        "month" : {
            "$month" : "$DateTime"
        },
        "day" : {
            "$dayOfMonth" : "$DateTime"
        },
        "array_serial" : 1,
        "Port Name" : 1,
        "metric" : 1
    }
};
group={"$group" : {
        "_id" : {
            "a" : "$array_serial",
            "P" : "$Port Name",
            "y" : "$year",
            "m" : "$month",
            "d" : "$day"
        },
        "avgMetric" : {
            "$avg" : "$metric"
        }
    }
};

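db.metrics.aggregate([project, group]).result  // .result applies to MongoDB 2.2-2.4; from 2.6 the shell returns a cursor, so use .toArray() instead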

I ran this with some random sample data and got something of this format:

[
    {
        "_id" : {
            "a" : "12345",
            "P" : "CL1-B",
            "y" : 2012,
            "m" : 9,
            "d" : 6
        },
        "avgMetric" : 100.8
    },
    {
        "_id" : {
            "a" : "12345",
            "P" : "CL1-B",
            "y" : 2012,
            "m" : 9,
            "d" : 7
        },
        "avgMetric" : 98
    },
    {
        "_id" : {
            "a" : "12345",
            "P" : "CL1-A",
            "y" : 2012,
            "m" : 9,
            "d" : 6
        },
        "avgMetric" : 105
    }
]

As you can see this is one result per array_serial, port name, year/month/date combination. You can use $sort to get them into the order you want to process them from there.
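
A `$sort` stage for that could be sketched as follows. The ordering chosen here (array, then port, then date) is an assumption made for illustration; the field paths refer to the `_id` sub-fields produced by the `$group` stage above.

```javascript
// Sort by array serial, port name, then year/month/day, all ascending.
// Dotted paths address the sub-fields of the compound _id built by $group.
var sort = { "$sort": {
        "_id.a" : 1,
        "_id.P" : 1,
        "_id.y" : 1,
        "_id.m" : 1,
        "_id.d" : 1
    }
};

// In the mongo shell, with the project and group variables defined earlier:
// db.metrics.aggregate([project, group, sort])
```

The order of keys inside `$sort` matters: results are ordered by the first key, with ties broken by each following key.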

Here is how you would extend the project step to include hour and minute while rounding minutes to average over every five minutes:

{
    "$project" : {
        "year" : {
            "$year" : "$DateTime"
        },
        "month" : {
            "$month" : "$DateTime"
        },
        "day" : {
            "$dayOfMonth" : "$DateTime"
        },
        "hour" : {
            "$hour" : "$DateTime"
        },
        "fmin" : {
            "$subtract" : [
                {
                    "$minute" : "$DateTime"
                },
                {
                    "$mod" : [
                        {
                            "$minute" : "$DateTime"
                        },
                        5
                    ]
                }
            ]
        },
        "array_serial" : 1,
        "Port Name" : 1,
        "metric" : 1
    }
}
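
For the 5-minute buckets to take effect, the `$group` stage also has to include the new fields in its `_id`. A minimal sketch of what that might look like is below; the variable name `group5min` and the short key names `h` and `min` are assumptions chosen to match the style of the earlier `$group` stage.

```javascript
// $group stage matching the extended $project stage: one result per
// array, port, and 5-minute bucket. Key names h and min are hypothetical.
var group5min = { "$group" : {
        "_id" : {
            "a" : "$array_serial",
            "P" : "$Port Name",
            "y" : "$year",
            "m" : "$month",
            "d" : "$day",
            "h" : "$hour",
            "min" : "$fmin"   // minute rounded down to a multiple of 5
        },
        "avgMetric" : {
            "$avg" : "$metric"
        }
    }
};
```

Other resolutions, such as the 15-minute buckets mentioned in the question, would only need the divisor in the `$mod` expression of the `$project` stage changed; this stage could stay as it is.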

Hope you will be able to extend that to your specific data and requirements.
