MongoDB Aggregation: Compute Running Totals from Sum of Previous Rows

Question

Sample Documents:

{
 _id: ObjectId('4f442120eb03305789000000'),
 time: ISODate("2013-10-10T20:55:36Z"),
 value:1
},
{
 _id: ObjectId('4f442120eb03305789000001'),
 time: ISODate("2013-10-10T23:43:16Z"),
 value:2
},
{
 _id: ObjectId('4f442120eb03305789000002'),
 time: ISODate("2013-10-11T07:12:06Z"),
 value:3
},
{
 _id: ObjectId('4f442120eb03305789000003'),
 time: ISODate("2013-10-11T10:15:38Z"),
 value:4
},
{
 _id: ObjectId('4f442120eb03305789000004'),
 time: ISODate("2013-10-12T16:15:38Z"),
 value:5
}

It is easy to get aggregated results grouped by date. What I want, though, is a query that also returns a running total across those groups, like:

{
 time: "2013-10-10",
 total: 3,
 runningTotal: 3
},
{
 time: "2013-10-11",
 total: 7,
 runningTotal: 10
},
{
 time: "2013-10-12",
 total: 5,
 runningTotal: 15
}

Is this possible with the MongoDB aggregation framework?

Answer

This does what you need. I have normalised the times in the data so that documents from the same day group together; the first stage groups on the time field as-is, so all values from one day must be identical. The idea is to $group and $push the times and totals into separate arrays, then $unwind the time array, which gives each resulting document its own time plus a copy of the whole totals array. You can then calculate the runningTotal (or something like a rolling average) from that array, since it contains the data for all the times. The 'index' generated by $unwind is the array index of the total corresponding to that time. It is important to $sort before building the arrays and unwinding them, since this ensures the arrays are in the correct order. Note that includeArrayIndex, $arrayElemAt and $slice require MongoDB 3.2 or later.

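db.temp.aggregate(
    [
        // 1. Sum the values for each (normalised) time.
        {
            '$group': {
                '_id': '$time',
                'total': { '$sum': '$value' }
            }
        },
        // 2. Sort by time so the arrays built next are in chronological order.
        {
            '$sort': {
                '_id': 1
            }
        },
        // 3. Collapse everything into one document holding two parallel arrays.
        {
            '$group': {
                '_id': 0,
                'time': { '$push': '$_id' },
                'totals': { '$push': '$total' }
            }
        },
        // 4. Emit one document per time; each keeps the full totals array and its own index.
        {
            '$unwind': {
                'path': '$time',
                'includeArrayIndex': 'index'
            }
        },
        // 5. total = totals[index]; runningTotal = sum of totals[0..index].
        {
            '$project': {
                '_id': 0,
                'time': { '$dateToString': { 'format': '%Y-%m-%d', 'date': '$time' } },
                'total': { '$arrayElemAt': [ '$totals', '$index' ] },
                'runningTotal': { '$sum': { '$slice': [ '$totals', { '$add': [ '$index', 1 ] } ] } }
            }
        }
    ]
);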
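
To make the stages easier to follow, here is a rough plain-JavaScript walk-through of the same logic, runnable without a database. It is only a sketch: the function name runningTotals and the docs array are made up for illustration, and the time values are assumed to be already normalised to day strings.

```javascript
// Illustrative only: mimics the pipeline stages on an in-memory array.
// `docs` and `runningTotals` are hypothetical names; times are assumed
// to be normalised to "YYYY-MM-DD" strings already.
const docs = [
  { time: "2013-10-10", value: 1 },
  { time: "2013-10-10", value: 2 },
  { time: "2013-10-11", value: 3 },
  { time: "2013-10-11", value: 4 },
  { time: "2013-10-12", value: 5 },
];

function runningTotals(documents) {
  // Stage 1 ($group): sum the values per time.
  const totalsByTime = new Map();
  for (const { time, value } of documents) {
    totalsByTime.set(time, (totalsByTime.get(time) || 0) + value);
  }

  // Stage 2 ($sort): order the groups by their key, ascending.
  const sorted = [...totalsByTime.entries()].sort(([a], [b]) =>
    a < b ? -1 : a > b ? 1 : 0
  );

  // Stage 3 (second $group): push keys and totals into two parallel arrays.
  const time = sorted.map(([t]) => t);
  const totals = sorted.map(([, total]) => total);

  // Stages 4 and 5 ($unwind with index, then $project):
  // every output row sees the whole totals array plus its own index.
  return time.map((t, index) => ({
    time: t,
    total: totals[index], // like $arrayElemAt
    runningTotal: totals
      .slice(0, index + 1) // like $slice with $add: [index, 1]
      .reduce((sum, x) => sum + x, 0), // like $sum
  }));
}

console.log(runningTotals(docs));
```

Each output row re-sums a prefix of the totals array, so the work grows quadratically with the number of groups. That is fine for a few dozen groups, which is the situation described here.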

I have used something similar on a collection of about 80,000 documents that aggregated down to 63 results. I am not sure how well it will work on larger collections, but I have found that transformations (projections, array manipulations) on aggregated data do not seem to carry a large performance cost once the data has been reduced to a manageable size.
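
If the number of groups does get large, one option worth trying is the $setWindowFields stage, assuming you are on MongoDB 5.0 or later (the pipeline above does not need it). It avoids pushing every total into a single document. The following is a minimal sketch rather than a tested drop-in: the variable name runningTotalPipeline is made up, and grouping by day with $dateToString is an assumption about how you want to normalise the times.

```javascript
// Sketch only; assumes MongoDB 5.0+ for $setWindowFields.
// `runningTotalPipeline` is a hypothetical name.
const runningTotalPipeline = [
  // Group by calendar day (UTC), so the raw times need no prior normalisation.
  {
    '$group': {
      '_id': { '$dateToString': { 'format': '%Y-%m-%d', 'date': '$time' } },
      'total': { '$sum': '$value' }
    }
  },
  // Running sum over all groups from the first one up to the current one.
  {
    '$setWindowFields': {
      'sortBy': { '_id': 1 },
      'output': {
        'runningTotal': {
          '$sum': '$total',
          'window': { 'documents': [ 'unbounded', 'current' ] }
        }
      }
    }
  },
  // Sort explicitly rather than relying on the order the window stage returns.
  { '$sort': { '_id': 1 } },
  // Rename the group key to `time` to match the desired output shape.
  { '$project': { '_id': 0, 'time': '$_id', 'total': 1, 'runningTotal': 1 } }
];

// In the mongo shell this would be run as:
// db.temp.aggregate(runningTotalPipeline)
```

Because the day strings are in year-month-day form, sorting them as strings also sorts them chronologically.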
