MongoDB Aggregation: Compute Running Totals from sum of previous rows
Problem description
Sample Documents:
{
    _id: ObjectId('4f442120eb03305789000000'),
    time: ISODate("2013-10-10T20:55:36Z"),
    value: 1
},
{
    _id: ObjectId('4f442120eb03305789000001'),
    time: ISODate("2013-10-10T18:43:16Z"),
    value: 2
},
{
    _id: ObjectId('4f442120eb03305789000002'),
    time: ISODate("2013-10-11T17:12:06Z"),
    value: 3
},
{
    _id: ObjectId('4f442120eb03305789000003'),
    time: ISODate("2013-10-11T10:15:38Z"),
    value: 4
},
{
    _id: ObjectId('4f442120eb03305789000004'),
    time: ISODate("2013-10-12T16:15:38Z"),
    value: 5
}
It's easy to get aggregated results that are grouped by date. But what I want is a query that also returns a running total of the aggregation, like:
{
    time: "2013-10-10",
    total: 3,
    runningTotal: 3
},
{
    time: "2013-10-11",
    total: 7,
    runningTotal: 10
},
{
    time: "2013-10-12",
    total: 5,
    runningTotal: 15
}
Is this possible with the MongoDB aggregation framework?
This does what you need. I have normalised the times in the data so that they group together (with raw timestamps you would need to reduce each one to its date first). The idea is to $group and push the times and totals into separate arrays. Then $unwind the time array, which makes a copy of the totals array for each time document. You can then calculate the runningTotal (or something like a rolling average) from the array containing all the data for the different times. The 'index' generated by $unwind is the array index of the total corresponding to that time. It is important to $sort before $unwinding, since this ensures the arrays are in the correct order.
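db.temp.aggregate(
    [
        // Sum the values for each (already normalised) time.
        {
            '$group': {
                '_id': '$time',
                'total': { '$sum': '$value' }
            }
        },
        // Sort by time so the arrays built next are in order.
        {
            '$sort': {
                '_id': 1
            }
        },
        // Collapse everything into one document holding two parallel arrays.
        {
            '$group': {
                '_id': 0,
                'time': { '$push': '$_id' },
                'totals': { '$push': '$total' }
            }
        },
        // One document per time, each carrying the full totals array and its index.
        {
            '$unwind': {
                'path': '$time',
                'includeArrayIndex': 'index'
            }
        },
        // The running total is the sum of the first (index + 1) totals.
        {
            '$project': {
                '_id': 0,
                'time': { '$dateToString': { 'format': '%Y-%m-%d', 'date': '$time' } },
                'total': { '$arrayElemAt': [ '$totals', '$index' ] },
                'runningTotal': { '$sum': { '$slice': [ '$totals', { '$add': [ '$index', 1 ] } ] } }
            }
        }
    ]
);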
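To make the stages easier to follow, here is a minimal plain-JavaScript sketch that mirrors what the pipeline does, without needing a database. It is only an illustration: the function name runningTotals and the in-memory docs array are hypothetical, and the times are assumed to be already normalised to day strings.

```javascript
// Hypothetical in-memory stand-in for the collection, times already normalised.
const docs = [
  { time: '2013-10-10', value: 1 },
  { time: '2013-10-10', value: 2 },
  { time: '2013-10-11', value: 3 },
  { time: '2013-10-11', value: 4 },
  { time: '2013-10-12', value: 5 },
];

function runningTotals(documents) {
  // First $group: sum the values for each distinct time.
  const perDay = new Map();
  for (const { time, value } of documents) {
    perDay.set(time, (perDay.get(time) || 0) + value);
  }

  // $sort, then second $group: two parallel arrays in time order.
  const times = [...perDay.keys()].sort();
  const totals = times.map((t) => perDay.get(t));

  // $unwind with includeArrayIndex, then $project.
  return times.map((time, index) => ({
    time,
    // Equivalent of $arrayElemAt: the total at this index.
    total: totals[index],
    // Equivalent of $sum over $slice: add up the first (index + 1) totals.
    runningTotal: totals.slice(0, index + 1).reduce((a, b) => a + b, 0),
  }));
}

const result = runningTotals(docs);
console.log(result);
```

Note that every output row re-sums a prefix of the totals array, so the work grows quadratically with the number of groups. That is harmless for a few dozen groups but worth keeping in mind for many more.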
I have used something similar on a collection with about 80,000 documents, aggregating to 63 results. I am not sure how well it will work on larger collections, but I have found that performing transformations (projections, array manipulations) on aggregated data does not seem to have a large performance cost once the data has been reduced to a manageable size.
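If the number of groups does become large, one possible alternative (not the approach used above) is the $setWindowFields stage, which assumes MongoDB 5.0 or later. The sketch below also groups raw timestamps by day with $dateToString, so the times do not need to be normalised beforehand. The variable name pipeline is only for illustration, and the collection name temp is taken from the answer above.

```javascript
// Sketch of a window-function pipeline; assumes MongoDB 5.0+ for $setWindowFields.
const pipeline = [
  // Group raw timestamps by calendar day (UTC) and sum the values.
  {
    $group: {
      _id: { $dateToString: { format: '%Y-%m-%d', date: '$time' } },
      total: { $sum: '$value' },
    },
  },
  // Running total: sum of total from the first document up to the current one.
  {
    $setWindowFields: {
      sortBy: { _id: 1 },
      output: {
        runningTotal: {
          $sum: '$total',
          window: { documents: ['unbounded', 'current'] },
        },
      },
    },
  },
  // Rename _id to time and keep the two totals.
  { $project: { _id: 0, time: '$_id', total: 1, runningTotal: 1 } },
  // Explicit final sort so the output order does not depend on the window stage.
  { $sort: { time: 1 } },
];

// In mongosh this would be run as: db.temp.aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```

This avoids collapsing everything into a single document with large arrays, which may matter when there are many groups. I have not benchmarked it against the array-based version.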