Aggregate and update MongoDB
Problem description

I have 2 collections:
- clients (6 million documents)
- orders (500 million documents)
Once a day, I would like to calculate, per client, the number of orders in the past year, past month and past week.
Here is what I tried:
db.orders.aggregate(
    { "$match": { "date_order": { "$gt": v_date1year } } },
    { "$group": {
        "_id": "$id_client",
        "count": { "$sum": 1 }
    } },
    { "$out": "tmp_indicators" }
)
db.tmp_indicators.find({}).forEach(function (my_client) {
    db.clients.update(
        { "id_client": my_client._id },
        { "$set": { "nb_orders_1year": my_client.count } }
    )
})
I have to do this 3 times: once for the past-year aggregation, once for the past month and once for the past week. The processing is very slow; do you have an idea of how to perform it in a better way?
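One way to avoid running the aggregation three times is to match on the widest window (the past year) and count the narrower windows with conditional sums inside a single `$group`, using `$cond`. A minimal sketch of that pipeline, assuming `date_order` is a BSON date (the variable names and the 30-day/7-day cutoffs are illustrative, not from the original post):

```javascript
// Single-pass pipeline: one $match on the widest window,
// conditional counts for the narrower windows via $cond.
var MS_PER_DAY = 24 * 3600 * 1000;
var now = new Date();
var oneYearAgo = new Date(now.getTime() - 365 * MS_PER_DAY);
var oneMonthAgo = new Date(now.getTime() - 30 * MS_PER_DAY);
var oneWeekAgo = new Date(now.getTime() - 7 * MS_PER_DAY);

var singlePassPipeline = [
    { "$match": { "date_order": { "$gt": oneYearAgo } } },
    { "$group": {
        "_id": "$id_client",
        "nb_orders_1year": { "$sum": 1 },
        "nb_orders_1month": {
            "$sum": { "$cond": [{ "$gt": ["$date_order", oneMonthAgo] }, 1, 0] }
        },
        "nb_orders_1week": {
            "$sum": { "$cond": [{ "$gt": ["$date_order", oneWeekAgo] }, 1, 0] }
        }
    } },
    { "$out": "tmp_indicators" }
];
// In the mongo shell: db.orders.aggregate(singlePassPipeline);
```

This reads the 500M-document orders collection once instead of three times; the follow-up update pass can then set all three fields per client in one go.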
Recommended answer
For improved performance, especially when dealing with large collections, take advantage of the Bulk() API for bulk updates: you send the operations to the server in batches (for example, a batch size of 1000), which gives much better performance since you are no longer sending every request to the server (as the update statement inside the forEach() loop currently does) but only once every 1000 requests, making your updates more efficient and quicker than they currently are.
The following examples demonstrate this approach. The first one uses the Bulk() API, available in MongoDB versions >= 2.6 and < 3.2. It updates all the documents in the clients collection by changing the nb_orders_1year field with values from the aggregation results.
Since the aggregate() method returns a cursor, you can use the output collection's forEach() method to iterate it and access each document, setting up the bulk update operations in batches which are then sent to the server efficiently with the API:
var bulk = db.clients.initializeUnorderedBulkOp(),
    pipeline = [
        { "$match": { "date_order": { "$gt": v_date1year } } },
        { "$group": {
            "_id": "$id_client",
            "count": { "$sum": 1 }
        } },
        { "$out": "tmp_indicators" }
    ],
    counter = 0;

db.orders.aggregate(pipeline);
db.tmp_indicators.find().forEach(function (doc) {
    // Match on id_client: clients are keyed by that field in the question,
    // and the aggregation's _id holds the id_client value.
    bulk.find({ "id_client": doc._id }).updateOne({
        "$set": { "nb_orders_1year": doc.count }
    });
    counter++;
    if (counter % 1000 == 0) {
        bulk.execute(); // Execute per 1000 operations
        bulk = db.clients.initializeUnorderedBulkOp(); // and re-initialize for the next batch
    }
});
// Clean up remaining operations in the queue
if (counter % 1000 != 0) { bulk.execute(); }
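Since the same batch-and-flush loop has to run for all three indicator fields (year, month, week), the batching logic can be factored into a small plain-JavaScript helper. A sketch (the helper name and its `id_client` filter are assumptions based on the question's schema, not part of the original answer):

```javascript
// Hypothetical helper: turn aggregation results into bulkWrite-sized batches
// of updateOne operations for a given indicator field.
function toUpdateBatches(docs, field, batchSize) {
    var batches = [];
    for (var i = 0; i < docs.length; i += batchSize) {
        var batch = docs.slice(i, i + batchSize).map(function (doc) {
            var setDoc = {};
            setDoc[field] = doc.count; // e.g. { nb_orders_1year: 42 }
            return {
                "updateOne": {
                    "filter": { "id_client": doc._id },
                    "update": { "$set": setDoc }
                }
            };
        });
        batches.push(batch);
    }
    return batches;
}
// In the mongo shell, each batch would then be sent with:
//   toUpdateBatches(db.tmp_indicators.find().toArray(), "nb_orders_1year", 1000)
//       .forEach(function (batch) { db.clients.bulkWrite(batch); });
```

The same helper then serves the past-month and past-week passes by changing only the field name.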
The next example applies to the new MongoDB version 3.2, which has since deprecated the Bulk() API and provides a newer set of APIs using bulkWrite().
It uses the same cursor as above, but instead of iterating the result it creates the array of bulk operations using the cursor's map() method:
var pipeline = [
    { "$match": { "date_order": { "$gt": v_date1year } } },
    { "$group": {
        "_id": "$id_client",
        "count": { "$sum": 1 }
    } },
    { "$out": "tmp_indicators" }
];

db.orders.aggregate(pipeline);

var bulkOps = db.tmp_indicators.find().map(function (doc) {
    return {
        "updateOne": {
            // Match on id_client, the key the question's clients collection uses.
            "filter": { "id_client": doc._id },
            "update": { "$set": { "nb_orders_1year": doc.count } }
        }
    };
});

db.clients.bulkWrite(bulkOps, { "ordered": true });
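On newer servers (MongoDB 4.2+, beyond the versions discussed in this answer), the `$merge` aggregation stage can write the results directly into the clients collection, removing both the tmp_indicators collection and the separate bulkWrite pass. A sketch, assuming clients has a unique index on `id_client` (required by `$merge`'s `on` option) and reusing the question's `v_date1year` cutoff:

```javascript
// As in the question: orders from the past year.
var v_date1year = new Date(Date.now() - 365 * 24 * 3600 * 1000);

var mergePipeline = [
    { "$match": { "date_order": { "$gt": v_date1year } } },
    { "$group": { "_id": "$id_client", "nb_orders_1year": { "$sum": 1 } } },
    // Drop the aggregation _id and expose the value under the join key.
    { "$project": { "_id": 0, "id_client": "$_id", "nb_orders_1year": 1 } },
    { "$merge": {
        "into": "clients",
        "on": "id_client",          // requires a unique index on clients.id_client
        "whenMatched": "merge",     // add/overwrite nb_orders_1year on the client doc
        "whenNotMatched": "discard" // ignore orders whose client no longer exists
    } }
];
// In the mongo shell (4.2+): db.orders.aggregate(mergePipeline);
```

The whole daily job then becomes a single server-side aggregation per time window, with no client-side iteration at all.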