Speed up MongoDB aggregation
Question

I have a sharded collection "my_collection" with the following structure:
{
"CREATED_DATE" : ISODate(...),
"MESSAGE" : "Test Message",
"LOG_TYPE": "EVENT"
}
The MongoDB environment is sharded with 2 shards. The collection above is sharded using a hashed shard key on LOG_TYPE. There are 7 other possible values for the LOG_TYPE attribute.
I have 1 million documents in "my_collection" and I am trying to find the count of documents per LOG_TYPE using the following query:
db.my_collection.aggregate([
{ "$group" :{
"_id": "$LOG_TYPE",
"COUNT": { "$sum":1 }
}}
])
But this is taking about 3 seconds to return a result. Is there any way to improve this? Also, when I run the explain command, it shows that no index has been used. Doesn't the group command use an index?
Answer
There are currently some limitations in what the aggregation framework can do to improve the performance of your query, but you can help it in the following way:
db.my_collection.aggregate([
{ "$sort" : { "LOG_TYPE" : 1 } },
{ "$group" :{
"_id": "$LOG_TYPE",
"COUNT": { "$sum":1 }
}}
])
By adding a sort on LOG_TYPE you will be "forcing" the optimizer to use an index on LOG_TYPE to get the documents in order. This will improve performance in several ways, but differently depending on the version being used.
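To check whether the $sort stage is actually backed by an index, you can ask the server for the query plan instead of the results. A minimal sketch in the mongo shell, using the `{ explain: true }` aggregation option - in the winning plan, an index-backed sort shows an IXSCAN stage, while an in-memory sort over a full scan shows COLLSCAN followed by SORT:

```javascript
// Request the query plan for the sorted aggregation rather than
// executing it; inspect the output for IXSCAN vs. COLLSCAN.
db.my_collection.aggregate(
    [
        { "$sort": { "LOG_TYPE": 1 } },
        { "$group": { "_id": "$LOG_TYPE", "COUNT": { "$sum": 1 } } }
    ],
    { explain: true }
)
```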
On real data, if the documents arrive at the $group stage already sorted, accumulating the totals becomes more efficient. You can see the different query plans, where with $sort the shard key index is used. How much this improves actual performance depends on the number of values in each "bucket" - in general, LOG_TYPE having only seven distinct values makes it an extremely poor shard key, but it does mean that in all likelihood the following code will be a lot faster than even the optimized aggregation:
db.my_collection.distinct("LOG_TYPE").forEach(function(lt) {
    print(db.my_collection.count({ "LOG_TYPE": lt }));
});
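One caveat worth checking in this setup: sharding on a hashed key creates a hashed index, and hashed indexes support equality matches but cannot return documents in LOG_TYPE order, so the $sort trick above only helps if a plain ascending index on the field also exists. A minimal sketch in the mongo shell:

```javascript
// Add an ordinary ascending index alongside the hashed shard key
// index so the $sort stage has an ordered index to walk; the
// per-type count() queries above can use it as well.
db.my_collection.createIndex({ "LOG_TYPE": 1 })
```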