Speed up MongoDB aggregation
Question
I have a sharded collection "my_collection" with the following structure:
{
  "CREATED_DATE" : ISODate(...),
  "MESSAGE" : "Test Message",
  "LOG_TYPE" : "EVENT"
}
The MongoDB deployment has 2 shards. The collection above is sharded on a hashed shard key on LOG_TYPE. There are 7 other possible values for the LOG_TYPE attribute.
I have 1 million documents in "my_collection" and I am trying to count the documents for each LOG_TYPE with the following query:
db.my_collection.aggregate([
  { "$group": {
      "_id": "$LOG_TYPE",
      "COUNT": { "$sum": 1 }
  }}
])
However, it takes about 3 seconds to return the result. Is there any way to improve this? Also, when I run explain, it shows that no index is used. Doesn't the $group stage use an index?
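For reference, here is a rough in-memory model, in plain JavaScript, of what the pipeline above computes. Because the pipeline has no $match or $sort stage, nothing restricts or orders its input, so every document is read and its LOG_TYPE is tallied in a hash table; this is consistent with explain reporting no index use. The function name and the sample documents (including the "ERROR" value) are hypothetical, and this illustrates the semantics only, not how the server is implemented.

```javascript
// Rough in-memory model of the question's pipeline:
//   { $group: { _id: "$LOG_TYPE", COUNT: { $sum: 1 } } }
// Every document is visited once, and one counter per distinct LOG_TYPE
// is kept in a hash map. The sample documents are made up for illustration.
function groupCountByLogType(docs) {
  const counts = new Map(); // LOG_TYPE -> running COUNT
  for (const doc of docs) {
    counts.set(doc.LOG_TYPE, (counts.get(doc.LOG_TYPE) || 0) + 1);
  }
  // Shape the output like the $group result documents.
  return Array.from(counts, ([id, count]) => ({ _id: id, COUNT: count }));
}

const sampleDocs = [
  { MESSAGE: "a", LOG_TYPE: "EVENT" },
  { MESSAGE: "b", LOG_TYPE: "ERROR" }, // hypothetical LOG_TYPE value
  { MESSAGE: "c", LOG_TYPE: "EVENT" },
];
console.log(groupCountByLogType(sampleDocs));
```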
Recommended answer
There are currently some limits to what the aggregation framework can do to speed up this query, but you can help it like this:
db.my_collection.aggregate([
  { "$sort": { "LOG_TYPE": 1 } },
  { "$group": {
      "_id": "$LOG_TYPE",
      "COUNT": { "$sum": 1 }
  }}
])
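By adding a sort on LOG_TYPE, you "force" the optimizer to use an index on LOG_TYPE to read the documents in order. This needs an ordinary (non-hashed) index on LOG_TYPE: a hashed index, such as the one backing a hashed shard key, stores hash values and cannot return documents in LOG_TYPE order. Sorting improves performance in several ways, though how much depends on the MongoDB version being used.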
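A minimal sketch of why sorted input can help $group, assuming the documents arrive ordered by LOG_TYPE (for example, read from an ascending index on that field): equal keys are adjacent, so only one running count has to be held at a time, and each group can be emitted as soon as the key changes. Whether a particular MongoDB version's $group actually exploits sorted input this way is version-dependent; the function name and the LOG_TYPE values other than "EVENT" are hypothetical.

```javascript
// Model of $group consuming input that is already sorted by LOG_TYPE.
// Because equal keys are adjacent, only the current key and its running
// count are kept; a finished group is emitted when the key changes.
function groupCountSorted(sortedLogTypes) {
  const results = [];
  let currentKey;
  let currentCount = 0;
  for (const key of sortedLogTypes) {
    if (currentCount > 0 && key !== currentKey) {
      results.push({ _id: currentKey, COUNT: currentCount }); // group done
      currentCount = 0;
    }
    currentKey = key;
    currentCount += 1;
  }
  if (currentCount > 0) {
    results.push({ _id: currentKey, COUNT: currentCount }); // last group
  }
  return results;
}

// Hypothetical LOG_TYPE values, pre-sorted as a $sort stage would deliver them.
const sortedInput = ["ERROR", "ERROR", "EVENT", "EVENT", "EVENT", "WARN"];
console.log(groupCountSorted(sortedInput));
```

The memory held during accumulation is constant regardless of how many distinct values there are, which is the advantage over the hash-table approach when input order is guaranteed.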
On real data, if the documents reach the $group stage already sorted, the totals can be accumulated more efficiently. You can compare the query plans: with the $sort, the plan uses an index on LOG_TYPE. How much this improves actual performance depends on the number of values in each "bucket". In general, having only seven distinct values makes LOG_TYPE an extremely poor shard key (documents sharing a shard key value cannot be split across chunks), but it also means that, in all likelihood, the following code will be much faster than even the optimized aggregation:
// Run one count per distinct LOG_TYPE value instead of grouping every document
db.my_collection.distinct("LOG_TYPE").forEach(function(lt) {
  print(lt + ": " + db.my_collection.count({ "LOG_TYPE": lt }));
});
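A simplified model of why per-value counts can be cheap, assuming an ordinary ascending index on LOG_TYPE: here the index is represented by a sorted array of keys (a hypothetical stand-in). For each distinct value, a binary search finds where its run of equal keys ends, and the count is the width of that run, so no full document is examined. This is an illustration only: a real index count scan still walks the entries inside the range, and `end - start` stands in for that walk. The function names and the "ERROR"/"WARN" values are hypothetical.

```javascript
// First position in sortedKeys whose key is strictly greater than value.
function upperBound(sortedKeys, value) {
  let lo = 0;
  let hi = sortedKeys.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (sortedKeys[mid] <= value) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Walks the sorted keys one distinct value at a time (like distinct()),
// counting each value's range (like count({ LOG_TYPE: value })).
function countPerValue(sortedKeys) {
  const counts = {};
  let start = 0;
  while (start < sortedKeys.length) {
    const value = sortedKeys[start];           // next distinct value
    const end = upperBound(sortedKeys, value); // end of this value's run
    counts[value] = end - start;               // number of index entries for value
    start = end;                               // jump to the next distinct value
  }
  return counts;
}

// Hypothetical index contents for an ascending index on LOG_TYPE.
const indexKeys = ["ERROR", "EVENT", "EVENT", "EVENT", "WARN", "WARN"];
console.log(countPerValue(indexKeys));
```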