加速 MongoDB 聚合 [英] Speed up MongoDB aggregation

查看:29
本文介绍了加速 MongoDB 聚合的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个分片集合my_collection",其结构如下:

I have a sharded collection "my_collection" with the following structure:

{ 
   "CREATED_DATE" : ISODate(...),
   "MESSAGE" : "Test Message",
   "LOG_TYPE": "EVENT"
}

mongoDB 环境由 2 个分片分片.上面的集合使用 LOG_TYPE 上的散列分片键进行分片.LOG_TYPE 属性还有 7 种其他可能性.

The mongoDB environment is sharded with 2 shards. The above collection is sharded using Hashed shard key on LOG_TYPE. There are 7 more other possibilities for LOG_TYPE attribute.

我在my_collection"中有 100 万个文档,我正在尝试使用以下查询根据 LOG_TYPE 查找文档数:

I have 1 million documents in "my_collection" and I am trying to find the count of documents based on the LOG_TYPE using the following query:

db.my_collection.aggregate([
    { "$group" :{ 
        "_id": "$LOG_TYPE",
        "COUNT": { "$sum":1 }
    }}
])

但这让我在大约 3 秒内得到结果.有什么办法可以改善这一点吗?此外,当我运行解释命令时,它显示没有使用任何索引.组命令不使用索引吗?

But this is getting me result in about 3 seconds. Is there any way to improve this? Also when I run the explain command, it shows that no Index has been used. Does the group command doesn't use an Index?

推荐答案

目前聚合框架在提高查询性能方面存在一些限制,但您可以通过以下方式帮助它:

There are currently some limitations in what aggregation framework can do to improve the performance of your query, but you can help it the following way:

db.my_collection.aggregate([
    { "$sort" : { "LOG_TYPE" : 1 } },
    { "$group" :{ 
        "_id": "$LOG_TYPE",
        "COUNT": { "$sum":1 }
    }}
])

通过在 LOG_TYPE 上添加排序,您将强制"优化器使用 LOG_TYPE 上的索引来按顺序获取文档.这将通过多种方式提高性能,但具体取决于所使用的版本.

By adding a sort on LOG_TYPE you will be "forcing" the optimizer to use an index on LOG_TYPE to get the documents in order. This will improve the performance in several ways, but differently depending on the version being used.

在真实数据上,如果对进入$group阶段的数据进行排序,将会提高总数的累积效率.您可以看到不同的查询计划,其中 $sort 将使用分片键索引.这对实际性能的改进将取决于每个桶"中值的数量 - 通常只有七个不同值的 LOG_TYPE 使它成为一个极差的分片键,但这确实意味着以下代码很可能是甚至比优化聚合快得多:

On real data if you have the data coming into the $group stage sorted, it will improve the efficiency of accumulation of the totals. You can see the different query plans where with $sort it will use the shard key index. The improvement this gives in actual performance will depend on the number of values in each "bucket" - in general LOG_TYPE having only seven distinct values makes it an extremely poor shard key, but it does mean that it all likelihood the following code will be a lot faster than even optimized aggregation:

db.my_collection.distinct("LOG_TYPE").forEach(function(lt) {
   print(db.my_collection.count({"LOG_TYPE":lt});
});

这篇关于加速 MongoDB 聚合的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆