Speed up MongoDB aggregation
Question

I have a sharded collection "my_collection" with the following structure:
{
"CREATED_DATE" : ISODate(...),
"MESSAGE" : "Test Message",
"LOG_TYPE": "EVENT"
}
The MongoDB environment is sharded with 2 shards. The collection above is sharded using a hashed shard key on LOG_TYPE. There are 7 other possible values for the LOG_TYPE attribute.
I have 1 million documents in "my_collection" and I am trying to find the count of documents per LOG_TYPE using the following query:
db.my_collection.aggregate([
{ "$group" :{
"_id": "$LOG_TYPE",
"COUNT": { "$sum":1 }
}}
])
But this is taking about 3 seconds to return a result. Is there any way to improve this? Also, when I run the explain command, it shows that no index has been used. Doesn't the group command use an index?
Answer
There are currently some limitations in what the aggregation framework can do to improve the performance of your query, but you can help it in the following way:
db.my_collection.aggregate([
{ "$sort" : { "LOG_TYPE" : 1 } },
{ "$group" :{
"_id": "$LOG_TYPE",
"COUNT": { "$sum":1 }
}}
])
By adding a sort on LOG_TYPE you will be "forcing" the optimizer to use an index on LOG_TYPE to get the documents in order. This will improve performance in several ways, but differently depending on the version being used.
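To check whether the $sort stage is actually backed by an index, you can ask the server for the query plan instead of the results. A minimal sketch in the mongo shell, using the `{ explain: true }` aggregation option - in the winning plan, an index-backed sort shows an IXSCAN stage, while an in-memory sort over a full scan shows COLLSCAN followed by SORT:

```javascript
// Request the query plan for the sorted aggregation rather than
// executing it; inspect the output for IXSCAN vs. COLLSCAN.
db.my_collection.aggregate(
    [
        { "$sort": { "LOG_TYPE": 1 } },
        { "$group": { "_id": "$LOG_TYPE", "COUNT": { "$sum": 1 } } }
    ],
    { explain: true }
)
```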
On real data, if the documents arrive at the $group stage already sorted, accumulating the totals becomes more efficient. You can see the different query plans, where with $sort the shard key index is used. How much this improves actual performance depends on the number of values in each "bucket" - in general, LOG_TYPE having only seven distinct values makes it an extremely poor shard key, but it does mean that in all likelihood the following code will be a lot faster than even the optimized aggregation:
db.my_collection.distinct("LOG_TYPE").forEach(function(lt) {
    print(db.my_collection.count({ "LOG_TYPE": lt }));
});
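One caveat worth checking in this setup: sharding on a hashed key creates a hashed index, and hashed indexes support equality matches but cannot return documents in LOG_TYPE order, so the $sort trick above only helps if a plain ascending index on the field also exists. A minimal sketch in the mongo shell:

```javascript
// Add an ordinary ascending index alongside the hashed shard key
// index so the $sort stage has an ordered index to walk; the
// per-type count() queries above can use it as well.
db.my_collection.createIndex({ "LOG_TYPE": 1 })
```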