Speed up MongoDB aggregation

Question

I have a sharded collection "my_collection" with the following structure:

{ 
   "CREATED_DATE" : ISODate(...),
   "MESSAGE" : "Test Message",
   "LOG_TYPE": "EVENT"
}

The MongoDB environment has 2 shards. The above collection is sharded using a hashed shard key on LOG_TYPE. The LOG_TYPE attribute has 7 other possible values.

I have 1 million documents in "my_collection" and I am trying to find the count of documents for each LOG_TYPE using the following query:

db.my_collection.aggregate([
    { "$group" :{ 
        "_id": "$LOG_TYPE",
        "COUNT": { "$sum":1 }
    }}
])

But this takes about 3 seconds to return a result. Is there any way to improve this? Also, when I run the explain command, it shows that no index has been used. Does the group command not use an index?

Recommended answer

There are currently some limitations in what the aggregation framework can do to improve the performance of your query, but you can help it in the following way:

db.my_collection.aggregate([
    { "$sort" : { "LOG_TYPE" : 1 } },
    { "$group" :{ 
        "_id": "$LOG_TYPE",
        "COUNT": { "$sum":1 }
    }}
])

By adding a sort on LOG_TYPE you will be "forcing" the optimizer to use an index on LOG_TYPE to get the documents in order. This will improve performance in several ways, which differ depending on the version being used.
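One way to picture why ordered input can help a grouping step is the plain JavaScript sketch below. It is an illustration only and not a description of how the server implements $group: the function name countSortedRuns and the sample data are made up for this example. When documents arrive ordered by the grouping key, only one running total has to be kept at a time, and a group is finished as soon as the key changes.

```javascript
// Illustration only: not MongoDB's actual $group implementation.
// Counts documents per key in a single pass, assuming the input is
// already sorted by that key (as an index scan on LOG_TYPE would deliver it).
function countSortedRuns(docs, field) {
  const results = [];
  let current = null; // the one group currently being accumulated

  for (const doc of docs) {
    const key = doc[field];
    if (current === null || current._id !== key) {
      // Key changed: the previous group is complete and can be emitted.
      if (current !== null) results.push(current);
      current = { _id: key, COUNT: 0 };
    }
    current.COUNT += 1;
  }
  if (current !== null) results.push(current);
  return results;
}

// Hypothetical sample data, already ordered by LOG_TYPE.
const sample = [
  { LOG_TYPE: "ERROR" },
  { LOG_TYPE: "EVENT" },
  { LOG_TYPE: "EVENT" },
  { LOG_TYPE: "WARN" },
];
console.log(countSortedRuns(sample, "LOG_TYPE"));
```

If the input were not sorted, the same key could reappear later, so every group would have to stay open until the last document had been read.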

On real data, if the documents arrive at the $group stage already sorted, the totals can be accumulated more efficiently. You can compare the query plans: with $sort, the shard key index is used. The actual performance improvement depends on the number of values in each "bucket". In general, LOG_TYPE having only seven distinct values makes it an extremely poor shard key, but it does mean that in all likelihood the following code will be a lot faster than even an optimized aggregation:

// Fetch the distinct LOG_TYPE values, then run one count per value.
db.my_collection.distinct("LOG_TYPE").forEach(function(lt) {
   print(lt + ": " + db.my_collection.count({"LOG_TYPE":lt}));
});
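If you would rather get the counts back as a single object than print them, the same distinct-then-count loop can be wrapped in a small helper. This is a minimal sketch: countByValue is a hypothetical name, and it assumes only that the collection object exposes distinct() and count() as used above. Newer shells also offer countDocuments(), which could be swapped in.

```javascript
// Hypothetical helper: wraps the distinct-then-count loop so the counts
// come back as one object instead of being printed.
function countByValue(collection, field) {
  const counts = {};
  collection.distinct(field).forEach(function (value) {
    // One count query per distinct value of `field`.
    counts[value] = collection.count({ [field]: value });
  });
  return counts;
}

// In the mongo shell (needs a live connection, so it is left commented out):
// printjson(countByValue(db.my_collection, "LOG_TYPE"));
```

With only a handful of distinct LOG_TYPE values this issues just a few count queries, which is the case the answer has in mind. With many distinct values the number of round trips grows accordingly.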
