MongoDB aggregation performance capability
Question
I am trying to work through some performance considerations around using MongoDB to hold a very large number of documents that will feed a variety of aggregations.
I have read that a collection has a capacity of 32TB, depending on the sizes of the chunks and shard key values.
If I have 65,000 customers who each supply us with (on average) 350 sales transactions per day, that ends up being about 22,750,000 documents created daily. By a sales transaction, I mean an object like an invoice, with a header and line items. Each document averages 2.60KB.
I also receive some other data for these same customers, such as account balances and products from a catalogue. I estimate about 1,000 product records are active at any one time.
Based on the above, I estimate 8,392,475,000 (8.4 billion) documents in a single year, with a total of 20,145,450,000KB (18.76TB) of data stored in a collection.
Based on a MongoDB collection capacity of 32TB (34,359,738,368KB), I believe it would be at 58.63% of capacity.
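The arithmetic behind these estimates can be checked with a short sketch. It uses only the figures quoted in the question and assumes binary units (1TB = 1024^3 KB), which is what the 34,359,738,368KB figure implies. The yearly transaction count it prints is lower than the 8.4 billion total quoted above, presumably because that total also counts the balance and product records; that reading is an assumption, not something the question states.

```python
# Figures quoted in the question.
CUSTOMERS = 65_000
TRANSACTIONS_PER_CUSTOMER_PER_DAY = 350
TOTAL_KB_PER_YEAR = 20_145_450_000   # yearly data volume as stated in the question
ASSUMED_LIMIT_TB = 32                # the limit the question assumes

# Assumption: binary units, so 1 TB = 1024^3 KB.
KB_PER_TB = 1024 ** 3

daily_docs = CUSTOMERS * TRANSACTIONS_PER_CUSTOMER_PER_DAY
yearly_transactions = daily_docs * 365
total_tb = TOTAL_KB_PER_YEAR / KB_PER_TB
percent_of_limit = 100 * total_tb / ASSUMED_LIMIT_TB

print(f"transaction documents per day:  {daily_docs:,}")
print(f"transaction documents per year: {yearly_transactions:,}")
print(f"data per year:                  {total_tb:.2f} TB")
print(f"share of the assumed limit:     {percent_of_limit:.2f}%")
```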
I want to understand how this will perform for the different aggregation queries running on it. I want to create a set of staged pipeline aggregations that write to different collections, which are then used as source data for business insights analysis.
Across 8.4 billion transactional documents, I aim to create this aggregated data in a different collection through a set of individual services which output using $out, to avoid any issues with the 16MB document size limit for a single result set.
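As a rough illustration of the kind of staged pipeline described here, the sketch below builds one aggregation that totals a day's transactions per customer and writes the result to a separate collection with `$out`. The collection name and the field names (`createdAt`, `customerId`, `header.total`) are hypothetical; the question does not give a schema. The function only builds the pipeline, so it runs without a database.

```python
from datetime import datetime, timezone


def build_daily_totals_pipeline(day_start, day_end, target_collection):
    """Build a pipeline that totals one day's sales per customer.

    Field names are hypothetical placeholders for the real schema.
    """
    return [
        # Restrict the scan to one day; an index on createdAt helps here.
        {"$match": {"createdAt": {"$gte": day_start, "$lt": day_end}}},
        # One output document per customer.
        {"$group": {
            "_id": "$customerId",
            "transactionCount": {"$sum": 1},
            "totalAmount": {"$sum": "$header.total"},
        }},
        # $out must be the last stage; it replaces the target collection.
        {"$out": target_collection},
    ]


pipeline = build_daily_totals_pipeline(
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 1, 2, tzinfo=timezone.utc),
    "daily_customer_totals",
)

# With pymongo and a live server, this could be run as:
#   db.transactions.aggregate(pipeline, allowDiskUse=True)
```

Because `$out` replaces the whole target collection on every run, a pipeline that should add to or update earlier results may be better served by `$merge` (available from MongoDB 4.2), which can also write to a sharded collection.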
Am I being overly ambitious here in expecting MongoDB to be able to:
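- Store that much data in a collection
- Aggregate and output the results of refreshed data into a separate collection to drive business insights, for consumption by services that each provide a different aspect of a customer's business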
Any feedback is welcome. I want to understand where the limits of MongoDB lie, compared with other technologies, for storing and using data at this volume.
Thanks in advance.
Recommended answer
There is no limit on how big a collection in MongoDB can be (in a replica set or a sharded cluster). I think you are confusing this with the maximum size an existing collection can reach before it can no longer be sharded.
For the amount of data you are planning to hold, it would make sense to go with a sharded cluster from the beginning.
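To make the sharding suggestion concrete, here is a minimal sketch of the two admin commands that enable sharding on a database and shard a collection. The database name, collection name, and the hashed `customerId` shard key are hypothetical; the answer does not recommend a specific key, and the right choice depends on the query patterns. The function only builds the command documents, so it runs without a cluster.

```python
def build_sharding_commands(database, collection, shard_key_field):
    """Build the admin commands that shard one collection.

    A hashed key is used here only as an example; it spreads writes
    evenly but is an assumption, not the answer's recommendation.
    """
    namespace = f"{database}.{collection}"
    return [
        {"enableSharding": database},
        {"shardCollection": namespace, "key": {shard_key_field: "hashed"}},
    ]


commands = build_sharding_commands("sales", "transactions", "customerId")

# With pymongo, connected to a mongos router, each could be sent as:
#   for command in commands:
#       client.admin.command(command)
```

Sharding the collection while it is still empty or small sidesteps the size restriction on sharding an existing collection that the answer mentions.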