MongoDB aggregation performance capability


Problem description

I am trying to work through some performance considerations around using MongoDB for a large number of documents that will feed a variety of aggregations.

I have read that a collection has a 32 TB capacity, depending on the sizes of the chunks and shard key values.

If I have 65,000 customers who each supply us with (on average) 350 sales transactions per day, that ends up being about 22,750,000 documents created daily. When I say a sales transaction, I mean an object which is like an invoice with a header and line items. Each document I have averages 2.60 kb.

I also receive some other data from these same customers, such as account balances and products from a catalogue. I estimate about 1,000 product records active at any one time.

Based upon the above, I approximate 8,392,475,000 (8.4 billion) documents in a single year, with a total of 20,145,450,000 kb (18.76 TB) of data stored in a collection.
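As a sanity check, the arithmetic above can be reproduced with a short sketch (assuming a flat 365-day year and the stated averages; it lands slightly below the question's ~8.4 billion figure, which may include headroom for the other data mentioned):

```python
# Back-of-the-envelope volume estimate from the figures in the question.
CUSTOMERS = 65_000
TXNS_PER_CUSTOMER_PER_DAY = 350
AVG_DOC_KB = 2.60

docs_per_day = CUSTOMERS * TXNS_PER_CUSTOMER_PER_DAY
docs_per_year = docs_per_day * 365
storage_kb = docs_per_year * AVG_DOC_KB
storage_tib = storage_kb / 1024 ** 3  # kb -> TiB

print(f"{docs_per_day:,} docs/day")    # 22,750,000
print(f"{docs_per_year:,} docs/year")  # 8,303,750,000 (~8.3 billion)
print(f"{storage_tib:.1f} TiB/year")   # ~20.1
```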

Based upon the 32 TB (34,359,738,368 kb) capacity of a MongoDB collection, I believe it would be at 58.63% of capacity.

I want to understand how this will perform for different aggregation queries running on it. I want to create a set of staged pipeline aggregations that write to different collections, which are then used as source data for business-insights analysis.

Across 8.4 billion transactional documents, I aim to create this aggregated data in a different collection via a set of individual services that output using $out, to avoid any issues with the 16 MB document size limit for a single result set.
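A staged pipeline of the kind described might look like the following sketch. The field names (`customerId`, `date`, `total`) and the target collection name are assumptions, since the question only says the documents are invoice-like; the $group/$out structure is the part that reflects the approach:

```python
# Sketch of a $out-based aggregation stage: roll up one day's sales per
# customer and materialize the result into a separate summary collection.
daily_sales_rollup = [
    {"$match": {"date": {"$gte": "2020-01-01", "$lt": "2020-01-02"}}},
    {"$group": {
        "_id": {"customerId": "$customerId", "date": "$date"},
        "invoiceCount": {"$sum": 1},
        "totalSales": {"$sum": "$total"},
    }},
    # $out writes the full result set to its own collection, so the
    # 16 MB limit on a single returned document is not a concern.
    {"$out": "daily_sales_summary"},
]

# With pymongo this would run as:
#   db.transactions.aggregate(daily_sales_rollup, allowDiskUse=True)
```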

Am I being overly ambitious here, expecting MongoDB to be able to:

  1. Store that volume of data in a collection
  2. Aggregate and output results from refreshed data into separate collections to drive business insights, for consumption by services that serve up different aspects of the customers' businesses

Any feedback welcome. I want to understand where the limits of using MongoDB lie, as opposed to other technologies, for this quantity of data storage and use.

Thanks in advance

Recommended answer

There is no limit on how big a collection in MongoDB can be (in a replica set or a sharded cluster). I think you are confusing this with the maximum collection size beyond which it can no longer be sharded.

MongoDB docs: Sharding Operational Restrictions

For the amount of data you are planning to have, it would make sense to go with a sharded cluster from the beginning.
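Concretely, "going with a sharded cluster from the beginning" means enabling sharding on the database and choosing a shard key up front. A sketch of the admin commands involved, expressed as command documents (the database name, collection name, and key field are assumptions for illustration):

```python
# Hypothetical command documents for sharding the transactions collection.
# A hashed shard key on customerId would spread inserts evenly across
# shards; a ranged key would instead favor shard-targeted range queries.
enable_sharding = {"enableSharding": "sales"}
shard_collection = {
    "shardCollection": "sales.transactions",
    "key": {"customerId": "hashed"},
}

# With pymongo, against a mongos, these would run as:
#   client.admin.command(enable_sharding)
#   client.admin.command(shard_collection)
```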

