What is the max size of a collection in MongoDB?


Question

I would like to know the max size of a collection in MongoDB. The MongoDB limitations documentation mentions that a single MMAPv1 database has a maximum size of 32TB.

Does this mean the max size of a collection is 32TB? If I want to store more than 32TB in one collection, what is the solution?

Answer

There are theoretical limits, as I will show below, but even the lower bound is pretty high. It is not easy to calculate the limits correctly, but the order of magnitude should be sufficient.

The actual limit depends on a few things like the length of shard names and the like (that adds up if you have a couple hundred thousand of them), but here is a rough calculation with real-life data.

Each shard needs some space in the config db, which, like any other database, is limited to 32TB on a single machine or in a replica set. On the servers I administer, the average size of an entry in config.shards is 112 bytes. Furthermore, each chunk needs about 250 bytes of metadata. Let us assume an optimal chunk size of close to 64MB.

We can have at most 500,000 chunks per server. 500,000 * 250 bytes equals 125MB of chunk metadata per shard. Adding the 112-byte config.shards entry, we have 125.000112MB per shard if we max everything out. Dividing 32TB by that value shows that we can have a maximum of slightly under 256,000 shards in a cluster.

Each shard in turn can hold 32TB worth of data. 256,000 * 32TB is 8.192 exabytes, or 8,192,000 terabytes. That would be the limit for our example.
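
To make the arithmetic easy to check, here is the same back-of-envelope calculation as a short Python sketch. The 112-byte config.shards entry, the 250 bytes of chunk metadata, and the 500,000-chunk ceiling are the observed and assumed values from above, not official MongoDB constants:

```python
# Back-of-envelope estimate of the maximum size of a sharded MMAPv1 cluster,
# using the figures quoted above (decimal units: 1TB = 10^12 bytes).
TB = 1000**4
MB = 1000**2

max_db_size      = 32 * TB      # MMAPv1 limit for a single database
chunks_per_shard = 500_000      # assumed maximum number of chunks per shard
chunk_metadata   = 250          # bytes of metadata per chunk (estimate)
shard_entry      = 112          # average config.shards document size (observed)

# Config-server footprint of one fully loaded shard.
per_shard_meta = chunks_per_shard * chunk_metadata + shard_entry
print(per_shard_meta / MB)              # 125.000112 MB

# The config database is itself capped at 32TB, bounding the shard count.
max_shards = max_db_size // per_shard_meta
print(f"{max_shards:,}")                # 255,999 -- slightly under 256,000

# Each shard can in turn hold 32TB of data.
total_bytes = max_shards * max_db_size
print(total_bytes / 1000**6)            # ~8.192 exabytes
```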

Let's say it's 8 exabytes. As of now, this easily translates to "enough for all practical purposes". To give you an impression: all the data held by the Library of Congress (arguably one of the biggest libraries in the world in terms of collection size) is estimated at around 20TB, including audio, video, and digital materials. You could fit that into our theoretical MongoDB cluster some 400,000 times. Note that this is the lower bound of the maximum size, using conservative values.
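
A quick sanity check of the 400,000x claim, assuming the ~20TB Library of Congress estimate cited above:

```python
TB = 1000**4
total_capacity = 8.192 * 1000**6         # ~8.192 EB from the calculation above
library_of_congress = 20 * TB            # estimated size cited in the answer
print(round(total_capacity / library_of_congress))   # 409600 -- "some 400,000 times"
```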

Now for the good part: the WiredTiger storage engine does not have this limitation: the database size is not limited (since there is no limit on how many data files can be used), so we can have an unlimited number of shards. Even when we have those shards running on MMAPv1 and only our config servers on WiredTiger, the size of a cluster becomes nearly unlimited – the limitation to 16.8M TB of RAM on a 64-bit system might cause problems somewhere and cause the indices of the config.shards collection to be swapped to disk, stalling the system. I can only guess, since my calculator refuses to work with numbers in that area (and I am too lazy to do it by hand), but I estimate the limit here to be in the two-digit yottabyte range (and the space needed to host it somewhere around the size of Texas).
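
For what it is worth, the 16.8M TB figure appears to be the 2^64-byte virtual address space of a 64-bit system expressed in binary terabytes; this is my reading of the answer, not something it states explicitly:

```python
address_space = 2**64          # bytes addressable with a 64-bit pointer
TiB = 2**40
print(address_space // TiB)    # 16777216, i.e. ~16.8 million TB
```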

Do not worry about the maximum data size in a sharded environment. No matter what, it is by far enough, even with the most conservative approach. Use sharding, and you are done. Btw: even 32TB is a hell of a lot of data: most clusters I know hold less data and shard because IOPS and RAM utilization exceeded a single node's capacity.

