Total MongoDB storage size

Problem description

I have a sharded and replicated MongoDB deployment with tens of millions of records. I know that Mongo writes data with some padding factor to allow fast updates, and I also know that, in order to replicate the database, Mongo has to store an operation log, which requires some (actually, a lot of) space. Even with that knowledge, I have no idea how to estimate the actual size required by Mongo given the size of a typical database record. So far I see a discrepancy of a factor of 2 - 3 between weekly repairs.

So the question is: how can I estimate the total storage size required by MongoDB, given an average record size in bytes?

Recommended answer

The short answer is: you can't, not based solely on average document size (at least not in any accurate way).

To explain in more detail:

The space needed on disk is not simply a function of the average document size. There is also the space needed for any indexes you create. Then there is the space consumed when documents outgrow their allocated space and have to be moved (despite padding, this does happen): the vacated space is placed on a list to be re-used, but depending on the data you subsequently insert, it may or may not be possible to re-use it.
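As a rough illustration of why document size alone is not enough, here is a minimal Python sketch that splits on-disk usage into its parts. It assumes you already have a dictionary shaped like the output of MongoDB's `dbStats` command (the `dataSize`, `storageSize` and `indexSize` fields, all in bytes); the function name `storage_breakdown` and the sample numbers are made up for the example.

```python
def storage_breakdown(stats):
    """Relate allocated storage and index size to the raw data size.

    `stats` is assumed to look like the result of the dbStats command:
    dataSize    = bytes of the documents themselves
    storageSize = bytes allocated to hold them (padding, freed space, ...)
    indexSize   = bytes used by all indexes
    """
    data = stats["dataSize"]
    if data <= 0:
        raise ValueError("dataSize must be positive")
    storage = stats["storageSize"]
    index = stats["indexSize"]
    total = storage + index
    return {
        "total_bytes": total,
        "storage_to_data": storage / data,  # padding and fragmentation overhead
        "index_to_data": index / data,      # cost of indexes relative to data
        "total_to_data": total / data,      # overall multiplier on raw data
    }


# Hypothetical numbers, for illustration only.
sample = {"dataSize": 1000, "storageSize": 1500, "indexSize": 500}
breakdown = storage_breakdown(sample)
```

With pymongo, such a dictionary could be obtained with `db.command("dbstats")`. The ratios are only a snapshot: they change as indexes are added and as freed space is, or is not, re-used.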

You can also factor in pre-allocation, which means that occasionally a handful of documents will increase your on-disk space utilization by ~2GB as a new data file is allocated. Of course, with sufficient data, this is essentially a rounding error, but it is worth bearing in mind.

The only way to estimate this type of data-to-size ratio, assuming a consistent usage pattern, is to trend it over time for your particular use case and track the disk space usage versus the data inserted (the number of documents might be a better measure than data volume, depending on the variability of document size).
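The trending idea can be sketched as a simple least-squares fit of disk usage against document count. This is only one possible approach, under the assumption that growth is roughly linear for your usage pattern; the function names and the sample measurements below are hypothetical.

```python
def fit_bytes_per_doc(samples):
    """Least-squares line through (document_count, disk_bytes) samples.

    Returns (slope, intercept): slope is the observed bytes of disk per
    document, intercept is the fixed overhead that does not scale with
    the document count.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("document counts must differ between samples")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def project_disk_bytes(samples, future_docs):
    """Extrapolate disk usage for a future document count."""
    slope, intercept = fit_bytes_per_doc(samples)
    return slope * future_docs + intercept


# Hypothetical weekly measurements: (document count, bytes on disk).
history = [(1_000_000, 3.0e9), (2_000_000, 5.0e9), (3_000_000, 7.0e9)]
slope, intercept = fit_bytes_per_doc(history)
projected = project_disk_bytes(history, 10_000_000)
```

Comparing the fitted slope with the average document size gives the effective multiplier for your workload. The projection only holds while the usage pattern stays the same.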

Similarly, it helps to track the insertion rate, document size and the space gained back from a resync/repair. FYI: you can resync a secondary from scratch to get a "fresh" copy of the data files rather than running a repair, which can be less disruptive and, depending on your setup, use less space.
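To make the space gained back from each repair or resync comparable over time, one option is to record the on-disk size just before and just after each run and look at the ratio. The sketch below assumes you collect those two numbers yourself; the function names and figures are illustrative only.

```python
def bloat_factors(repairs):
    """For each (bytes_before, bytes_after) pair, return before / after.

    A factor of 2.0 means the files had grown to twice their compacted
    size by the time the repair or resync ran.
    """
    factors = []
    for before, after in repairs:
        if after <= 0:
            raise ValueError("size after repair must be positive")
        factors.append(before / after)
    return factors


def mean_bloat(repairs):
    """Average bloat factor across all recorded repairs."""
    factors = bloat_factors(repairs)
    if not factors:
        raise ValueError("no repairs recorded")
    return sum(factors) / len(factors)


# Hypothetical weekly repairs: (bytes before, bytes after).
weekly = [(30e9, 10e9), (20e9, 10e9)]
factors = bloat_factors(weekly)
average = mean_bloat(weekly)
```

A stable factor from week to week suggests a consistent usage pattern, which is the precondition for the trend-based estimate to be meaningful.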
