MongoDB: does document size affect query performance?


Question

Assume a mobile game backed by a MongoDB database containing a User collection with several million documents.

Now assume several dozen properties that must be associated with each user - e.g. an array of _id values of Friend documents, their username, photo, an array of _id values of Game documents, a last_login date, a count of in-game currency, etc.

My concern is whether creating and updating large, growing arrays on many millions of User documents will add 'weight' to each User document, and/or slow down the overall system.

We will likely never exceed 16 MB per document, but we can safely say our documents will be 10-20x larger if we store these growing lists directly.

Question: is this even a problem in MongoDB? Does document size even matter if your queries are properly managed using projection, indexes, etc.? Should we be actively pruning document size, e.g. by referencing external lists vs. embedding lists of _id values directly?

In other words: if I want a user's last_login value, will a query that projects/selects only the last_login field behave any differently if my User documents are 100 KB vs. 5 MB?

Or: if I want to find all users with a specific last_login value, will document size affect that sort of query?
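For concreteness, the two query shapes being asked about would look roughly like this in the mongo shell (a sketch only; `users` as the collection name and the literal date are assumptions):

```
// Fetch only last_login for one user, via projection:
db.users.find({ _id: userId }, { last_login: 1, _id: 0 })

// Find all users with a specific last_login value:
db.users.find({ last_login: ISODate("2014-01-01T00:00:00Z") })
```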

Answer

First of all you should spend a little time reading up on how MongoDB stores documents, with reference to padding factors and powerOf2Sizes allocation:

http://docs.mongodb.org/manual/core/storage/
http://docs.mongodb.org/manual/reference/command/collStats/#collStats.paddingFactor

Put simply, MongoDB tries to allocate some additional space when storing your original document to allow for growth. powerOf2Sizes allocation became the default approach in version 2.6; it grows the record size in powers of 2.
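As a rough illustration (my own sketch, not MongoDB's actual allocator, which also has fixed minimum bucket sizes and stops doubling for very large records), power-of-2 allocation rounds each record up to the next power of two, so a document can grow in place until it crosses the next bucket boundary:

```javascript
// Illustrative sketch only - not MongoDB's real allocator.
// Round a record up to the next power-of-2 bucket; the gap between
// the document size and the bucket size is free headroom for growth.
function allocationSize(docBytes) {
  let size = 32; // assumed smallest bucket, for illustration
  while (size < docBytes) size *= 2;
  return size;
}

console.log(allocationSize(900));  // 1024 - ~124 bytes of in-place headroom
console.log(allocationSize(1200)); // 2048 - growing past 1024 forces the next bucket
```

The point of the scheme is that small updates usually stay inside the current bucket, and when a move is unavoidable the freed slot is a standard size that a later document can reuse.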

Overall, performance will be much better if all updates fit within the original size allocation. If they don't, the entire document needs to be moved somewhere else with enough space, causing extra reads and writes and, in effect, fragmenting your storage.

If your documents are really going to grow in size by a factor of 10x to 20x over time, that could mean multiple moves per document, which, depending on your insert, update, and read frequency, could cause issues. If that is the case there are a couple of approaches you can consider:

1) Allocate enough space on initial insertion to cover most (let's say 90%) of a document's lifetime growth. While this is space-inefficient at the beginning, efficiency increases over time as the documents grow without any performance reduction. In effect you pay ahead of time for storage that you will eventually use, in exchange for consistently good performance.
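A common way to get that initial allocation (a mongo-shell sketch; the `_padding` field name and filler size are my own) is to insert the document with a throwaway field sized for the expected growth and then immediately $unset it - the record keeps its original allocation:

```
// Insert with a dummy field to force a larger initial allocation
db.users.insert({
  _id: userId,
  username: "alice",
  friends: [],
  _padding: new Array(1024).join("x")  // ~1 KB of filler
});

// Remove the filler; the record's allocated size is unchanged,
// so later array growth happens in place instead of moving the document
db.users.update({ _id: userId }, { $unset: { _padding: "" } });
```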

2) Create "overflow" documents - let's say a typical 80-20 rule applies and 80% of your documents will fit in a certain size. Allocate for that amount and add an overflow collection that a document can point to if the user has, for example, more than 100 friends or 100 Game documents. The overflow field points to a document in this new collection, and your app only looks in the new collection if the overflow field exists. This allows normal document processing for 80% of the users, and avoids wasting a lot of storage on the 80% of user documents that won't need it, at the expense of additional application complexity.

In either case I'd consider using covered queries by building the appropriate indexes:

A covered query is a query in which:

all the fields in the query are part of an index, and
all the fields returned in the results are in the same index.

Because the index "covers" the query, MongoDB can both match the query conditions and return the results using only the index; MongoDB does not need to look at the documents, only the index, to fulfill the query.

Querying only the index can be much faster than querying documents outside of the index. Index keys are typically smaller than the documents they catalog, and indexes are typically available in RAM or located sequentially on disk.
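For the last_login examples above, that would mean something like the following (a shell sketch; note that _id must be excluded from the projection, or itself be part of the index, for the query to be covered):

```
// Index on the field that is both filtered on and returned
db.users.createIndex({ last_login: 1 });

// Covered: filter and projection are both satisfied by the index,
// regardless of whether the documents are 100 KB or 5 MB
db.users.find(
  { last_login: { $gte: ISODate("2014-01-01T00:00:00Z") } },
  { last_login: 1, _id: 0 }
);
```

You can confirm coverage by appending `.explain("executionStats")` and checking that no documents (only index entries) were examined.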

More on that approach here: http://docs.mongodb.org/manual/tutorial/create-indexes-to-support-queries/

