Update embedded document in Mongodb: Performance issue?

Question

I am new to MongoDB, and I have heard that it handles massive amounts of read and write operations well. Embedded documents are one of the features that make this possible, but I am not sure whether they can also cause performance issues. Book document example:

{
    "_id": 1,
    "Authors": [
        {
            "Email": "email",
            "Name": "name"
        }
    ],
    "Title": "title",
    ...
}

If there are thousands of books by one author, and his email needs to be updated, I need to write a query that will

  1. search through all book documents and pick out the thousands that have this author
  2. update the author's email field across these book documents

These operations do not seem efficient. But this type of update is ubiquitous, so I believe the developers have considered it. So where did I go wrong?

Solution

Your current embedded schema design has its merits, one of them being data locality. Since MongoDB stores each document contiguously on disk, putting all the data you need in one document means spinning disks spend less time seeking to different locations.

If your application frequently accesses book information along with the Authors data, then you'll almost certainly want to go the embedded route. The other advantage of embedded documents is atomicity and isolation when writing data within a single document.

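To illustrate this, say you want the email field updated in all books by one author. This can be done with a single update command. Each book document is modified atomically, although the multi-document update as a whole is not atomic. With an index on the author name field, this should not be a performance problem for MongoDB: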

// Field names are case-sensitive, so they follow the sample book document.
db.books.updateMany(
    { "Authors.Name": "foo" },
    {
        "$set": { "Authors.$.Email": "new@email.com" }
    }
);

or, with MongoDB versions earlier than 3.2, which lack updateMany:

// Same update; "multi" makes it apply to every matching document.
db.books.update(
    { "Authors.Name": "foo" },
    {
        "$set": { "Authors.$.Email": "new@email.com" }
    },
    { "multi": true }
)

In the above, you use the positional $ operator, which identifies the array element that matched the query so that you can update it without specifying its index explicitly. It updates only the first matching element in each document, and it is combined with dot notation to reach a field inside that element.
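To make the behaviour of the positional $ operator more concrete, here is a minimal in-memory sketch in plain JavaScript. It is not how MongoDB implements the update; `updateFirstMatch` and the sample data are hypothetical names made up for illustration. It only mimics one detail: in each matching document, the first array element that satisfies the filter is the one that gets changed.

```javascript
// In-memory illustration only; updateFirstMatch is a hypothetical helper,
// not part of the MongoDB API.
function updateFirstMatch(docs, authorName, newEmail) {
  let modified = 0;
  for (const doc of docs) {
    // Plays the role of the filter { "Authors.Name": authorName }:
    // locate the first array element whose Name matches.
    const idx = doc.Authors.findIndex((a) => a.Name === authorName);
    if (idx === -1) continue; // this document does not match the filter
    // Plays the role of { $set: { "Authors.$.Email": newEmail } }:
    // "$" stands for the index found above.
    doc.Authors[idx].Email = newEmail;
    modified += 1;
  }
  return modified;
}

// Hypothetical sample data shaped like the book document in the question.
const books = [
  { _id: 1, Title: "a", Authors: [{ Name: "foo", Email: "old@email.com" }] },
  {
    _id: 2,
    Title: "b",
    Authors: [
      { Name: "bar", Email: "bar@email.com" },
      { Name: "foo", Email: "old@email.com" },
    ],
  },
  { _id: 3, Title: "c", Authors: [{ Name: "baz", Email: "baz@email.com" }] },
];

const modifiedCount = updateFirstMatch(books, "foo", "new@email.com");
console.log(modifiedCount); // number of documents that were changed
```

Only the element belonging to "foo" is touched in each book; co-authors in the same array keep their own email.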

For more details on data modelling in MongoDB, please read the docs Data Modeling Introduction, especially Model One-to-Many Relationships with Embedded Documents.


The other design option you can consider is referencing documents, following a normalized schema. For example:

// db.books schema
{
    "_id": 3,
    "authors": [1, 2, 3], // <-- array of references to the author collection
    "title": "foo"
}

// db.authors schema
/*
1
*/
{
    "_id": 1,    
    "name": "foo",
    "surname": "bar",
    "address": "xxx",
    "email": "foo@mail.com"
}
/*
2
*/
{
    "_id": 2,    
    "name": "abc",
    "surname": "def",
    "address": "xyz",
    "email": "abc@mail.com"
}
/*
3
*/
{
    "_id": 3,    
    "name": "alice",
    "surname": "bob",
    "address": "xyz",
    "email": "alice@mail.com"
}

The normalized schema above, which uses document references, also has an advantage when you have one-to-many relationships with very unpredictable arity. If you have hundreds or thousands of author documents per book entity, embedding runs into space constraints: the larger the document, the more RAM it uses, and MongoDB documents have a hard size limit of 16MB.
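The trade-off of the referenced schema can be sketched in memory with plain JavaScript. This is an illustration under assumptions, not MongoDB code: the `Map`, the sample data, and the helpers `updateAuthorEmail` and `bookWithAuthors` are all hypothetical. The point it shows is that an email change writes to one author document no matter how many books reference it, while reading a book together with its authors needs an extra lookup.

```javascript
// In-memory stand-ins for the two collections (sample data is hypothetical).
const authors = new Map([
  [1, { _id: 1, name: "foo", email: "foo@mail.com" }],
  [2, { _id: 2, name: "abc", email: "abc@mail.com" }],
]);
const books = [
  { _id: 10, title: "t1", authors: [1, 2] },
  { _id: 11, title: "t2", authors: [1] },
  { _id: 12, title: "t3", authors: [2] },
];

// Changing an email writes to exactly one author document, however many
// books reference that author. In mongosh this would correspond to
// db.authors.updateOne({ _id: 1 }, { $set: { email: "new@email.com" } }).
function updateAuthorEmail(authorId, newEmail) {
  const author = authors.get(authorId);
  if (!author) return 0;
  author.email = newEmail;
  return 1; // number of documents written
}

// Reading a book with its authors now needs a second lookup step.
function bookWithAuthors(bookId) {
  const book = books.find((b) => b._id === bookId);
  if (!book) return null;
  return { ...book, authors: book.authors.map((id) => authors.get(id)) };
}

const written = updateAuthorEmail(1, "new@email.com");
const resolved = bookWithAuthors(11);
console.log(written, resolved.authors[0].email);
```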

For querying a normalized schema, you can consider the aggregation framework's $lookup stage, which performs a left outer join from the books collection to the authors collection in the same database and adds the matching author documents to each book for further processing.
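As a rough sketch of that join, the snippet below defines a $lookup pipeline as a plain JavaScript array, as it could be passed to db.books.aggregate(pipeline) in the shell, and then imitates the left outer join in memory. The output field name "authorDocs", the helper `lookup`, and the sample data are assumptions for illustration; the in-memory function is not MongoDB's implementation.

```javascript
// Pipeline for the shell; "authorDocs" is a hypothetical output field name.
const pipeline = [
  {
    $lookup: {
      from: "authors",        // collection to join with
      localField: "authors",  // array of author _id values in each book
      foreignField: "_id",    // field to match in the authors collection
      as: "authorDocs",       // matched documents are stored in this array
    },
  },
];

// In-memory sketch of the left outer join semantics (not MongoDB code).
function lookup(localDocs, foreignDocs, { localField, foreignField, as: outputField }) {
  return localDocs.map((doc) => {
    const value = doc[localField];
    const keys = Array.isArray(value) ? value : [value];
    const matches = foreignDocs.filter((f) => keys.includes(f[foreignField]));
    // Left outer join: a book with no matching author keeps an empty array.
    return { ...doc, [outputField]: matches };
  });
}

// Hypothetical sample data following the schemas above.
const books = [
  { _id: 3, title: "foo", authors: [1, 2] },
  { _id: 4, title: "bar", authors: [9] },
];
const authors = [
  { _id: 1, name: "foo" },
  { _id: 2, name: "abc" },
  { _id: 3, name: "alice" },
];

const joined = lookup(books, authors, pipeline[0].$lookup);
console.log(joined.map((b) => b.authorDocs.map((a) => a.name)));
```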


That said, I believe your current schema is a better approach than creating a separate collection of authors. Separate collections require more work: finding a book plus its authors takes two queries, whereas with the embedded schema above it is easy and fast (a single seek). There are no big differences for inserts and updates. Separate collections are good if you need to select individual documents, need more control over querying, or have huge documents. Embedded documents are good when you want the entire document, the document with a $slice of the embedded authors, or with no authors at all.

The general rule of thumb is that if your application's query pattern is well known and data tends to be accessed in only one way, an embedded approach works well. If your application queries data in many ways, or you are unable to anticipate the query patterns, a more normalized document referencing model is more appropriate.

Ref:

MongoDB Applied Design Patterns: Practical Use Cases with the Leading NoSQL Database, by Rick Copeland
