MongoDB (NoSQL): when to split collections

Question

So I'm writing an application in NodeJS & ExpressJS. This is my first time using a NoSQL database like MongoDB, and I'm trying to figure out how to fix my data model.

At the start of our project we wrote everything down in relational database terms, but since we recently switched from Laravel to ExpressJS, I'm a bit stuck on what to do with all my different table layouts.

So far I have figured out that it's better to denormalize your schema, but it does have to end somewhere, right? Otherwise you can end up storing all your data in one collection. Well, not entirely, but you get the point.

1. So is there a rule or standard that defines where to cut to make multiple collections? I have a relational database with users (each of which is either a client or a store user), stores, products, purchases, categories, subcategories ...

2. Is it bad to define a relationship in a NoSQL database? For example, every product has a category, and I want to relate to the category by an id (parent does the job in MongoDB). Is that a bad thing? Or is this where you choose performance vs. structure?

3. Is NoSQL/MongoDB meant to be used for large databases that would have a lot of relations if they were built in MySQL?

Thanks in advance.

Recommended answer

As already written, there are no rules like the second normal form for SQL.

However, there are some best practices and common pitfalls related to optimization for MongoDB which I will list here.

Contrary to popular belief, there is nothing wrong with references. Assume you have a library of books, and you want to track the rentals. You could begin with a model like this:

{
  // We use ISBN for its uniqueness
  _id: "9783453031456",
  title: "Schismatrix",
  author: "Bruce Sterling",
  rentals: [
    {
      name: "Markus Mahlberg",
      start:"2015-05-05T03:22:00Z",
      due:"2015-05-12T12:00:00Z"
    }
  ]
}

While there are several problems with this model, the most important one isn't obvious: the number of rentals is limited, because BSON documents have a size limit of 16MB.
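To get a feeling for that limit, here is a back-of-the-envelope sketch in plain JavaScript. It is an approximation only: it uses the JSON text size of one rental as a stand-in for its BSON size (the two encodings differ in detail), and it ignores the space taken by the book's own fields. The constant and variable names are made up for illustration.

```javascript
// Rough estimate only: JSON text size is used as a stand-in for BSON size,
// which differs in detail (type bytes, length prefixes, array index keys).
const MAX_BSON_BYTES = 16 * 1024 * 1024; // the 16MB document size limit

// One embedded rental, as in the model above.
const rental = {
  name: "Markus Mahlberg",
  start: "2015-05-05T03:22:00Z",
  due: "2015-05-12T12:00:00Z"
};

// Approximate number of bytes one embedded rental adds to the book document.
const bytesPerRental = Buffer.byteLength(JSON.stringify(rental), "utf8");

// Upper bound on embedded rentals before the document would hit the limit,
// ignoring the book's own fields.
const maxRentals = Math.floor(MAX_BSON_BYTES / bytesPerRental);

console.log(`~${bytesPerRental} bytes per rental, roughly ${maxRentals} rentals per book`);
```

The exact figure matters less than the fact that it is finite: any array that keeps growing for the lifetime of the document will eventually run into this ceiling.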

The other problem with storing rentals in an array is that it causes relatively frequent document migrations, which are rather costly operations. BSON documents are never partitioned, and (under the MMAPv1 storage engine) they are created with some additional space allocated in advance, which is used when they grow. This additional space is called padding. When the padding is exceeded, the document is moved to another location in the datafiles and new padding space is allocated. So frequent additions of data cause frequent document migrations. Hence, it is best practice to avoid frequent updates that increase the size of a document and to use references instead.

So for the example, we would change our single model and create a second one. First, the model for the book:

{
  _id: "9783453031456",
  title:"Schismatrix",
  author: "Bruce Sterling"
}

Second, the model for the rental would look like this:

{
  _id: new ObjectId(),
  book: "9783453031456",
  rentee: "Markus Mahlberg",
  start: ISODate("2015-05-05T03:22:00Z"),
  due: ISODate("2015-05-05T12:00:00Z"),
  returned: ISODate("2015-05-05T11:59:59.999Z")
}
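With rentals in their own collection, the rentals of a book are found through the `book` reference. The sketch below mimics that in plain JavaScript, with an in-memory array standing in for the collection; the sample data and the function names are hypothetical. The comments show the mongo shell queries the filters correspond to.

```javascript
// In-memory stand-in for the rentals collection (hypothetical sample data).
const rentals = [
  { book: "9783453031456", rentee: "Markus Mahlberg",
    start: new Date("2015-05-05T03:22:00Z"), due: new Date("2015-05-05T12:00:00Z"),
    returned: new Date("2015-05-05T11:59:59.999Z") },
  { book: "9783453031456", rentee: "Jane Doe",
    start: new Date("2015-05-06T09:00:00Z"), due: new Date("2015-05-13T12:00:00Z") },
  { book: "9783453526723", rentee: "Jane Doe",
    start: new Date("2015-05-01T09:00:00Z"), due: new Date("2015-05-08T12:00:00Z"),
    returned: new Date("2015-05-07T10:00:00Z") }
];

// All rentals of one book.
// mongo shell: db.rentals.find({ book: isbn })
function rentalsOfBook(isbn) {
  return rentals.filter(r => r.book === isbn);
}

// Rentals of a book that have not been returned yet.
// mongo shell: db.rentals.find({ book: isbn, returned: null })
// ({ returned: null } matches documents where the field is null or missing.)
function openRentalsOfBook(isbn) {
  return rentalsOfBook(isbn).filter(r => r.returned == null);
}

console.log(openRentalsOfBook("9783453031456").map(r => r.rentee));
```

Each new rental is now a small insert into `rentals` rather than an update that grows the book document. In a real deployment, an index on `book` (for example via `db.rentals.createIndex({ book: 1 })`) would typically back these queries.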

The same approach of course could be used for author or rentee.

Let's look back some time. A developer would identify the entities involved in a business case, define their properties and relations, write the corresponding entity classes, bang their head against the wall for a few hours to get the triple inner-outer-above-and-beyond JOIN required for the use case working, and all lived happily ever after. So why use NoSQL in general and MongoDB in particular? Because nobody lived happily ever after. This approach scales horribly, and almost the only way to scale it is vertically.

The main difference with NoSQL is that you model your data according to the questions you need to get answered.

That being said, let's look at a typical n:m relation and take the relation from authors to books as our example. In SQL, you'd have three tables: two for your entities (books and authors) and one for the relation (who is the author of which book?). Of course, you could take those tables and create their equivalent collections. But since find queries in MongoDB do not JOIN collections, you'd need three queries (one for the first entity, one for its relations and one for the related entities) to find the related documents of an entity. This wouldn't make sense, since the three-table approach for n:m relations was specifically invented to overcome the strict schemas SQL databases enforce.

Since MongoDB has a flexible schema, the first question is where to store the relation, keeping in mind the problems arising from overuse of embedding. Since an author might write quite a few books in the coming years, but the authorship of a book rarely, if ever, changes, the answer is simple: we store the authors as references in the book's data:

{
  _id: "9783453526723",
  title: "The Difference Engine",
  authors: ["idOfBruceSterling","idOfWilliamGibson"]
}

And now we can find the authors of that book by doing two queries:

var book = db.books.findOne({title:"The Difference Engine"})
var authors = db.authors.find({_id: {$in: book.authors}})
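The same reference also answers the reverse question, "which books did this author write?", in a single query, because matching a value against an array field matches any element of the array. A minimal sketch in plain JavaScript, with in-memory arrays standing in for the two collections; the sample documents and function names are hypothetical, and the author ids are the placeholder strings used above:

```javascript
// In-memory stand-ins for the two collections (hypothetical sample data).
const authors = [
  { _id: "idOfBruceSterling", name: "Bruce Sterling" },
  { _id: "idOfWilliamGibson", name: "William Gibson" }
];
const books = [
  { _id: "9783453031456", title: "Schismatrix",
    authors: ["idOfBruceSterling"] },
  { _id: "9783453526723", title: "The Difference Engine",
    authors: ["idOfBruceSterling", "idOfWilliamGibson"] }
];

// Book -> authors: mirrors the two queries above.
function authorsOfBook(title) {
  // mongo shell: db.books.findOne({ title: title })
  const book = books.find(b => b.title === title);
  if (!book) return [];
  // mongo shell: db.authors.find({ _id: { $in: book.authors } })
  return authors.filter(a => book.authors.includes(a._id));
}

// Author -> books: a single query on the array field.
// mongo shell: db.books.find({ authors: authorId })
function booksOfAuthor(authorId) {
  return books.filter(b => b.authors.includes(authorId));
}

console.log(authorsOfBook("The Difference Engine").map(a => a.name));
console.log(booksOfAuthor("idOfBruceSterling").map(b => b.title));
```

If the author-to-books lookup is frequent, an index on the `authors` field would be the usual way to keep it fast.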

I hope the above helps you to decide when to actually "split" your collections and to get around the most common pitfalls.

As to your questions, here are my answers:

  1. As written before: no, but keeping the technical limitations in mind should give you an idea of when it could make sense.
  2. It is not bad, as long as it fits your use case(s). If you have a given category and its _id, it is easy to find the related products. When loading a product, you can easily get the categories it belongs to, and efficiently so, as _id is indexed by default.
  3. I have yet to find a use case which can't be done with MongoDB, though some things can get a bit more complicated with MongoDB. What you should do, in my opinion, is take the sum of your functional and non-functional requirements and check whether the advantages outweigh the disadvantages. My rule of thumb: if either "scalability" or "high availability/automatic failover" is on your list of requirements, MongoDB is worth more than a look.
