MongoDB - children and parent structure


Problem description


Having just recently delved into the world of NoSQL with MongoDB, I am still struggling to understand the best approach to architecture without normalizing the data to third normal form and then joining on it. Currently the project I am designing is a simple collection of articles, akin to a wiki. An article will have a title and text, as well as (possibly) a parent article and one or more child articles.

I have had a number of differing ideas for the database design and want to pick the one that best matches the strengths of MongoDB.

Idea One

Since the most frequent type of query on the database will invariably be to simply retrieve an article, I am embedding all the relevant data that the page needs in order to display everything: the article itself, of course, plus a parent subdocument with a url (which matches the _id of some other document) and a title, which is the text we will print on screen inside of the tag. An identical structure exists for the children, except that it is an array so that all of the children are there.

{
        "_id" : "test-article-2",
        "title" : "Test Article 2",
        "text" : "Blah 2",
        "parent" : {
                "title" : "Test Article",
                "url" : "test-article"
        },
        "children" : [
                {
                        "title" : "Test Article 3",
                        "url" : "test-article-3"
                }
        ]
}

This type of design seems to have the advantage of speed (in my opinion), but I would like to hear what others think of it.

Idea Two

This is more along the lines of what I am used to, coming from the relational database world: do not embed sub-objects in the design, but simply store their unique identifiers. Thus parent now contains just a text string that matches the _id of some other document, and children is similarly an array of strings that link to _ids.

In order to get all of the information to view an article, however, we would now need to make a number of queries (at least I think we need to): one to get the main article, another to get the parent's title for putting in the tag, and another to get all of the child articles and likewise get their titles.

This seems like a lot more querying just to display an article, but it may make updating the database easier if, for instance, some article is deleted or updated (again, I am not sure on that point).

{
        "_id" : "test-article-2",
        "title" : "Test Article 2",
        "text" : "Blah 2",
        "parent" : "test-article",
        "children" : [ "test-article-3", "test-article-4"]
}
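
As a rough illustration of the lookups Idea Two implies, here is a minimal Python sketch. It is not MongoDB driver code: a plain dict named `articles` stands in for the collection, and `load_article_page` is a hypothetical helper invented for this example. The extra documents are made-up sample data.

```python
# In-memory stand-in for an "articles" collection, keyed by _id (Idea Two layout).
# All documents other than test-article-2 are invented sample data.
articles = {
    "test-article": {"_id": "test-article", "title": "Test Article", "text": "Blah",
                     "parent": None, "children": ["test-article-2"]},
    "test-article-2": {"_id": "test-article-2", "title": "Test Article 2", "text": "Blah 2",
                       "parent": "test-article",
                       "children": ["test-article-3", "test-article-4"]},
    "test-article-3": {"_id": "test-article-3", "title": "Test Article 3", "text": "Blah 3",
                       "parent": "test-article-2", "children": []},
    "test-article-4": {"_id": "test-article-4", "title": "Test Article 4", "text": "Blah 4",
                       "parent": "test-article-2", "children": []},
}


def load_article_page(article_id):
    """Gather what a page needs: the article plus the parent and child titles."""
    article = articles[article_id]                # lookup 1: the article itself
    parent = articles.get(article["parent"])      # lookup 2: the parent, if any
    # lookup 3: all children together (against MongoDB this could be a single
    # query on _id with the $in operator rather than one query per child)
    children = [articles[c] for c in article["children"] if c in articles]
    return {
        "title": article["title"],
        "text": article["text"],
        "parent": {"title": parent["title"], "url": parent["_id"]} if parent else None,
        "children": [{"title": c["title"], "url": c["_id"]} for c in children],
    }


page = load_article_page("test-article-2")
```

Under these assumptions a page view costs three lookups regardless of how many children there are, and the result has the same shape as the embedded document from Idea One.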

I would be glad to hear the input of those with more experience in MongoDB design.

Solution

You need to consider the type of queries you will need to perform and how frequently each type will be needed. When I was working on something similar, I came up with six possible actions:

  • Do something with the parent
  • Do something with the children
  • Do something with the ancestors (parents of parents, parents of parents of parents, etc.)
  • Do something with the descendants (children of children, children of children of children, etc.)
  • Change relationships (add/move/delete nodes in the hierarchy)
  • Change the main data in the current node (e.g. changing the value in the "title" field)

You'll want to estimate how important each of these is to your application.

If most of your work involves stored data for a given article, including its immediate parent and children, the first idea is the most useful. Indeed, in MongoDB it is quite common to place all the information you need in the same document rather than referencing it externally, so that you only need to retrieve one thing and can work with that data directly. The last four actions in the list are trickier, though.

In particular, you will need to traverse through the tree to retrieve ancestors and descendants in this case, moving through intermediary documents and following a path, even though you may only care about the last document in the path. This can be slow for long hierarchies. Changing relationships can require moving a lot of information around in multiple documents because of all the data present in each one. But even changing a single field like "title" can be annoying, because you have to consider the fact that this field is present in multiple different documents, either as a main field or under the parent or children fields.
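
To make the title problem concrete, here is a small Python sketch of a rename under the embedded layout of the first idea. The dict `articles` is an in-memory stand-in for the collection, and `rename_article` is a hypothetical helper written for this illustration, not part of any MongoDB API.

```python
# In-memory stand-in for an "articles" collection using the embedded layout
# from Idea One. The documents are invented sample data.
articles = {
    "test-article": {
        "_id": "test-article", "title": "Test Article", "parent": None,
        "children": [{"title": "Test Article 2", "url": "test-article-2"}],
    },
    "test-article-2": {
        "_id": "test-article-2", "title": "Test Article 2",
        "parent": {"title": "Test Article", "url": "test-article"},
        "children": [{"title": "Test Article 3", "url": "test-article-3"}],
    },
    "test-article-3": {
        "_id": "test-article-3", "title": "Test Article 3",
        "parent": {"title": "Test Article 2", "url": "test-article-2"},
        "children": [],
    },
}


def rename_article(article_id, new_title):
    """Change a title and every embedded copy of it; return the ids of touched documents."""
    touched = []
    for doc in articles.values():
        changed = False
        if doc["_id"] == article_id:               # the article's own title
            doc["title"] = new_title
            changed = True
        parent = doc.get("parent")
        if parent and parent["url"] == article_id:  # copy held by each child
            parent["title"] = new_title
            changed = True
        for child in doc["children"]:               # copy held by the parent
            if child["url"] == article_id:
                child["title"] = new_title
                changed = True
        if changed:
            touched.append(doc["_id"])
    return touched


touched = rename_article("test-article-2", "Renamed Article")
```

In this sample, renaming one article rewrites three documents: the article, its parent, and its child. With reference-only designs the same change touches a single document.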

Basically, your first idea works best in more static applications where you won't be changing the data a lot after initially creating it, but where you need to read it regularly.

The MongoDB documentation has five recommended approaches for handling tree-like (hierarchical) structures. All of them have different advantages and disadvantages, though they all make it easy to update the main data in an article by only needing to do so in one document.

  • Parent References: each node contains a reference to its parent.
    • Advantages:
      • Fast parent lookup (lookup by "_id" = your doc title, return "parent" field)
      • Fast children lookup (lookup by "parent" = your doc title, which will return all child documents)
      • Updating relationships is just a matter of changing the "parent" field
      • Changing the underlying data requires changes to only one document
    • Disadvantages:
      • Searching by ancestors and descendants is slow, requiring a traversal
  • Child References: each node contains a reference array to its children
    • Advantages:
      • Fast retrieval of children (return the children array)
      • Fast relationship update (just update the children array where needed)
    • Disadvantages:
      • Finding a parent requires looking up your _id in all children arrays of all documents until you find it (since the parent will contain the current node as a child)
      • Ancestor and descendant searches require traversals of the tree
  • Array of Ancestors: each node contains an array of references to its ancestors, as well as a reference to its parent
    • Advantages:
      • Fast retrieval of ancestors (no traversal required to find a specific one)
      • Easy to look up parent and children following the "Parent References" approach
      • To find descendants, just query the ancestors arrays for the current node, because every descendant must list it as an ancestor
    • Disadvantages:
      • Need to worry about keeping the array of ancestors as well as the parent field updated whenever there is a change in relationships, often across multiple documents.
  • Materialized Paths: each node contains a path to itself - requires regex
    • Advantages:
      • Easy to find children and descendants using regex
      • Can use path to retrieve parent and ancestors
      • Flexibility, such as finding nodes by partial paths
    • Disadvantages:
      • Relationship changes are difficult, as they may require changes to paths across multiple documents
  • Nested Sets: Each node contains a "left" and "right" field to help find sub-trees
    • Advantages:
      • Easy to retrieve descendants in an optimal way by searching between "left" and "right"
      • Like the "Parent Reference" approach, it's easy to find parent and children
    • Disadvantages:
      • Need to traverse structure to find ancestors
      • Relationship changes perform worse here than with any other option, because every single document in the tree may need to be changed to make sure "left" and "right" still make sense once something changes in the hierarchy
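
The differences between these approaches are easier to see side by side. The Python sketch below keeps one small invented tree in memory, with each document carrying the fields of several patterns at once ("parent", "ancestors", "path", "left", "right"). The list `docs` and all function names are assumptions made for this illustration. The path format (comma-delimited ancestors, empty string for the root) is also just one possible convention.

```python
import re

# Invented sample tree: wiki -> (databases -> (mongodb, postgres), languages).
# Each document carries the fields for several patterns so they can be compared.
docs = [
    {"_id": "wiki", "parent": None, "ancestors": [],
     "path": "", "left": 1, "right": 10},
    {"_id": "databases", "parent": "wiki", "ancestors": ["wiki"],
     "path": ",wiki,", "left": 2, "right": 7},
    {"_id": "mongodb", "parent": "databases", "ancestors": ["wiki", "databases"],
     "path": ",wiki,databases,", "left": 3, "right": 4},
    {"_id": "postgres", "parent": "databases", "ancestors": ["wiki", "databases"],
     "path": ",wiki,databases,", "left": 5, "right": 6},
    {"_id": "languages", "parent": "wiki", "ancestors": ["wiki"],
     "path": ",wiki,", "left": 8, "right": 9},
]
by_id = {d["_id"]: d for d in docs}


def children_by_parent(node_id):
    """Parent References: children are the documents whose parent is this node."""
    return [d["_id"] for d in docs if d["parent"] == node_id]


def ancestors_by_traversal(node_id):
    """Parent References: ancestors need one lookup per level of the tree."""
    result = []
    parent = by_id[node_id]["parent"]
    while parent is not None:
        result.append(parent)
        parent = by_id[parent]["parent"]
    return result


def descendants_by_ancestors(node_id):
    """Array of Ancestors: one membership test per document, no traversal."""
    return [d["_id"] for d in docs if node_id in d["ancestors"]]


def descendants_by_path(node_id):
    """Materialized Paths: match the node's id inside the stored path."""
    pattern = re.compile("," + re.escape(node_id) + ",")
    return [d["_id"] for d in docs if pattern.search(d["path"])]


def descendants_by_nested_set(node_id):
    """Nested Sets: descendants lie strictly between the node's left and right."""
    node = by_id[node_id]
    return [d["_id"] for d in docs
            if node["left"] < d["left"] and d["right"] < node["right"]]
```

For the node "databases", the three descendant functions return the same two documents, while `ancestors_by_traversal("mongodb")` has to walk up one parent at a time. What the sketch does not show is the write side: every extra field here is something that must be kept consistent when a node moves.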

The five approaches are discussed in more detail in the MongoDB documentation.

Your second idea combines the "Parent References" and "Child References" approaches discussed above. This approach makes it easy to find both the children and the parent, and makes it easy to update relationships and the main data of an article (though you need to update both the parent and the children fields), but you still need to traverse through it to find ancestors and descendants.

If you are interested in finding ancestors and descendants (and care about this more than being able to easily update relationships), you can consider adding an ancestors array to your second idea so that ancestors and descendants are also easy to query. Of course, updating relationships becomes a real pain if you do this.
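
To give a feel for that cost, here is a hedged Python sketch of moving a subtree when each document holds "parent", "children" and "ancestors" fields. The dict `docs` is an in-memory stand-in with invented data, and `move_subtree` is a hypothetical helper. It assumes the moved node is not the root and that the new parent is outside the moved subtree.

```python
# In-memory stand-in: Idea Two (parent + children references) plus an ancestors array.
docs = {
    "wiki": {"_id": "wiki", "parent": None,
             "children": ["databases", "languages"], "ancestors": []},
    "databases": {"_id": "databases", "parent": "wiki",
                  "children": ["mongodb"], "ancestors": ["wiki"]},
    "mongodb": {"_id": "mongodb", "parent": "databases",
                "children": ["indexes"], "ancestors": ["wiki", "databases"]},
    "indexes": {"_id": "indexes", "parent": "mongodb",
                "children": [], "ancestors": ["wiki", "databases", "mongodb"]},
    "languages": {"_id": "languages", "parent": "wiki",
                  "children": [], "ancestors": ["wiki"]},
}


def move_subtree(node_id, new_parent_id):
    """Re-parent a node and return the set of document ids that had to change."""
    touched = set()
    node = docs[node_id]
    old_parent = docs[node["parent"]]
    new_parent = docs[new_parent_id]

    # Child References side: fix both children arrays.
    old_parent["children"].remove(node_id)
    new_parent["children"].append(node_id)
    touched.update({old_parent["_id"], new_parent_id})

    # Parent References side: a single field on the moved node.
    node["parent"] = new_parent_id

    # Array of Ancestors side: swap the old ancestor prefix for the new one
    # on the moved node and on every one of its descendants.
    old_prefix_len = len(node["ancestors"])
    new_prefix = new_parent["ancestors"] + [new_parent_id]
    for doc in docs.values():
        if doc["_id"] == node_id or node_id in doc["ancestors"]:
            doc["ancestors"] = new_prefix + doc["ancestors"][old_prefix_len:]
            touched.add(doc["_id"])
    return touched


touched = move_subtree("mongodb", "languages")
```

Without the ancestors array, this move would touch three documents (the node and the two parents). With it, every descendant of the moved node has to be rewritten as well, so the cost grows with the size of the subtree.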

Conclusion:

  • Ultimately it all depends on what actions are needed the most. Since you're working with articles, whose underlying data (like the title) can change frequently, you may want to avoid the first idea since you would need to update not only the main document for that article but all child documents as well as the parent.

  • Your second idea makes it easy to retrieve the immediate parent and children. Updating relationships is also not too difficult (it's certainly better than some of the other options available).

  • If you really want to make it easy to find ancestors and descendants, at the expense of easy relationship updates, choose to include an array of ancestor references.

  • In general, try to minimize the number of traversals required, as they require running some kind of iteration or recursion to get to the data you want. If you value the ability to update relationships, you should also pick an option that changes fewer nodes in the tree (Parent References, Child References, and your second idea can do this).
