MongoDB - children and parent structure
Question
Having just recently delved into the world of NoSQL with MongoDB, I am still struggling to understand the best approach to architecture without normalizing the data to third normal form and then joining on it. The project I am currently designing is a simple collection of articles, akin to a wiki. An article will have a title and text, as well as (possibly) a parent article and one or more child articles.
I have had a number of differing ideas for the database design and want to pick the one that best matches the strengths of MongoDB.
Idea 1
Since the most frequent type of query on the database will invariably be to simply retrieve an article, I am embedding all the relevant data that the page will need in order to display everything. That means the actual article, of course, as well as a parent subdocument with a url (which will match the _id of some other document) and a title, which is the text that we will print on screen inside the link tag. An identical structure exists for the children, except that it is an array so that all of the children are there.
{
"_id" : "test-article-2",
"title" : "Test Article 2",
"text" : "Blah 2",
"parent" : {
"title" : "Test Article",
"url" : "test-article"
},
"children" : [
{
"title" : "Test Article 3",
"url" : "test-article-3"
}
]
}
This type of design seems to have the advantage of speed (in my opinion), but I would like to hear what others think of it.
Idea 2
More along the lines of what I am used to, coming from a relational database world, would be to not embed sub-objects in the design but simply to store their unique identifiers. Thus parent now contains just a text string which will match the _id of some other document, and children will similarly hold an array of strings which link to _ids.
In order to get all of the information needed to view an article, however, we would now need to make a number of queries (at least I think we would): one to get the main article, another to get the parent's title for the link tag, and another to get all of the child articles and likewise their titles.
This seems like a lot more querying just to display an article, but it may make updating the database easier, for instance if some article is deleted or updated (again, I am not sure on that point).
{
"_id" : "test-article-2",
"title" : "Test Article 2",
"text" : "Blah 2",
"parent" : "test-article",
"children" : [ "test-article-3", "test-article-4"]
}
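The read path for Idea 2 can be sketched roughly as follows. This is a minimal in-memory illustration in Python, with a plain dict standing in for the collection; the `articles` dict, the `load_article_view` helper, and the extra sample articles are assumptions made for illustration and are not part of the original design. Against a real MongoDB collection the children could probably be fetched in a single query using the `$in` operator on `_id`, so the page would need about three lookups rather than one per child.

```python
# In-memory stand-in for the articles collection, keyed by _id.
# All sample data beyond "test-article-2" is hypothetical.
articles = {
    "test-article": {
        "_id": "test-article", "title": "Test Article", "text": "Blah",
        "parent": None, "children": ["test-article-2"],
    },
    "test-article-2": {
        "_id": "test-article-2", "title": "Test Article 2", "text": "Blah 2",
        "parent": "test-article",
        "children": ["test-article-3", "test-article-4"],
    },
    "test-article-3": {
        "_id": "test-article-3", "title": "Test Article 3", "text": "Blah 3",
        "parent": "test-article-2", "children": [],
    },
    "test-article-4": {
        "_id": "test-article-4", "title": "Test Article 4", "text": "Blah 4",
        "parent": "test-article-2", "children": [],
    },
}


def load_article_view(article_id):
    """Gather everything needed to render one article under Idea 2."""
    article = articles[article_id]                          # lookup 1: the article
    parent = articles.get(article["parent"])                # lookup 2: parent, for its title
    children = [articles[c] for c in article["children"]]   # lookup 3: all children
    return {
        "title": article["title"],
        "text": article["text"],
        "parent": {"url": parent["_id"], "title": parent["title"]} if parent else None,
        "children": [{"url": c["_id"], "title": c["title"]} for c in children],
    }


view = load_article_view("test-article-2")
```

The returned `view` has the same shape as the embedded document from Idea 1, which shows the trade-off: Idea 2 rebuilds at read time what Idea 1 stores up front.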
I would be glad to hear the input of those with more experience in MongoDB design.
Recommended answer
You need to consider the types of queries you will need to perform and how frequently each type will be needed. When I was working on something similar, I came up with six possible actions:
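- Do something with the parent
- Do something with the children
- Do something with the ancestors (parents of parents, parents of parents of parents, etc.)
- Do something with the descendants (children of children, children of children of children, etc.)
- Change relationships (add/move/delete nodes in the hierarchy)
- Change the main data in the current node (e.g. change the value in the "title" field)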
You'll want to estimate how important each of these is to your application.
If most of your work involves the stored data for a given article, including its immediate parent and children, the first idea is the most useful. Indeed, in MongoDB it is quite common to place all the information you need in the same document rather than referencing it externally, so that you only need to retrieve one thing and work with that data. The last four actions in the list are trickier, though.
In particular, you will need to traverse the tree to retrieve ancestors and descendants in this case, moving through intermediary documents and following a path even though you may only care about the last document in that path. This can be slow for long hierarchies. Changing relationships can require moving a lot of information around in multiple documents because of all the data present in each one. Even changing a single field like "title" can be annoying, because that field is present in multiple documents, either as a main field or under the parent or children fields.
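To make the cost of a title change under the first idea concrete, here is a minimal sketch in Python. It uses a plain dict as a stand-in for the collection; the `rename_article` helper and the sample documents other than "test-article-2" are hypothetical, and a real application would issue the corresponding update operations against MongoDB instead.

```python
# Idea 1 documents: each article embeds the title and url of its parent
# and children. The dict is an in-memory stand-in for the collection.
articles = {
    "test-article": {
        "_id": "test-article", "title": "Test Article", "text": "Blah",
        "parent": None,
        "children": [{"title": "Test Article 2", "url": "test-article-2"}],
    },
    "test-article-2": {
        "_id": "test-article-2", "title": "Test Article 2", "text": "Blah 2",
        "parent": {"title": "Test Article", "url": "test-article"},
        "children": [{"title": "Test Article 3", "url": "test-article-3"}],
    },
    "test-article-3": {
        "_id": "test-article-3", "title": "Test Article 3", "text": "Blah 3",
        "parent": {"title": "Test Article 2", "url": "test-article-2"},
        "children": [],
    },
}


def rename_article(article_id, new_title):
    """Rename an article and fix every embedded copy of its title.

    Returns the number of documents that had to be written.
    """
    article = articles[article_id]
    article["title"] = new_title
    touched = 1
    # The parent keeps a copy of the title inside its children array.
    if article["parent"]:
        parent = articles[article["parent"]["url"]]
        for child in parent["children"]:
            if child["url"] == article_id:
                child["title"] = new_title
        touched += 1
    # Every child keeps a copy of the title inside its parent subdocument.
    for child_ref in article["children"]:
        articles[child_ref["url"]]["parent"]["title"] = new_title
        touched += 1
    return touched


writes = rename_article("test-article-2", "Renamed Article")
```

In this small tree a single rename writes three documents (the article, its parent, and its one child); the count grows with the number of children.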
Basically, your first idea works best in more static applications where you won't be changing the data much after initially creating it, but where you need to read it regularly.
The MongoDB documentation has five recommended approaches for handling tree-like (hierarchical) structures. They all have different advantages and disadvantages, though every one of them makes it easy to update the main data of an article, since the change only has to be made in one document.
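- Parent References: each node contains a reference to its parent.
  - Advantages:
    - Fast parent lookup (look up by "_id" = your doc title, return the "parent" field)
    - Fast children lookup (look up by "parent" = your doc title, which returns all child documents)
    - Updating relationships is just a matter of changing the "parent" field
    - Changing the underlying data requires changes to only one document
  - Disadvantages:
    - Searching by ancestors and descendants is slow, requiring a traversal
- Child References: each node contains an array of references to its children.
  - Advantages:
    - Fast retrieval of children (return the children array)
    - Fast relationship updates (just update the children array where needed)
  - Disadvantages:
    - Finding a parent requires looking up your _id in the children arrays of all documents until you find it (since the parent will contain the current node as a child)
    - Ancestor and descendant searches require traversing the tree
- Array of Ancestors: each node contains an array of references to its ancestors, plus a reference to its parent.
  - Advantages:
    - Fast retrieval of ancestors (no traversal needed to find a specific one)
    - Easy lookup of the parent and children, as in the "Parent References" approach
    - To find descendants, just look up by ancestor, because all descendants must contain that same ancestor
  - Disadvantages:
    - You need to keep the ancestors array and the parent field up to date whenever a relationship changes, often across multiple documents
- Materialized Paths: each node contains a path to itself.
  - Advantages:
    - Easy to find children and descendants using a regular expression
    - The path can be used to retrieve the parent and ancestors
    - Flexibility, such as finding nodes by partial paths
  - Disadvantages:
    - Relationship changes are difficult, since they may require changing paths across multiple documents
- Nested Sets: each node contains a "left" and a "right" field to help find subtrees.
  - Advantages:
    - Easy and efficient retrieval of descendants by searching between "left" and "right"
    - Easy to find the parent and children, as in the "Parent References" approach
  - Disadvantages:
    - The structure must be traversed to find ancestors
    - Relationship changes perform worse here than in any other option, since every document in the tree may need to change to make sure "left" and "right" still make sense once the hierarchy changes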
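The difference between these approaches shows up most clearly in a descendant lookup. The sketch below is a rough in-memory comparison in Python, not MongoDB code: the four-node tree, the field names `ancestors`, `path`, `left`, and `right`, the comma-delimited path format, and the empty path for the root are all assumptions chosen for illustration. Each document carries the fields of several patterns at once purely so they can be compared side by side.

```python
import re

# Hypothetical tree: a -> (b, c), b -> d.
nodes = [
    {"_id": "a", "parent": None, "ancestors": [], "path": "", "left": 1, "right": 8},
    {"_id": "b", "parent": "a", "ancestors": ["a"], "path": ",a,", "left": 2, "right": 5},
    {"_id": "c", "parent": "a", "ancestors": ["a"], "path": ",a,", "left": 6, "right": 7},
    {"_id": "d", "parent": "b", "ancestors": ["a", "b"], "path": ",a,b,", "left": 3, "right": 4},
]


def descendants_by_parent_refs(node_id):
    """Parent References: walk down level by level (one pass per level)."""
    found, frontier = [], [node_id]
    while frontier:
        children = [n["_id"] for n in nodes if n["parent"] in frontier]
        found.extend(children)
        frontier = children
    return sorted(found)


def descendants_by_ancestors(node_id):
    """Array of Ancestors: a single membership test per document."""
    return sorted(n["_id"] for n in nodes if node_id in n["ancestors"])


def descendants_by_path(node_id):
    """Materialized Paths: match the node's id inside the stored path."""
    pattern = re.compile("," + re.escape(node_id) + ",")
    return sorted(n["_id"] for n in nodes if pattern.search(n["path"]))


def descendants_by_nested_set(node_id):
    """Nested Sets: descendants lie strictly between left and right."""
    node = next(n for n in nodes if n["_id"] == node_id)
    return sorted(
        n["_id"] for n in nodes
        if node["left"] < n["left"] and n["right"] < node["right"]
    )


results = {
    "parent_refs": descendants_by_parent_refs("a"),
    "ancestors": descendants_by_ancestors("a"),
    "path": descendants_by_path("a"),
    "nested_set": descendants_by_nested_set("a"),
}
```

All four functions return the same descendants; only the parent-reference version needs a loop over tree levels, which is the traversal cost described in the list above.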
Your second idea combines the "Parent References" and "Child References" approaches discussed above. It makes it easy to find both the children and the parent, and easy to update the relationships and the main data of an article (though you need to update both the parent and the children fields), but you still need to traverse the tree to find ancestors and descendants.
If you are interested in finding ancestors and descendants (and care about this more than being able to easily update relationships), you can consider adding an ancestors array to your second idea so that ancestors and descendants are also easy to query. Of course, updating relationships becomes a real pain if you do this.
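As a rough sketch of why relationship updates get painful once an ancestors array is added to the second idea, the Python below moves one article under a new parent. The in-memory `articles` dict, the `move_article` helper, the `ancestors` field name, and the sample tree are hypothetical; the sketch also assumes the new parent is not a descendant of the moved article and does not guard against that case.

```python
# Idea 2 documents extended with a hypothetical "ancestors" array (root first).
articles = {
    "root": {"parent": None, "children": ["a", "b"], "ancestors": []},
    "a": {"parent": "root", "children": ["a1"], "ancestors": ["root"]},
    "a1": {"parent": "a", "children": ["a1x"], "ancestors": ["root", "a"]},
    "a1x": {"parent": "a1", "children": [], "ancestors": ["root", "a", "a1"]},
    "b": {"parent": "root", "children": [], "ancestors": ["root"]},
}


def move_article(article_id, new_parent_id):
    """Re-parent an article; return the ids of all documents that changed."""
    changed = {article_id, new_parent_id}
    node = articles[article_id]
    old_parent_id = node["parent"]
    # Detach from the old parent's children array.
    if old_parent_id is not None:
        articles[old_parent_id]["children"].remove(article_id)
        changed.add(old_parent_id)
    # Attach to the new parent.
    articles[new_parent_id]["children"].append(article_id)
    node["parent"] = new_parent_id
    # Rebuild the ancestors array for the moved node and every descendant.
    stack = [article_id]
    while stack:
        current_id = stack.pop()
        current = articles[current_id]
        parent = articles[current["parent"]]
        current["ancestors"] = parent["ancestors"] + [current["parent"]]
        changed.add(current_id)
        stack.extend(current["children"])
    return changed


changed = move_article("a1", "b")
```

Moving a single article here rewrites four documents: the old parent, the new parent, the article itself, and its descendant. Without the ancestors array, the descendant would not need to change.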
Conclusion:
Ultimately it all depends on which actions are needed most. Since you're working with articles, whose underlying data (like the title) can change frequently, you may want to avoid the first idea, since you would need to update not only the main document for that article but also all of its child documents and its parent.
Your second idea makes it easy to retrieve the immediate parent and children. Updating relationships is also not too difficult (it's certainly better than some of the other available options).
If you really want to make it easy to find ancestors and descendants, at the expense of being able to update relationships as easily, choose to include an array of ancestor references.
In general, try to minimize the number of traversals required, as they need some kind of iteration or recursion to reach the data you want. If you value the ability to update relationships, you should also pick an option that changes fewer nodes in the tree (Parent References, Child References, and your second idea can do this).