How to structure relationships in Azure Cosmos DB?

Problem description

I have two sets of data in the same collection in Cosmos DB: 'posts' and 'users'. They are linked by the posts that users create.

Currently I have it structured as follows:

// user document
{
  id: 123,
  postIds: ['id1', 'id2']
}

// post documents
{
  id: 'id1',
  ownerId: 123
}
{
  id: 'id2',
  ownerId: 123
}

My main issue with this setup is how fragile it is: the code has to enforce the link, and if there is a bug, data can very easily be lost with no clear way to recover it.

I'm also concerned about performance: if a user has 10,000 posts, that's 10,000 lookups I'll have to do to resolve all the posts.

Is this the correct method for modelling entity relationships?

Recommended answer

As David said, this is a long discussion, but it is a very common one. Since I have an hour or so of "free" time, I'm more than glad to try to answer it, hopefully once and for all.

Why normalize?

The first thing I notice in your post: you are looking for some level of referential integrity (https://en.wikipedia.org/wiki/Referential_integrity), which is needed when you decompose a bigger object into its constituent pieces. That decomposition is also called normalization.

While this is normally done in relational databases, it is now also becoming popular in non-relational databases, since it helps a lot to avoid data duplication, which usually creates more problems than it solves.

https://docs.mongodb.com/manual/core/data-model-design/#normalized-data-models

But do you really need it? Since you have chosen a JSON document database, you should leverage the fact that it can store an entire document, and store each post ALONG WITH all the owner data: name, surname, and any other data you have about the user who created it. Yes, I'm saying that you may want to consider having no separate post and user documents, just posts with the user info inside them. This may actually be very correct, as you are sure to get the EXACT user data that existed at the moment the post was created. Say, for example, I create a post while my biography is "X". I then update my biography to "Y" and create a new post. The two posts will have different author biographies, and this is just right, as they have captured reality exactly.
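A minimal sketch of this embedded approach, using plain Python dictionaries rather than calls to any Cosmos DB SDK. The field names (`type`, `title`, `author`, `name`, `biography`) and the helper `make_post` are illustrative assumptions, not part of the original question.

```python
import copy


def make_post(post_id, title, author):
    """Build a post document that carries a snapshot of the author's data."""
    return {
        "id": post_id,
        "type": "post",
        "title": title,
        # Deep copy, so later changes to the author do not alter this post.
        "author": copy.deepcopy(author),
    }


author = {"id": "123", "name": "Jane", "biography": "X"}
first_post = make_post("id1", "First post", author)

# The author updates the biography, then writes another post.
author["biography"] = "Y"
second_post = make_post("id2", "Second post", author)

print(first_post["author"]["biography"])   # X
print(second_post["author"]["biography"])  # Y
```

Each post keeps the biography that was current when it was written, which is the "capture reality" behaviour described above.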

Of course, you may also want to display a biography on an author page. In this case you'll have a problem: which one will you use? Probably the latest one.

If all authors MUST have a published blog post in order to exist in your system, that may well be enough. But maybe you want an author to be able to write a biography and be listed in your system even before writing a blog post.

In that case you need to NORMALIZE the model and create a new document type just for authors. If this is your case, you also need to figure out how to handle the situation described before. When authors update their biography, will you just update the author document, or create a new one? If you create a new one so that you can keep track of all changes, will you also update all the previous posts so that they reference the new document, or not?
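One possible way to answer those questions is sketched below: every biography change appends a new version of the author document, and each post records which version it was written against. This is only one option among several. The fields `authorId`, `version`, and `authorDocId`, and the helper `update_biography`, are hypothetical names chosen for illustration.

```python
def update_biography(author_versions, author_id, new_biography):
    """Append a new version of the author document instead of overwriting it."""
    latest = max(
        (a for a in author_versions if a["authorId"] == author_id),
        key=lambda a: a["version"],
    )
    new_doc = {
        "id": f"{author_id}-v{latest['version'] + 1}",
        "type": "author",
        "authorId": author_id,
        "version": latest["version"] + 1,
        "biography": new_biography,
    }
    author_versions.append(new_doc)
    return new_doc


authors = [
    {"id": "123-v1", "type": "author", "authorId": "123", "version": 1, "biography": "X"}
]
# A post written while version 1 was current keeps pointing at it.
old_post = {"id": "id1", "type": "post", "ownerId": "123", "authorDocId": "123-v1"}

new_version = update_biography(authors, "123", "Y")
new_post = {"id": "id2", "type": "post", "ownerId": "123", "authorDocId": new_version["id"]}

print(old_post["authorDocId"], new_post["authorDocId"])  # 123-v1 123-v2
```

Whether old posts should be re-pointed at the newest version is exactly the modelling decision raised above; here they are left untouched.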

As you can see, the answer is complex, and it REALLY depends on what kind of information you want to capture from the real world.

So, first of all, figure out whether you really need to keep posts and users separate.

Consistency

Let's assume that you really want to keep posts and users in separate documents, and thus you normalize your model. In this case, keep in mind that Cosmos DB (like NoSQL databases in general) DOES NOT OFFER any kind of native support for enforcing referential integrity, so you are pretty much on your own. Indexes can help, of course: you may want to index the ownerId property so that, before deleting an author, you can efficiently check whether any of that author's blog posts would otherwise be left orphaned.

Another option is to manually create and keep updated ANOTHER document that, for each author, keeps track of the blog posts he/she has written. With this approach you can just look at this document to find out which blog posts belong to an author. You can try to keep this document updated automatically using triggers, or do it in your application.

Just keep in mind that when you normalize in a NoSQL database, keeping the data consistent is YOUR responsibility. This is exactly the opposite of a relational database, where it becomes your responsibility to keep data consistent when you de-normalize it.
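The application-side check described above could look roughly like the following sketch. It works on an in-memory list of documents so that it stays self-contained. Against a real container, `posts_by_owner` would be a query filtering on the indexed property, along the lines of `SELECT * FROM c WHERE c.ownerId = @ownerId`. The `type` discriminator field and all function names are assumptions made for illustration.

```python
def posts_by_owner(documents, owner_id):
    """Return the posts owned by owner_id (stands in for a query on ownerId)."""
    return [
        d for d in documents
        if d.get("type") == "post" and d.get("ownerId") == owner_id
    ]


def delete_user(documents, user_id):
    """Return the documents without the user, refusing if posts would be orphaned."""
    remaining = posts_by_owner(documents, user_id)
    if remaining:
        raise ValueError(f"user {user_id} still owns {len(remaining)} post(s)")
    return [
        d for d in documents
        if not (d.get("type") == "user" and d["id"] == user_id)
    ]


def find_orphans(documents):
    """Return the posts whose ownerId does not match any user document."""
    user_ids = {d["id"] for d in documents if d.get("type") == "user"}
    return [
        d for d in documents
        if d.get("type") == "post" and d["ownerId"] not in user_ids
    ]


documents = [
    {"id": "123", "type": "user"},
    {"id": "456", "type": "user"},
    {"id": "id1", "type": "post", "ownerId": "123"},
    {"id": "id2", "type": "post", "ownerId": "123"},
]

print(len(delete_user(documents, "456")))  # 3, user 456 owns no posts
try:
    delete_user(documents, "123")
except ValueError as error:
    print(error)  # user 123 still owns 2 post(s)
```

A periodic run of something like `find_orphans` is one way to detect links that a bug has already broken, which addresses the recovery concern raised in the question.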

Performance

Performance COULD be an issue, but you don't usually model for performance in the first place. You model to make sure your model can represent and store the information you need from the real world, and then you optimize it to get decent performance from the database you have chosen. As different databases have different constraints, the model is then adapted to deal with those constraints. This is nothing more and nothing less than the good old "logical" vs "physical" modeling discussion.

In the case of Cosmos DB, you should avoid queries that go cross-partition, as they are more expensive.

Unfortunately, the partitioning is something you choose once and for all, so you really need a clear idea of the most common use cases you want to support best. If the majority of your queries are done on a per-author basis, I would partition per author.
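To illustrate why the choice of partition key matters for per-author queries, here is a small simulation. It does NOT reproduce the hashing scheme Cosmos DB uses internally: `partition_for` is a stand-in based on SHA-256, and the partition count of 5 is an arbitrary assumption.

```python
import hashlib

PARTITION_COUNT = 5  # assumed number of partitions, for illustration only


def partition_for(key_value, partition_count=PARTITION_COUNT):
    """Map a partition key value to a partition index using a stable hash."""
    digest = hashlib.sha256(str(key_value).encode("utf-8")).hexdigest()
    return int(digest, 16) % partition_count


def partitions_touched(documents, key_field):
    """Return the set of partitions holding the given documents."""
    return {partition_for(d[key_field]) for d in documents}


# 1,000 posts that all belong to the same author.
posts = [{"id": f"id{i}", "ownerId": "123"} for i in range(1000)]

by_owner = partitions_touched(posts, "ownerId")
by_id = partitions_touched(posts, "id")

print(len(by_owner))  # 1: every post of this author lives in the same partition
print(len(by_id))     # more than 1: the same posts are scattered across partitions
```

With `ownerId` as the key, "all posts by this author" is a single-partition query. With `id` as the key, the same query has to fan out across partitions.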

Now, while this may seem a clever choice, it is one only if you have A LOT of authors. If you have only one author, for example, all data and queries will go to just one partition, limiting your performance A LOT. Remember that the RUs provisioned in Cosmos DB are split among all the available partitions. For example, if a container provisioned with 10,000 RU ends up with 5 partitions, your data is spread across those 5 partitions and each partition is capped at 2,000 RU. If all your queries use just one partition, your real maximum throughput is 2,000 RU, not 10,000 RU.
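The arithmetic behind that example, as a short sketch. It assumes, as the paragraph above does, that provisioned throughput is divided evenly among partitions. The function names are made up for illustration.

```python
def per_partition_ru(total_ru, partition_count):
    """Throughput cap of one partition when RUs are split evenly."""
    return total_ru / partition_count


def effective_ru(total_ru, partition_count, partitions_used):
    """Maximum throughput when the workload only reaches some of the partitions."""
    used = min(partitions_used, partition_count)
    return per_partition_ru(total_ru, partition_count) * used


print(per_partition_ru(10_000, 5))  # 2000.0
print(effective_ru(10_000, 5, 1))   # 2000.0, all queries hit one partition
print(effective_ru(10_000, 5, 5))   # 10000.0, load is spread over every partition
```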

I really hope this helps you start to figure out the answer. And I really hope it helps to foster and grow a discussion (how to model for a document database) that I think is really due and mature now.
