如何在 DynamoDB 中建模一对一、一对多和多对多关系 [英] How to model one-to-one, one-to-many and many-to-many relationships in DynamoDB

查看:39
本文介绍了如何在 DynamoDB 中建模一对一、一对多和多对多关系的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在 DynamoDB 中对这些关系建模的最佳方法是什么?

What is the best way to model these relationships in DynamoDB?

  • 一对一的关系
  • 一对多关系
  • 多对多关系

推荐答案

我已经看过很多次这个问题的变体,我想我会写一个问答.

I've seen variations of this question so many times I thought I would write a Q&A.

在阅读本文之前,您应该了解:

Before reading this you should understand:

  • 每个 DynamoDB 表都有一个唯一的主键
  • 主键必须由<组成强>分区键,并且可以有一个可选的排序键.同时具有分区键和排序键的主键是复合键.
  • GetItem 请求使用其唯一主键返回一项且仅一项.
  • 查询快速查找并且必须指定一个且仅一个分区键.它可以返回多个项目.
  • 扫描评估表中的每个项目,并可能根据过滤器参数返回一个子集.在某些情况下,扫描是正确的选择,但如果使用不当,可能会很慢且代价高昂.
  • 全球二级索引(GSI) 具有与基表不同的分区键.把它想象成有两个保持同步的表(基表和 GSI).根据使用情况,GSI 可能会使您的基本表的成本增加一倍.
  • 本地二级索引(LSI) 具有与基表相同的分区键,但具有不同的排序键.将其视为一种对基表数据进行排序的不同方式,但仅限于在分区键内.LSI 不会花费您任何费用.

我们可以模拟护照和人来证明这种关系.一本护照只能有一个主人,一个人只能有一本护照.

We can model passports and people to demonstrate this relationship. One passport can only have one owner, and one person can only have one passport.

方法很简单.我们有两张表,其中一张应该有外键.

The approach is very simple. We have two tables, and one of those tables should have a foreign key.

护照表:

分区键:PassportId

╔════════════╦═══════╦════════════╗
║ PassportId ║ Pages ║   Issued   ║
╠════════════╬═══════╬════════════╣
║ P1         ║    15 ║ 11/03/2009 ║
║ P2         ║    18 ║ 09/02/2018 ║
╚════════════╩═══════╩════════════╝

护照持有人表:

分区键:PersonId

╔══════════╦════════════╦══════╗
║ PersonId ║ PassportId ║ Name ║
╠══════════╬════════════╬══════╣
║ 123      ║ P1         ║ Jane ║
║ 234      ║ P2         ║ Paul ║
╚══════════╩════════════╩══════╝

请注意,PersonId 没有出现在护照表中.如果我们这样做,我们将有两个具有相同信息的地方(哪些护照属于哪个人).如果表格不同意谁拥有哪个护照,这将导致额外的数据更新和潜在的一些数据质量问题.

Notice that PersonId did not appear in the passport table. If we did that, we would have two places with the same information (which passports belong to which person). This would lead to extra data updates and potentially some data quality issues if the tables did not agree on who owns which passport.

但是,我们缺少一个用例.我们可以很容易地通过 PersonId 查找一个人,并找到他们持有的护照.但是如果我们有一个 PassportId 并且我们需要找到谁拥有它呢?在当前模型中,我们需要执行扫描在护照持有人桌上.如果这是一个常规用例,我们就不想使用 Scan.为了支持 GetItem,我们可以简单地添加一个 GSI 到护照持有人表:

However, we are missing a use case. We can easily look a person up by their PersonId, and find which passport they have. But what if we have a PassportId and we need to find who owns it? In the current model we would need to perform a Scan on the Passport holder table. If this is a regular use case, we wouldn't want to use a Scan. To support a GetItem we can simply add a GSI to the Passport holder table:

护照持有人表 GSI:

分区键:PassportId

╔════════════╦══════════╦══════╗
║ PassportId ║ PersonId ║ Name ║
╠════════════╬══════════╬══════╣
║ P1         ║ 123      ║ Jane ║
║ P2         ║ 234      ║ Paul ║
╚════════════╩══════════╩══════╝

现在我们可以非常快速且经济地使用 PassportId 或 PersonId 查找关系.

Now we can look up relationships using PassportId or PersonId very quickly and cheaply.

还有其他选项可以对此进行建模.例如,您可以有一个没有外键的普通" Passport 表和 Person 表,然后有一个简单的将 PassortIds 和 PersonIds 映射在一起的第三个辅助表.在这种情况下,我认为这不是最干净的设计,但如果您喜欢它,这种方法没有任何问题.请注意,它们是多对多关系部分中辅助关系表的示例.

There are other options for modelling this. For example you could have a 'plain' Passport table and Person table with no foreign keys, then have a third auxiliary table that simple maps PassortIds and PersonIds together. I don't think that's cleanest design in this case, but if you prefer it, there is nothing wrong with that approach. Note that their is an example of an auxiliary relationship table in the many-to-many relationship section.

我们可以模拟宠物和主人来展示这种关系.宠物只能有一个主人,但主人可以拥有多只宠物.

We can model pets and owners to demonstrate this relationship. Pets can only have one owner, but owners can have many pets.

该模型看起来与一对一模型非常相似,因此我将只关注这种差异.

The model looks very similar to the one-to-one model, so I will just focus on this differences.

宠物桌:

分区键:PetId

╔═══════╦═════════╦════════╗
║ PetId ║ OwnerId ║ Type   ║
╠═══════╬═════════╬════════╣
║ P1    ║ O1      ║ Dog    ║
║ P2    ║ O1      ║ Cat    ║
║ P3    ║ O2      ║ Rabbit ║
╚═══════╩═════════╩════════╝

所有者表:

分区键:OwnerId

╔═════════╦════════╗
║ OwnerId ║ Name   ║
╠═════════╬════════╣
║ O1      ║ Angela ║
║ O2      ║ David  ║
╚═════════╩════════╝

我们将外键放在 many 表中.如果我们反过来做,将 PetId 放在 Owner 表中,那么一个 Owner Item 就必须有一组 PetId,这会变得难以管理.

We put the foreign key in the many table. If we did it the other way around, and put PetIds in the Owner table, one Owner Item would have to have a set of PetIds, and that would get complicated to manage.

如果我们想找出宠物的主人,很容易.我们可以做一个 GetItem 来返回宠物物品,它告诉我们主人是谁.但反过来更难——如果我们有一个 OwnerId,他们拥有哪些宠物?为了节省我们必须在 Pet 表上进行扫描,我们改为将 GSI 添加到 Pet 表中.

If we want to find out the Owner for a Pet, its very easy. We can do a GetItem to return the Pet Item, and it tells us who the owner is. But the other way around is harder - if we have an OwnerId, which Pets do they own? To save us have to do a Scan on the Pet table, we instead add a GSI to the Pet table.

宠物桌 GSI

分区键:OwnerId

╔═════════╦═══════╦════════╗
║ OwnerId ║ PetId ║ Type   ║
╠═════════╬═══════╬════════╣
║ O1      ║ P1    ║ Dog    ║
║ O1      ║ P2    ║ Cat    ║
║ O2      ║ P3    ║ Rabbit ║
╚═════════╩═══════╩════════╝

如果我们有一个 OwnerId 并且我们想找到他们的宠物,我们可以执行一个 查询.例如,对所有者 O1 的查询将返回具有 PetId P1 和 P2 的项目.

If we have an OwnerId and we want to find their Pets, we can perform a Query on the Pet table GSI. For example a Query on Owner O1 will return the items with PetId P1 and P2.

您可能会注意到这里有一些有趣的事情.表的主键必须是唯一的.这仅适用于基表.GSI 主键,在本例中只是 GSI 分区键,没有独一无二.

You may notice something interesting here. A primary key must be unique for a table. This is true only for the base table. A GSI primary key, in this case just the GSI partition key, does not have to be unique.

在 DynamoDB 表中,每个键值必须是唯一的.然而,关键全局二级索引中的值不需要是唯一的

In a DynamoDB table, each key value must be unique. However, the key values in a global secondary index do not need to be unique

顺便说一句,GSI 不需要项目 所有与基表相同的属性.如果您仅将 GSI 用于查找,您可能希望仅投影 GSI 关键属性.

On a side note, a GSI does not need to project all of the same attributes as the base table. If you are using the GSI for lookups only, you may wish to only project the GSI key attributes.

在 DynamoDB 中建模多对多关系主要有三种方式.每个人都有优点和缺点.

There are three main ways to model a many-to-many relationship in DynamoDB. Each have strengths and weaknesses.

我们可以使用医生和患者的例子来模拟这种关系.一个医生可以有很多病人,一个病人可以有很多医生.

We can use the example of Doctors and Patients to model this relationship. A Doctor can have many patients and a patient can have many Doctors.

一般来说,这是我的首选方法,这就是它首先使用的原因.这个想法是创建没有关系引用的普通"基表.关系引用然后进入辅助表(每个关系类型一个辅助表 - 在这种情况下只是医生 - 患者).

Generally, this is my preferred approach, which is why it goes first. The idea is to create 'plain' base tables with no relationship references. The relationship references then go in auxiliary tables (one auxiliary table per relationship type - in this case just Doctors-Patients).

医生表:

分区键:DoctorId

╔══════════╦═══════╗
║ DoctorId ║ Name  ║
╠══════════╬═══════╣
║ D1       ║ Anita ║
║ D2       ║ Mary  ║
║ D3       ║ Paul  ║
╚══════════╩═══════╝

病床

分区键:PatientId

╔═══════════╦═════════╦════════════╗
║ PatientId ║ Name    ║ Illness    ║
╠═══════════╬═════════╬════════════╣
║ P1        ║ Barry   ║ Headache   ║
║ P2        ║ Cathryn ║ Itchy eyes ║
║ P3        ║ Zoe     ║ Munchausen ║
╚═══════════╩═════════╩════════════╝

DoctorPatient 表(辅助表)

分区键:DoctorId

排序键:PatientId

╔══════════╦═══════════╦══════════════╗
║ DoctorId ║ PatientId ║ Last Meeting ║
╠══════════╬═══════════╬══════════════╣
║ D1       ║ P1        ║ 01/01/2018   ║
║ D1       ║ P2        ║ 02/01/2018   ║
║ D2       ║ P2        ║ 03/01/2018   ║
║ D2       ║ P3        ║ 04/01/2018   ║
║ D3       ║ P3        ║ 05/01/2018   ║
╚══════════╩═══════════╩══════════════╝

DoctorPatient 表 GSI

分区键:PatientId

排序键:DoctorId

╔═══════════╦══════════╦══════════════╗
║ PatientId ║ DoctorId ║ Last Meeting ║
╠═══════════╬══════════╬══════════════╣
║ P1        ║ D1       ║ 01/01/2018   ║
║ P2        ║ D1       ║ 02/01/2018   ║
║ P2        ║ D2       ║ 03/01/2018   ║
║ P3        ║ D2       ║ 04/01/2018   ║
║ P3        ║ D3       ║ 05/01/2018   ║
╚═══════════╩══════════╩══════════════╝

共有三张表,DoctorPatient 辅助表是最有趣的一张.

There are three tables, the DoctorPatient auxiliary table is the interesting one.

DoctorPatient 基表的主键必须是唯一的,所以我们创建了 DoctorId(分区键)和 PatientId(排序键)的复合键.

The DoctorPatient base table primary key must be unique, so we create a composite key of the DoctorId (partition key) and PatientId (sort key).

我们可以对 DoctorPatient 基表执行查询使用 DoctorId 获取 Doctor 拥有的所有患者.

We can perform a Query on the DoctorPatient base table using DoctorId to get all patients that a Doctor has.

我们可以使用在 DoctorPatient GSI 上执行查询PatientId 获取与患者关联的所有医生.

We can perform a Query on the DoctorPatient GSI using PatientId to get all of the Doctors associated with a Patient.

这种方法的优点是表的清晰分离,以及将简单的业务对象直接映射到数据库的能力.它不需要使用更高级的功能,例如集合.

The strengths of this approach are a clean separation of tables, and the ability to map simple business objects directly to the database. It requires no use of more advanced features such as sets.

需要协调一些更新,比如删除一个Patient,删除DoctorPatient表中的关系也要小心.然而,与其他一些方法相比,引入数据质量问题的可能性较低.

It is necessary to co-ordinate some updates, for example if you delete a Patient, you also need to be careful to delete the relationships in the DoctorPatient table. However the chance of introducing data quality issues is low compared to some other approaches.

DynamoDB 现在支持交易,允许您将多个更新协调为跨多个表的单个原子事务.

DynamoDB now supports Transactions, allowing you to co-ordinate multiple updates into a single atomic transaction across multiple tables.

这种方法的一个潜在弱点是它需要 3 个表.如果您使用吞吐量来配置表,则表越多,您必须越少地扩展容量.然而,有了新的按需功能,这不是问题.

A potential weakness of this approach is that it requires 3 tables. If you are provisioning tables with throughput, the more tables there are, the thinner you have to spread your capacity. However with the new on-demand feature, this is not a concern.

这种方法只使用两个表.

This approach uses just two tables.

医生表:

分区键:DoctorId

╔══════════╦════════════╦═══════╗
║ DoctorId ║ PatientIds ║ Name  ║
╠══════════╬════════════╬═══════╣
║ D1       ║ P1,P2      ║ Anita ║
║ D2       ║ P2,P3      ║ Mary  ║
║ D3       ║ P3         ║ Paul  ║
╚══════════╩════════════╩═══════╝

患者表:

分区键:PatientId

╔═══════════╦══════════╦═════════╗
║ PatientId ║ DoctorIds║  Name   ║
╠═══════════╬══════════╬═════════╣
║ P1        ║ D1       ║ Barry   ║
║ P2        ║ D1,D2    ║ Cathryn ║
║ P3        ║ D2,D3    ║ Zoe     ║
╚═══════════╩══════════╩═════════╝

这种方法涉及将关系作为一个集合存储在每个表中.

This approach involves storing relationships as a set in each table.

要为医生查找患者,我们可以使用医生表上的 GetItem 来检索医生项目.然后将 PatientIds 作为一组存储在 Doctor 属性中.

To find the Patients for a Doctor, we can use GetItem on the Doctor table to retrieve the Doctor item. Then the PatientIds are stored as a set in a Doctor attribute.

要查找某个患者的医生,我们可以使用 Patient 表上的 GetItem 来检索 Patient 项目.然后 DoctorIds 作为一个集合存储在一个 Patient 属性中.

To find the Doctors for a Patient, we can use GetItem on the Patient table to retrieve the Patient item. Then the DoctorIds are stored as a set in a Patient attribute.

这种方法的优势在于业务对象和数据库表之间存在直接映射.只有两个表,因此如果您使用的是预配吞吐能力,则不需要将其分散得太细.

The strength of this approach is that there is a direct mapping between business objects and database tables. There are only two tables so if you are using provision throughput capacity, it doesn't need to be spread too thinly.

这种方法的主要缺点是可能出现数据质量问题.如果将患者链接到医生,则需要协调两个更新,每个表一个.如果一次更新失败会怎样?您的数据可能会不同步.

The major downside to this approach is the potential for data quality issues. If you link a Patient to a Doctor, you have you co-ordinate two updates, one to each table. What happens if one update fails? You data can get out of sync.

另一个缺点是在两个表中都使用了集合.DynamoDB SDK 旨在处理 Sets,但当涉及 Sets 时,某些操作可能会很复杂.

Another downside is the use of Sets in both tables. The DynamoDB SDKs are designed to handle Sets, but certain operations can be complicated when Sets are involved.

AWS 之前将此称为 邻接列表模式.它通常被称为图形数据库Triple Store.

AWS have previously referred to this as the Adjacency List pattern. It is more commonly referred to as a Graph database or a Triple Store.

我之前在 AWS Adjancey 列表模式中回答了这个问题似乎帮助了一些人理解它.

I have previously answered this question on the AWS Adjancey List Pattern which seems to have helped some people understand it.

AWS 最近的一次演讲中对这种模式进行了大量讨论此处

And there is a recent presentation by AWS that talks a lot about this pattern here

该方法涉及将所有数据放在一张表中.

The approach involves putting all of the data in just one table.

我只是绘制了一些示例行而不是整个表格:

I've just drawn some example rows rather than the whole table:

分区键:Key1

排序键:Key2

╔═════════╦═════════╦═══════╦═════════════╦══════════════╗
║ Key1    ║ Key2    ║ Name  ║   illness   ║ Last Meeting ║
╠═════════╬═════════╬═══════╬═════════════╬══════════════╣
║ P1      ║ P1      ║ Barry ║ Headache    ║              ║
║ D1      ║ D1      ║ Anita ║             ║              ║
║ D1      ║ P1      ║       ║             ║ 01/01/2018   ║
╚═════════╩═════════╩═══════╩═════════════╩══════════════╝

然后需要一个 GSI 来反转密钥:

And then a GSI is required that inverts the keys:

分区键:Key2

排序键:Key1

╔═════════╦═════════╦═══════╦═════════════╦══════════════╗
║ Key2    ║ Key1    ║ Name  ║   illness   ║ Last Meeting ║
╠═════════╬═════════╬═══════╬═════════════╬══════════════╣
║ P1      ║ P1      ║ Barry ║ Headache    ║              ║
║ D1      ║ D1      ║ Anita ║             ║              ║
║ P1      ║ D1      ║       ║             ║ 01/01/2018   ║
╚═════════╩═════════╩═══════╩═════════════╩══════════════╝

这个模型在某些特定情况下有一些优势——它可以在高度连接的数据中表现良好.如果你格式化你的数据,你可以实现非常快速和可扩展的模型.它的灵活性在于您可以在表中存储任何实体或关系,而无需更新架构/表.如果您要配置吞吐量容量,它会很高效,因为所有吞吐量都可用于整个应用程序的任何操作.

This model has some strengths in some specific circumstances - it can perform well in highly connected data. If you format your data well, you can achieve extremely fast and scalable models. It is flexible in that you can store any entity or relationship in the table without updating your schema/tables. If you are provisioning throughput capacity it can be efficient as all of the throughput is available to any operation across the application.

如果使用不当或没有认真考虑,此模型会存在一些巨大的缺点.

This model suffers from some huge downsides if used incorrectly or without serious consideration.

您丢失了业务对象和表之间的任何直接映射.这几乎总是导致不可读的意大利面条式代码.执行即使是简单的查询也会感觉非常复杂.由于代码和数据库之间没有明显的映射关系,因此管理数据质量变得很困难.我见过的大多数使用这种方法的项目最终都会编写各种实用程序,其中一些实用程序本身就变成了产品,只是为了管理数据库.

You lose any direct mapping between your business objects and your tables. This almost always results in unreadable spaghetti code. Performing even simple queries can feel very complex. Managing data quality becomes difficult as there is no obvious mapping between the code and the database. Most projects I've seen that use this approach end up writing various utilities, some of which become products in their own right, just to manage the database.

另一个小问题是模型中每个项目的每个属性都必须存在于一个表中.这通常会导致一个包含数百列的表.这本身不是问题,但是尝试处理包含这么多列的表通常会引发简单的问题,例如难以查看数据.

Another minor problem is that every attribute for every item in your model has to exist on one table. This usually results in a table that has hundreds of columns. In itself this is not a problem, but trying to work on a table with that many columns usually throws up simple problems like difficulty in viewing the data.

简而言之,我认为 AWS 可能在一组文章中发布了本应是有用的文章,但是由于没有介绍其他(更简单的)管理多对多关系的概念,他们让很多人感到困惑.所以要明确的是,邻接列表模式可能很有用,但它不是在 DynamoDB 中建模多对多关系的唯一选择.如果它适用于您的情况,例如严重的大数据,请务必使用它,但如果不能,请尝试使用更简单的模型之一.

In short I think AWS have probably released what should have been a useful article in a set of articles, but by failing to introduce other (simpler) concepts for managing many-to-many relationships, they have confused lots of people. So to be clear, the adjacency list pattern can be useful, but its not the only option for modelling many-to-many relationships in DynamoDB. By all means use it if it works for your circumstances such as seriously Big Data, but if not, try one of the simpler models.

这篇关于如何在 DynamoDB 中建模一对一、一对多和多对多关系的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆