Should DynamoDB adjacency lists use discrete partition keys to model each type of relationship?

Question

I am building a forum and investigating modeling the data with DynamoDB and adjacency lists. Some top-level entities (like users) might have multiple types of relationships with other top-level entities (like comments).

For example, let's say we want to be able to do the following:

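  • Users can like comments
  • Users can follow comments
  • Comments can show the users that like them
  • Comments can show the users that follow them
  • User profiles can show the comments they like
  • User profiles can show the comments they follow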

So, we essentially have a many-to-many (user <=> comment) relationship that itself comes in many types (like or follow).

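Note: this example is intentionally stripped down. In practice there will be many more relationships to model, so I'd like to find something that scales here.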

The following top-level data would likely be common in any adjacency list representation:

First_id(Partition key)         Second_id(Sort Key)         Data
-------------                   ----------                  ------
User-Harry                      User-Harry                  User data
User-Ron                        User-Ron                    User data
User-Hermione                   User-Hermione               User data
Comment-A                       Comment-A                   Comment data
Comment-B                       Comment-B                   Comment data
Comment-C                       Comment-C                   Comment data

Furthermore, for each table below, there would be an equivalent Global Secondary Index with the partition and sort keys swapped.
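A minimal in-memory sketch of the swapped-key idea, assuming each relationship item carries only `First_id` and `Second_id`: querying the base table by `First_id` walks an edge in one direction, and querying the GSI (same items, keys swapped) walks it in the other. The helpers `query_table` and `query_gsi` are hypothetical; they only mimic a DynamoDB key lookup and do not call AWS.

```python
# Hypothetical edge items, keyed like the relationship tables in this question.
EDGES = [
    {"First_id": "Comment-A", "Second_id": "User-Harry"},
    {"First_id": "Comment-B", "Second_id": "User-Harry"},
    {"First_id": "Comment-B", "Second_id": "User-Ron"},
]


def query_table(items, partition_value):
    """Base table: match on First_id, return Second_id values in sort-key order."""
    return sorted(i["Second_id"] for i in items if i["First_id"] == partition_value)


def query_gsi(items, partition_value):
    """GSI with keys swapped: match on Second_id, return First_id values in sort-key order."""
    return sorted(i["First_id"] for i in items if i["Second_id"] == partition_value)


print(query_table(EDGES, "Comment-B"))  # ['User-Harry', 'User-Ron']
print(query_gsi(EDGES, "User-Harry"))  # ['Comment-A', 'Comment-B']
```

The same items serve both access patterns; only the index you query changes.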

This is what I would like to model in DynamoDB:

  1. Harry likes Comment A
  2. Harry likes Comment B
  3. Harry follows Comment A
  4. Ron likes Comment B
  5. Hermione follows Comment C

Option 1

Use a third attribute to define the type of relationship:

First_id(Partition key)         Second_id(Sort Key)         Data
-------------                   ----------                  ------
Comment-A                       User-Harry                  "LIKES"
Comment-B                       User-Harry                  "LIKES"
Comment-A                       User-Harry                  "FOLLOWS"
Comment-B                       User-Ron                    "LIKES"
Comment-C                       User-Hermione               "FOLLOWS"

The downside to this approach is that query results contain redundant information, because they return extra items you may not care about. For example, if you want to query all the users that like a given comment, you also have to process all the users that follow that comment. Likewise, if you want to query all the comments that a user likes, you need to process all the comments that the user follows.
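To make that cost concrete, here is a small in-memory sketch of what a client does under Option 1: the key condition can only select the partition, so the relationship type has to be filtered afterwards. (In DynamoDB, a `FilterExpression` is also applied after items are read, so the filtered-out items still consume read capacity.) The rows mirror the Option 1 table; the helper `users_with_relation` is hypothetical and is not a DynamoDB API.

```python
# Rows from the Option 1 table: (First_id, Second_id, Data).
OPTION_1_ROWS = [
    ("Comment-A", "User-Harry", "LIKES"),
    ("Comment-B", "User-Harry", "LIKES"),
    ("Comment-A", "User-Harry", "FOLLOWS"),
    ("Comment-B", "User-Ron", "LIKES"),
    ("Comment-C", "User-Hermione", "FOLLOWS"),
]


def users_with_relation(rows, comment_id, relation):
    """Read the whole partition, then keep only rows whose Data matches."""
    partition = [r for r in rows if r[0] == comment_id]  # what the key condition returns
    matches = [user for _, user, data in partition if data == relation]  # client-side filter
    return matches, len(partition)


likers, items_read = users_with_relation(OPTION_1_ROWS, "Comment-A", "LIKES")
print(likers, items_read)  # ['User-Harry'] 2
```

Two items are read to return one; with many relationship types per partition, that ratio gets worse.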

Option 2

Modify the keys to represent the relationship:

First_id(Partition key)         Second_id(Sort Key)
-------------                   ----------
LikeComment-A                   LikeUser-Harry
LikeComment-B                   LikeUser-Harry
FollowComment-A                 FollowUser-Harry
LikeComment-B                   LikeUser-Ron
FollowComment-C                 FollowUser-Hermione

This makes it efficient to query independently:

  1. Comment likes
  2. Comment follows
  3. User likes
  4. User follows
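A sketch of how Option 2 keys could be generated and queried, assuming the prefix is simply the relationship name prepended to the entity id, as in the table above. The `option_2_key` and `query_partition` helpers are hypothetical. Each query becomes an exact partition lookup with nothing to filter (user-side queries would go through the swapped-key GSI), but the same user now appears under one key per relationship type.

```python
def option_2_key(relation, entity_id):
    """Prefix an entity id with the relationship name, e.g. ("Like", "User-Harry") -> "LikeUser-Harry"."""
    return f"{relation}{entity_id}"


# (relation, comment, user) facts from the example.
FACTS = [
    ("Like", "Comment-A", "User-Harry"),
    ("Like", "Comment-B", "User-Harry"),
    ("Follow", "Comment-A", "User-Harry"),
    ("Like", "Comment-B", "User-Ron"),
    ("Follow", "Comment-C", "User-Hermione"),
]

# Build items as (First_id, Second_id), matching the Option 2 table.
ITEMS = [(option_2_key(rel, c), option_2_key(rel, u)) for rel, c, u in FACTS]


def query_partition(items, key):
    """Exact partition match: every returned item already has the wanted relation."""
    return sorted(second for first, second in items if first == key)


print(query_partition(ITEMS, "LikeComment-B"))  # ['LikeUser-Harry', 'LikeUser-Ron']
# Harry now has a distinct key per relationship type:
print(sorted({second for _, second in ITEMS if second.endswith("User-Harry")}))
```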

The downside is that the same top-level entity now has multiple keys, which might make things complex as more relationships are added.

Option 3

Skip adjacency lists altogether and use separate tables, maybe one for Users, one for Likes, and one for Follows.

Option 4

Traditional relational database. While I'm not planning on going this route because this is a personal project and I want to explore DynamoDB, if this is the right way to think about things, I'd love to hear why.

Thanks for reading this far! If there is anything I can do to simplify the question or clarify anything, please let me know :)

I've looked at the AWS best practices and this many-to-many SO post, and neither appears to address the many-to-many (with many) relationship, so any resources or guidance would be greatly appreciated.

Answer

Your Option 1 is not possible because it does not have unique primary keys. In your sample data, you can see that you have two entries for (Comment-A, User-Harry).
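A quick way to see the problem: DynamoDB identifies an item by its full primary key, so a second `PutItem` with the same (partition key, sort key) pair replaces the first item rather than adding a row. The dict below is only an in-memory stand-in for the table, keyed the same way; `put_item` here is a hypothetical helper, not the boto3 method.

```python
table = {}  # (First_id, Second_id) -> attributes; stand-in for a DynamoDB table


def put_item(store, first_id, second_id, data):
    """Like PutItem without a condition: an existing item with the same key is replaced."""
    store[(first_id, second_id)] = {"Data": data}


put_item(table, "Comment-A", "User-Harry", "LIKES")
put_item(table, "Comment-A", "User-Harry", "FOLLOWS")  # same key: overwrites the LIKES item

print(len(table), table[("Comment-A", "User-Harry")])  # 1 {'Data': 'FOLLOWS'}
```

Adding a `ConditionExpression` such as `attribute_not_exists(First_id)` to the real write would make the second put fail instead of silently overwriting, but it still could not store both relationships.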

Solution 1

The way to implement what you are looking for is by using slightly different attributes for your table and the GSI. If Harry likes Comment A, then your attributes should be:

hash_key: User-Harry
gsi_hash_key: Comment-A
sort_key_for_both: Likes-User-Harry-Comment-A
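A small helper that produces those three attributes for any relationship, assuming the `<Relation>-<user>-<comment>` sort-key format from the example. The function name `relationship_item` is hypothetical; the attribute names are the ones used in this answer.

```python
def relationship_item(relation, user_id, comment_id):
    """Build the table/GSI key attributes for `user_id <relation> comment_id`."""
    return {
        "hash_key": user_id,  # partition key of the base table
        "gsi_hash_key": comment_id,  # partition key of the GSI
        "sort_key_for_both": f"{relation}-{user_id}-{comment_id}",  # shared sort key
    }


print(relationship_item("Likes", "User-Harry", "Comment-A"))
```

Because the relationship type is part of the sort key, "Harry likes Comment A" and "Harry follows Comment A" get different primary keys, which removes the collision in Option 1.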

Now you have only one partition key value for your top-level entities in both the table and the GSI, and you can query for a specific relationship type by using the begins_with operator.
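As a sketch, the two query directions map onto `Query` requests like the ones built below, in the shape accepted by boto3's low-level `client.query`. The table name `Forum` and index name `gsi_comment` are hypothetical placeholders; the code only builds the request dictionaries and does not contact AWS.

```python
TABLE_NAME = "Forum"  # hypothetical table name
GSI_NAME = "gsi_comment"  # hypothetical GSI: partition gsi_hash_key, sort sort_key_for_both


def relation_query(partition_attr, partition_value, relation, index_name=None):
    """Build kwargs for a DynamoDB Query: exact partition plus begins_with on the sort key."""
    params = {
        "TableName": TABLE_NAME,
        "KeyConditionExpression": "#pk = :pk AND begins_with(#sk, :prefix)",
        "ExpressionAttributeNames": {"#pk": partition_attr, "#sk": "sort_key_for_both"},
        "ExpressionAttributeValues": {
            ":pk": {"S": partition_value},
            ":prefix": {"S": f"{relation}-"},
        },
    }
    if index_name is not None:
        params["IndexName"] = index_name  # query the GSI instead of the base table
    return params


# Comments that Harry likes: base table, partition = User-Harry.
harry_likes = relation_query("hash_key", "User-Harry", "Likes")
# Users who like Comment-A: GSI, partition = Comment-A.
comment_a_likers = relation_query("gsi_hash_key", "Comment-A", "Likes", GSI_NAME)

# With AWS credentials configured, either could be sent as:
#   boto3.client("dynamodb").query(**harry_likes)
```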

Solution 2

You could make the relationship a top-level entity. For example, Likes-User-Harry-Comment-A would have two entries in the database because it is "adjacent to" both User-Harry and Comment-A.
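One possible layout, assuming the same `First_id`/`Second_id` table and swapped-key GSI as in the question: each of the two entries is keyed by the relationship id plus one endpoint. Querying the GSI with partition `User-Harry` (or `Comment-A`) and `begins_with(First_id, "Likes-")` then finds the relationship. The exact layout and the helper names are illustrative assumptions, not something the answer specifies.

```python
def relationship_entity_items(relation, source_id, target_id):
    """Return the two entries that make a relationship adjacent to both endpoints."""
    rel_id = f"{relation}-{source_id}-{target_id}"
    return [
        {"First_id": rel_id, "Second_id": source_id},  # adjacent to the user
        {"First_id": rel_id, "Second_id": target_id},  # adjacent to the comment
    ]


def gsi_query(items, partition_value, prefix):
    """Swapped-key GSI: partition on Second_id, begins_with on First_id."""
    return sorted(
        i["First_id"]
        for i in items
        if i["Second_id"] == partition_value and i["First_id"].startswith(prefix)
    )


items = relationship_entity_items("Likes", "User-Harry", "Comment-A")
print(gsi_query(items, "Comment-A", "Likes-"))  # ['Likes-User-Harry-Comment-A']
```

Extra attributes about the relationship (a timestamp, say) could be stored on these entries without touching the user or comment items.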

This gives you flexibility if you want to model more complex information about relationships in the future (including the ability to describe relationships between relationships, such as Likes-User-Ron-User-Harry Causes Follows-User-Ron-User-Harry).

However, this strategy requires storing more items in the database, and it means that saving a "like" (so that it can be queried) is not an atomic operation. (But you can work around that by writing only the relationship entity, and then using DynamoDB Streams + Lambda to write the two entries I mentioned at the beginning of this solution.)
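A minimal sketch of that workaround, assuming a Lambda function subscribed to the table's stream with a view type that includes new images, and assuming the relationship entity is written as one item with hypothetical `rel_id`, `source` and `target` attributes. The handler only derives the two adjacency entries from each `INSERT` record; the actual write (for example with boto3's `batch_write_item`) is left as a comment so the sketch runs without AWS.

```python
def adjacency_entries(new_image):
    """Turn one relationship item (DynamoDB JSON) into its two adjacency entries."""
    rel_id = new_image["rel_id"]["S"]
    return [
        {"First_id": {"S": rel_id}, "Second_id": {"S": new_image["source"]["S"]}},
        {"First_id": {"S": rel_id}, "Second_id": {"S": new_image["target"]["S"]}},
    ]


def handler(event, context):
    """Lambda entry point for a DynamoDB Streams trigger."""
    to_write = []
    for record in event.get("Records", []):
        if record.get("eventName") != "INSERT":
            continue  # ignore MODIFY/REMOVE in this sketch
        image = record["dynamodb"].get("NewImage", {})
        if "rel_id" not in image:
            continue  # e.g. the adjacency entries this function itself wrote to the table
        to_write.extend(adjacency_entries(image))
    # A real function would now write `to_write`, e.g. via
    # boto3.client("dynamodb").batch_write_item(...) with PutRequest entries.
    return to_write
```

The `rel_id` check matters if the derived entries go back into the same table: their own `INSERT` events would otherwise re-trigger the handler.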

Update: with DynamoDB transactions, saving a "like" this way can actually be a fully ACID operation.
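A sketch of the transactional version, in the request shape accepted by boto3's low-level `client.transact_write_items`: both entries of a "like" are written in one all-or-nothing request, and a `ConditionExpression` makes the whole transaction fail if either entry already exists. The table name `Forum` is a hypothetical placeholder, the two-entry layout is the one assumed for Solution 2, and the code only builds the request.

```python
TABLE_NAME = "Forum"  # hypothetical table name


def like_transaction(user_id, comment_id):
    """Build TransactItems that write both entries of a 'like' atomically."""
    rel_id = f"Likes-{user_id}-{comment_id}"
    return [
        {
            "Put": {
                "TableName": TABLE_NAME,
                "Item": {"First_id": {"S": rel_id}, "Second_id": {"S": endpoint}},
                # Fail the whole transaction if this entry was already written.
                "ConditionExpression": "attribute_not_exists(First_id)",
            }
        }
        for endpoint in (user_id, comment_id)
    ]


request = {"TransactItems": like_transaction("User-Harry", "Comment-A")}
# With AWS credentials: boto3.client("dynamodb").transact_write_items(**request)
```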
