How to handle circular documents in MongoDB/DynamoDB?

Question

Currently the site uses a relational database (MySQL). However, joining all the data takes too long, and it has required caching that has led to other issues.

The issue is that the two tables would nest inside each other, creating a circular reference. A simple example is two tables, one for ACTOR and one for MOVIE: a movie has actors, and an actor has movies. Obviously this is easy in a relational database.

So for example, an ACTOR schema:

ACTOR1
- AGE
- BIO
- MOVIES
    - FILM1 (ties to the FILM1 document)
    - FILM2

Then the MOVIE schema:

FILM1
- RELEASE DATE
- ACTORS
    - ACTOR1 (ties back to the ACTOR document)
    - ACTOR2

Speed is the most important thing to me. I can easily store IDs in an ACTOR document in place of the full MOVIE documents, but then I'm back to multiple calls. Are there any features in a NoSQL database such as MongoDB or DynamoDB that could solve this in a single call? Or is NoSQL just not the right choice?

Answer

While NoSQL data modeling generally favors denormalization, it is best not to keep an unbounded list in a single database item (in DynamoDB, an item is limited to 400 KB). To model this data in DynamoDB, you should use the adjacency list design pattern for the many-to-many relationship. As far as I know, there is no cost-effective way to model the data so that you can get everything you want in a single call. However, you have said that speed is most important (without giving a latency requirement), so I will try to give you an idea of how fast you can retrieve the data if it is stored in DynamoDB.

Your schema would become something like this:

Actor {
    ActorId, <-- This is the application/database id, not the actor's actual ID
    Name,
    Age,
    Bio
}

Film {
    FilmId, <-- This is the application/database id for the film
    Title,
    Description,
    ReleaseDate
}

ActedIn {
    ActorId,
    FilmId
}
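
One common way to hold these three entities in a single DynamoDB table is the adjacency-list layout: every item has a partition key and a sort key, entity items use their own ID for both, and each ActedIn edge lives in the actor's partition with the film as its sort key. The sketch below models that with plain Python dicts and does not call DynamoDB; the attribute names `PK`/`SK`, the `ACTOR#`/`FILM#` prefixes and the helper functions are hypothetical conventions, not part of the schema above. Looking up the actors for a film would typically use a global secondary index that swaps the two keys.

```python
# Hypothetical single-table adjacency-list layout, modeled with plain dicts.
# "PK"/"SK" and the "ACTOR#"/"FILM#" prefixes are illustrative conventions.

def actor_item(actor_id, name, age, bio):
    # Entity item: partition and sort key both identify the actor.
    key = f"ACTOR#{actor_id}"
    return {"PK": key, "SK": key, "Name": name, "Age": age, "Bio": bio}


def film_item(film_id, title, description, release_date):
    # Entity item for a film, keyed the same way.
    key = f"FILM#{film_id}"
    return {"PK": key, "SK": key, "Title": title,
            "Description": description, "ReleaseDate": release_date}


def acted_in_item(actor_id, film_id):
    # Edge item: lives in the actor's partition; its sort key names the film.
    return {"PK": f"ACTOR#{actor_id}", "SK": f"FILM#{film_id}"}


def query_partition(table, pk, sk_prefix=""):
    # Imitates a Query on the partition key with an optional begins_with on SK.
    return [item for item in table
            if item["PK"] == pk and item["SK"].startswith(sk_prefix)]


table = [
    actor_item("a1", "Actor One", 42, "Short bio"),
    film_item("f1", "Film One", "A film", "2001-01-01"),
    film_item("f2", "Film Two", "Another film", "2003-05-20"),
    acted_in_item("a1", "f1"),
    acted_in_item("a1", "f2"),
]

# Edges for actor a1: the items in its partition whose sort key starts with "FILM#".
film_keys = [item["SK"] for item in query_partition(table, "ACTOR#a1", "FILM#")]
```

Note that the edge items only carry keys, so a second read is still needed to fetch the film details.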

To record that an actor acted in a movie, you only need to perform one write (which, in my experience, consistently takes single-digit milliseconds on DynamoDB) to add an ActedIn item to your table.
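
A minimal sketch of that write path, assuming the ActedIn table is keyed by (ActorId, FilmId) and has an index on FilmId for the reverse direction. `ActedInTable` and its methods are hypothetical names for an in-memory stand-in, not the AWS SDK API. It shows that recording a relationship is one write, and that repeating the same write replaces the item (as PutItem does for an existing key) rather than duplicating it.

```python
from collections import defaultdict


class ActedInTable:
    """In-memory stand-in for an ActedIn table keyed by (ActorId, FilmId).

    Hypothetical class for illustration; it does not call DynamoDB.
    """

    def __init__(self):
        self.items = {}                    # (actor_id, film_id) -> item
        self.by_film = defaultdict(set)    # imitates an index keyed on FilmId
        self.writes = 0

    def put(self, actor_id, film_id):
        # One write per relationship. Like PutItem, writing the same key
        # again replaces the existing item instead of duplicating it.
        self.items[(actor_id, film_id)] = {"ActorId": actor_id, "FilmId": film_id}
        self.by_film[film_id].add(actor_id)
        self.writes += 1

    def films_for(self, actor_id):
        # Query on the table's partition key (ActorId).
        return sorted(f for (a, f) in self.items if a == actor_id)

    def actors_in(self, film_id):
        # Query on the index's partition key (FilmId).
        return sorted(self.by_film[film_id])


table = ActedInTable()
table.put("a1", "f1")
table.put("a1", "f2")
table.put("a2", "f1")
table.put("a1", "f1")   # repeated write: still one item for (a1, f1)
```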

To get all the movies for an actor, you would query once to get all of that actor's ActedIn relationships, and then do a batch read to get all the movies. In my experience, typical query latency is under 10 ms, depending on network speed and the amount of data sent over the network. Since an ActedIn relationship is such a small object, I think you could expect an average of about 5 ms per query if the request originates from something that also runs in an AWS data center (EC2, Lambda, etc.).
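
The two-step read can be sketched as follows. `FakeClient`, `query_acted_in` and `batch_get_films` are hypothetical stand-ins that only count round trips over in-memory data; with real DynamoDB they would correspond to a Query on the ActedIn table and a BatchGetItem on the films table. The point is that the number of round trips stays at two no matter how many films the actor has.

```python
# Hypothetical in-memory data standing in for the ActedIn and Film tables.
ACTED_IN = [
    {"ActorId": "a1", "FilmId": "f1"},
    {"ActorId": "a1", "FilmId": "f2"},
    {"ActorId": "a2", "FilmId": "f1"},
]
FILMS = {
    "f1": {"FilmId": "f1", "Title": "Film One"},
    "f2": {"FilmId": "f2", "Title": "Film Two"},
}


class FakeClient:
    """Counts round trips; does not talk to DynamoDB."""

    def __init__(self):
        self.round_trips = 0

    def query_acted_in(self, actor_id):
        # Round trip 1: query the ActedIn items for one actor.
        self.round_trips += 1
        return [r for r in ACTED_IN if r["ActorId"] == actor_id]

    def batch_get_films(self, film_ids):
        # Round trip 2: one batch read for every referenced film.
        self.round_trips += 1
        return [FILMS[f] for f in film_ids if f in FILMS]


def films_for_actor(client, actor_id):
    edges = client.query_acted_in(actor_id)
    return client.batch_get_films([e["FilmId"] for e in edges])


client = FakeClient()
titles = [film["Title"] for film in films_for_actor(client, "a1")]
```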

Getting a single item will take under 5 ms, and you can issue those reads in parallel. There's also a BatchGetItem API, but I don't have any latency statistics for it.
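
If you take the parallel single-item route, the fan-out could look like the sketch below. `get_film` is a hypothetical stand-in that sleeps about 5 ms to imitate one network round trip; in real code you would call your DynamoDB client there. Because the reads overlap, wall-clock time is roughly that of the slowest read rather than the sum of all of them. If you use BatchGetItem instead, note that one request accepts at most 100 keys (and may return unprocessed keys that you must retry), so longer ID lists have to be split, which is what the `chunks` helper does.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory films standing in for the Film table.
FILMS = {f"f{i}": {"FilmId": f"f{i}", "Title": f"Film {i}"} for i in range(1, 9)}


def get_film(film_id):
    # Stand-in for a single GetItem call with a ~5 ms round trip.
    time.sleep(0.005)
    return FILMS[film_id]


def get_films_parallel(film_ids, max_workers=8):
    # Issue the single-item reads concurrently; map() preserves input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(get_film, film_ids))


def chunks(keys, size=100):
    # Split keys into groups that fit one BatchGetItem request (max 100 keys).
    for start in range(0, len(keys), size):
        yield keys[start:start + size]
```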

So, is ~10 ms fast enough for you?
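
The ~10 ms figure is a rough back-of-the-envelope sum based on the experience-based numbers above (about 5 ms for the query, under 5 ms per item read). The query and the item reads happen one after the other, while the item reads overlap each other, so the total is approximately the query time plus the slowest single read:

```latex
T_{\text{films}} \approx T_{\text{query}} + \max_{i} T_{\text{get},i} \lesssim 5\ \text{ms} + 5\ \text{ms} = 10\ \text{ms}
```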

If not, you can use DAX (DynamoDB Accelerator), which adds an in-memory caching layer in front of DynamoDB and promises sub-millisecond latency for cached reads.
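
DAX is a managed service that you reach through its own client, so there is nothing to implement yourself, but the general idea of a read-through cache in front of the table can be sketched in a few lines. This is a generic illustration, not how DAX works internally; `ReadThroughCache`, the `loader` callback and the TTL value are assumptions. Any such cache trades freshness for speed, since entries can be stale until they expire.

```python
import time


class ReadThroughCache:
    """Generic read-through cache with a TTL (illustrative, not DAX's API)."""

    def __init__(self, loader, ttl_seconds=300.0, clock=time.monotonic):
        self._loader = loader          # called on a miss, e.g. a GetItem wrapper
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}             # key -> (expires_at, value)
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]            # hit: served from memory
        self.misses += 1
        value = self._loader(key)      # miss or expired: read from the database
        self._entries[key] = (now + self._ttl, value)
        return value
```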

Alternatively, if you really need a single query, you could fully denormalize and store every ActedIn relationship like this:

ActedIn {
    ActorId,
    ActorName,
    ActorAge,
    ActorBio,
    FilmId,
    FilmTitle,
    FilmDescription,
    FilmReleaseDate
}

With that layout, you would need only one query for a given actor to get all of their film details, and only one query (against an index keyed on FilmId) to get all the actor details for a given film. Don't actually do this. The duplicated data means that every time you update an actor's details, you have to update them in the item for every film they were in, and likewise for film details. This would be an operational nightmare.
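
To make that trade-off concrete, the sketch below uses a few hypothetical denormalized ActedIn items (a subset of the attributes above) as plain Python dicts. Reading an actor's filmography is one pass over one actor's items, but changing a single actor attribute costs one write per film. With real DynamoDB each of those writes is a separate update, and nothing keeps the copies consistent for you.

```python
# Hypothetical denormalized ActedIn items: actor and film attributes are
# copied into every relationship item.
acted_in = [
    {"ActorId": "a1", "ActorName": "Actor One", "ActorBio": "Old bio",
     "FilmId": "f1", "FilmTitle": "Film One"},
    {"ActorId": "a1", "ActorName": "Actor One", "ActorBio": "Old bio",
     "FilmId": "f2", "FilmTitle": "Film Two"},
    {"ActorId": "a2", "ActorName": "Actor Two", "ActorBio": "Another bio",
     "FilmId": "f1", "FilmTitle": "Film One"},
]


def films_for_actor(items, actor_id):
    # One "query" by ActorId already contains everything needed to render.
    return [i["FilmTitle"] for i in items if i["ActorId"] == actor_id]


def update_actor_bio(items, actor_id, new_bio):
    # The catch: every copy must be rewritten. Returns the number of writes.
    writes = 0
    for item in items:
        if item["ActorId"] == actor_id:
            item["ActorBio"] = new_bio
            writes += 1
    return writes
```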

Remember that NoSQL comes in many varieties (NoSQL = "Not Only SQL"), so even if one NoSQL solution doesn't work for you, you shouldn't rule out NoSQL entirely. If you absolutely need this in a single call, consider a graph database (another type of NoSQL database).
