How to handle circular documents in MongoDB/DynamoDB?

Problem description

Currently the site uses a relational database (MySQL); however, joining all the data takes too long and has required caching, which has led to other issues.

The issue is how the two tables would nest into each other, creating a circular reference. A simple example would be two tables, one for ACTOR and a second for MOVIE. A movie would have actors and an actor would have movies. Obviously this is easy in a relational database.

So for example, an ACTOR schema:

ACTOR1
- AGE
- BIO
- MOVIES
    - FILM1 (ties to the FILM1 document)
    - FILM2

Then the MOVIE schema:

FILM1
- RELEASE DATE
- ACTORS
    - ACTOR1 (ties back to the ACTOR document)
    - ACTOR2

Speed is the most important thing to me. I can easily add IDs to an ACTOR document in place of the full MOVIE documents. However, then I'm back to multiple calls. Are there any features in a NoSQL database like MongoDB or DynamoDB that could solve this in a single call? Or is NoSQL just not the right choice?

Recommended answer

While NoSQL generally recommends denormalizing data models, it is best not to keep an unbounded list in a single database entry. To model this data in DynamoDB, you should use an adjacency list for the many-to-many relationship. As far as I know, there is no cost-effective way to model the data that lets you get everything you want in a single call. However, you have said that speed is most important (without giving a latency requirement), so I will try to give you an idea of how fast you can get the data if it is stored in DynamoDB.

Your schema would become something like this:

Actor {
    ActorId, <-- This is the application/database id, not the actor's actual ID
    Name,
    Age,
    Bio
}

Film {
    FilmId, <-- This is the application/database id for the film
    Title,
    Description,
    ReleaseDate
}

ActedIn {
    ActorId,
    FilmId
}

To indicate that an actor acted in a movie, you only need to perform one write (which, in my experience, is consistently single-digit milliseconds with DynamoDB) to add an ActedIn item to your table.
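
As a rough sketch of that single write, the following Python uses a plain in-memory structure as a stand-in for the table; it does not call DynamoDB. The names `ActedInTable` and `put_acted_in`, the sample IDs, and the reverse lookup by film (playing the role a secondary index keyed on FilmId might play) are assumptions for illustration only.

```python
from collections import defaultdict


class ActedInTable:
    """In-memory stand-in for an ActedIn adjacency-list table (not DynamoDB)."""

    def __init__(self):
        # Primary access path: ActorId -> set of FilmId.
        self.by_actor = defaultdict(set)
        # Reverse access path: FilmId -> set of ActorId
        # (hypothetical equivalent of a secondary index).
        self.by_film = defaultdict(set)

    def put_acted_in(self, actor_id, film_id):
        """Record one relationship; conceptually a single item write."""
        self.by_actor[actor_id].add(film_id)
        self.by_film[film_id].add(actor_id)


table = ActedInTable()
table.put_acted_in("actor-1", "film-1")
table.put_acted_in("actor-1", "film-2")
table.put_acted_in("actor-2", "film-1")

print(sorted(table.by_actor["actor-1"]))  # ['film-1', 'film-2']
print(sorted(table.by_film["film-1"]))    # ['actor-1', 'actor-2']
```

Because each relationship is its own small item, neither the Actor nor the Film entry ever holds an unbounded list.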

To get all the movies for an actor, you would need to query once to get all the ActedIn relationships, and then do a batch read to get all the movies. Typical latency for a query (in my experience) is under 10ms, depending on network speed and the amount of data being sent over the network. Since the ActedIn relationship is such a small object, I think you could expect an average case of 5ms for a query, if your query originates from something that is also running in an AWS datacenter (EC2, Lambda, etc.).
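
The two-step read described above can be sketched as follows. This is a minimal illustration over in-memory data, not DynamoDB client code; `query_acted_in`, `batch_get_films`, `films_for_actor`, and the sample records are hypothetical names and data.

```python
# In-memory stand-ins for the tables (hypothetical sample data).
FILMS = {
    "film-1": {"FilmId": "film-1", "Title": "First Film"},
    "film-2": {"FilmId": "film-2", "Title": "Second Film"},
}
ACTED_IN = [
    {"ActorId": "actor-1", "FilmId": "film-1"},
    {"ActorId": "actor-1", "FilmId": "film-2"},
    {"ActorId": "actor-2", "FilmId": "film-1"},
]


def query_acted_in(actor_id):
    """Step 1: one query for all of an actor's ActedIn relationships."""
    return [item for item in ACTED_IN if item["ActorId"] == actor_id]


def batch_get_films(film_ids):
    """Step 2: one batch read for the films those relationships point to."""
    return [FILMS[film_id] for film_id in film_ids if film_id in FILMS]


def films_for_actor(actor_id):
    """Combine both steps: two round trips in total."""
    film_ids = [item["FilmId"] for item in query_acted_in(actor_id)]
    return batch_get_films(film_ids)


titles = [film["Title"] for film in films_for_actor("actor-1")]
print(titles)  # ['First Film', 'Second Film']
```

Against a real table, each of the two helper functions would correspond to one network round trip, which is where the latency figures above apply.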

Getting a single item is going to take under 5ms, and you can do those reads in parallel. There's also a BatchGetItem API, but I don't have any statistics for you on that.
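
One way to issue single-item reads in parallel is a thread pool, sketched below. `get_film` is a hypothetical stand-in for a single-item read against the Film table, and `FILMS` is made-up sample data; only the concurrency pattern is the point here.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data standing in for the Film table.
FILMS = {
    "film-1": {"FilmId": "film-1", "Title": "First Film"},
    "film-2": {"FilmId": "film-2", "Title": "Second Film"},
}


def get_film(film_id):
    """Stand-in for a single-item read; returns None if the film is missing."""
    return FILMS.get(film_id)


def get_films_parallel(film_ids, max_workers=8):
    """Issue the single-item reads concurrently instead of one after another."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as film_ids.
        return list(pool.map(get_film, film_ids))


films = get_films_parallel(["film-2", "film-1", "film-9"])
print([film["Title"] if film else None for film in films])
# ['Second Film', 'First Film', None]
```

With parallel reads, the total wait is roughly that of the slowest single read rather than the sum of all of them.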

So, is ~10ms fast enough for you?

If not, you can use DAX (DynamoDB Accelerator), which adds a caching layer to DynamoDB and promises request latency of <1ms.

If you absolutely must get everything in a single call, you could denormalize fully. For every ActedIn relationship, store your data like this:

ActedIn {
    ActorId,
    ActorName,
    ActorAge,
    ActorBio,
    FilmId,
    FilmTitle,
    FilmDescription,
    FilmReleaseDate
}

You then only need to make one query for any given Actor to get all of their film details, and only one query to get all the Actor details for a given Film. Don't actually do this. The duplicated data means that every time you have to update the details for an Actor, you need to update them for every Film they were in, and similarly for Film details. This will be an operational nightmare.
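
To make the update cost concrete, here is a small sketch of the write fan-out over in-memory data. `update_actor_bio` and the sample items are hypothetical; the point is only that one logical change touches every duplicated copy.

```python
# Fully denormalized ActedIn items (hypothetical sample data).
ACTED_IN = [
    {"ActorId": "actor-1", "ActorBio": "old bio", "FilmId": "film-1"},
    {"ActorId": "actor-1", "ActorBio": "old bio", "FilmId": "film-2"},
    {"ActorId": "actor-2", "ActorBio": "other bio", "FilmId": "film-1"},
]


def update_actor_bio(items, actor_id, new_bio):
    """Rewrite the bio on every duplicated copy; return the number of writes."""
    writes = 0
    for item in items:
        if item["ActorId"] == actor_id:
            item["ActorBio"] = new_bio
            writes += 1
    return writes


writes = update_actor_bio(ACTED_IN, "actor-1", "new bio")
print(writes)  # 2
```

In the normalized schema the same change is a single write to the Actor item; here the number of writes grows with the number of films the actor appeared in.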

You should remember that NoSQL comes in many varieties (NoSQL = Not Only SQL), so even if one NoSQL solution doesn't work for you, you shouldn't rule NoSQL out entirely. If you absolutely need this in a single call, you should consider using a graph database (which is another type of NoSQL database).
