Many-to-many relationships with MongoDB at large scale


Question

I've seen many posts on how to model many-to-many relationships with MongoDB, but none of them mention scale. For example, these posts:

MongoDB Many-to-Many Association

How to ... in MongoDB

The problem I can see with this kind of setup is MongoDB's 16MB document limit. Say I have users, groups, and posts. A post has an associated group and can be liked by many users. A group contains many posts and can be followed by many users. A user can like many posts and follow many groups. If I were to build this with a relational database, I would set it up like this:

user:
    user_id
    username

post:
    post_id
    group_id
    message

group:
    group_id
    name

post_likes:
    post_id
    liked_user_id

group_followers:
    group_id
    follower_user_id

In theory, a group can have an unlimited number of posts and following users, a post can have an unlimited number of users who liked it, and a user can have an unlimited number of liked posts and followed groups, provided pagination is done correctly in the SQL queries.

How can I set up the MongoDB schema so that this sort of scale can be achieved?

Recommended answer

This is a good question, which illustrates the problems with over-embedding and how to deal with them.

Let's stick with the example of users liking posts, which is a simple example. The other relations would have to be handled accordingly.

You are absolutely right that storing the likes inside the post would sooner or later cause very popular posts to hit the document size limit.

So you correctly fell back to creating a post_likes collection. Why do I call this correct? Because it fits your use cases and your functional and non-functional requirements:

  • It scales indefinitely (well, there is a theoretical limit, but it is humongous)
  • It is easy to maintain (create a unique index over post_id and liked_user_id) and to use (both the user and the post are known, so adding a like is a simple insert or, more likely, an upsert)
  • You can easily find out which users like a given post and which posts a given user likes
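The unique index and the upsert mentioned in the list above could be sketched as follows. This is a minimal illustration, not the answer's own code: the helper names `likeIndex` and `buildLikeUpsert` and the sample IDs are hypothetical, and only the collection and field names come from the question. The helper just builds plain argument objects, so it runs without a database.

```javascript
// Hypothetical helpers; only post_likes, post_id and liked_user_id come from the text.

// Key pattern and options for a unique compound index on (post_id, liked_user_id).
const likeIndex = {
  keys: { post_id: 1, liked_user_id: 1 },
  options: { unique: true },
};

// Build the arguments for an idempotent "like": repeating it leaves one document.
function buildLikeUpsert(postId, userId) {
  const pair = { post_id: postId, liked_user_id: userId };
  return {
    filter: pair,                   // matches an existing like, if there is one
    update: { $setOnInsert: pair }, // only writes when a new document is inserted
    options: { upsert: true },
  };
}

const likeOp = buildLikeUpsert("post-1", "user-42");
console.log(JSON.stringify(likeOp, null, 2));

// In mongosh, assuming a connected `db`, the objects would be passed along as:
//   db.post_likes.createIndex(likeIndex.keys, likeIndex.options)
//   db.post_likes.updateOne(likeOp.filter, likeOp.update, likeOp.options)
```

With the unique index in place, the upsert stays safe even if two requests for the same like race each other: at most one document per (post, user) pair can exist.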

However, I would expand the collection a bit to avoid extra queries for certain frequent use cases.

Let's assume for now that post titles and usernames can't be changed. In that case, the following data model could make more sense:

{
  _id: new ObjectId(),
  "post_id": someValue,
  "post_title": "Cool thing",
  "liked_user_id": someUserId,
  "user_name": "JoeCool"
}

Now let's assume you want to display the username of all users that liked a post. With the model above, that would be a single, rather fast query:

db.post_likes.find(
  {"post_id":someValue},
  {_id:0,user_name:1}
)
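Because a post can have an unbounded number of likers, the query above would normally be read one page at a time, which is the pagination the question alludes to. Below is a minimal sketch of range-based paging on `_id`, assuming the `post_id` and `user_name` fields from the data model; the helper name `buildLikersPage` and the sample values are hypothetical, and the helper only builds the query arguments.

```javascript
// Hypothetical helper: builds the arguments for one page of likers of a post.
// Paging continues from the last _id seen instead of using skip(), so the cost
// of fetching a page does not grow with the page number.
function buildLikersPage(postId, pageSize, lastSeenId = null) {
  const filter = { post_id: postId };
  if (lastSeenId !== null) {
    filter._id = { $gt: lastSeenId }; // resume after the previous page
  }
  return {
    filter,
    projection: { user_name: 1 }, // _id is kept so the caller can resume
    sort: { _id: 1 },
    limit: pageSize,
  };
}

const page1 = buildLikersPage("post-1", 50);
console.log(JSON.stringify(page1));

// In mongosh, assuming a connected `db`:
//   db.post_likes.find(page1.filter, page1.projection)
//     .sort(page1.sort).limit(page1.limit)
// The _id of the last document returned is passed as lastSeenId for the next page.
```

An index on `{ post_id: 1, _id: 1 }` would be a natural companion to this access pattern; that index is an assumption of this sketch, not something stated in the answer.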

With only the IDs stored, this rather common task would need at least two queries and, given that a post can have an unbounded number of likers, potentially huge memory consumption (you would need to hold the user IDs in RAM).
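The difference between the two designs can be illustrated with small in-memory arrays standing in for the collections. Everything here is made-up sample data with hypothetical function names; it only shows the shape of the two access paths, not real database behaviour.

```javascript
// In-memory stand-ins for the collections; all sample data is made up.
const users = [
  { user_id: 1, username: "JoeCool" },
  { user_id: 2, username: "Ann" },
];
const likesIdsOnly = [
  { post_id: 10, liked_user_id: 1 },
  { post_id: 10, liked_user_id: 2 },
];
const likesDenormalized = [
  { post_id: 10, liked_user_id: 1, user_name: "JoeCool" },
  { post_id: 10, liked_user_id: 2, user_name: "Ann" },
];

// IDs only: step 1 collects every liker ID in memory, step 2 looks the users up.
function likerNamesTwoStep(postId) {
  const ids = new Set(
    likesIdsOnly.filter((l) => l.post_id === postId).map((l) => l.liked_user_id)
  );
  return users.filter((u) => ids.has(u.user_id)).map((u) => u.username);
}

// Denormalized: a single pass over the likes yields the names directly.
function likerNamesOneStep(postId) {
  return likesDenormalized
    .filter((l) => l.post_id === postId)
    .map((l) => l.user_name);
}

console.log(likerNamesTwoStep(10)); // [ 'JoeCool', 'Ann' ]
console.log(likerNamesOneStep(10)); // [ 'JoeCool', 'Ann' ]
```

In the two-step variant the intermediate set of IDs grows with the number of likers, which is the memory cost the answer refers to.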

Granted, this leads to some redundancy, but even when millions of people like a post, we are talking only of a few megabytes of relatively cheap (and easy to scale) disk space while gaining a lot of performance in terms of user experience.

Now here comes the thing: even if usernames and post titles are subject to change, you would only have to do a multi update:

db.post_likes.update(
  {"post_id":someId},
  { $set:{ "post_title":newTitle} },
  { multi: true}
)
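The same pattern applies when a username changes. The sketch below builds the corresponding filter and update documents; `buildUserRename` and the sample values are hypothetical, while `liked_user_id` and `user_name` are the fields from the data model. In current shells, `updateMany` updates every matching document without needing the `multi` flag.

```javascript
// Hypothetical helper: arguments for propagating a changed username
// to every like document written by that user.
function buildUserRename(userId, newName) {
  return {
    filter: { liked_user_id: userId },
    update: { $set: { user_name: newName } },
  };
}

const rename = buildUserRename("user-42", "JoeCooler");
console.log(JSON.stringify(rename));

// In mongosh, assuming a connected `db`:
//   db.post_likes.updateMany(rename.filter, rename.update)
```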

You are trading a slow path for rather rare operations, such as changing a username or a post title, for extreme speed in use cases that happen extremely often.

Keep in mind that MongoDB is a document-oriented database. So document the events you are interested in, with the values you need for future queries, and model your data accordingly.
