MongoDB database schema design


Question

I have a website with 500k users (running on SQL Server 2008). I now want to add activity streams for users and their friends. After testing a few things on SQL Server, it became apparent that an RDBMS is not a good choice for this kind of feature: it's slow, even after I heavily denormalized my data. So after looking at other NoSQL solutions, I've figured that I can use MongoDB for this. I'll base the data structure on the activitystrea.ms JSON specification for activity streams. So my question is: what would be the best schema design for an activity stream in MongoDB? With this many users, you can pretty much predict that it will be very write-heavy, hence my choice of MongoDB, which has great write performance. I've thought about three types of structures; please tell me whether they make sense or whether I should use other schema patterns.

1 - Store each activity with all friends/followers in this pattern:

 

    {
      _id: 'activ123',
      actor: {
        id: 'person1'
      },
      verb: 'follow',
      object: {
        objecttype: 'person',
        id: 'person2'
      },
      updatedon: new Date(),
      consumers: [
        'person3', 'person4', 'person5', 'person6' /* ... and so on */
      ]
    }
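To make design 1 concrete, here is a minimal in-memory sketch in plain JavaScript. There is no MongoDB connection: the `activities` array stands in for a hypothetical collection, and `feedFor` is a hypothetical helper. Reading a user's stream means finding every activity whose `consumers` array contains that user, newest first. In MongoDB that corresponds roughly to a query like `{ consumers: 'person3' }` sorted by `updatedon` descending.

```javascript
// In-memory stand-in for a hypothetical `activities` collection (design 1).
const activities = [
  { _id: 'activ123', actor: { id: 'person1' }, verb: 'follow',
    object: { objecttype: 'person', id: 'person2' },
    updatedon: new Date('2012-01-01T10:00:00Z'),
    consumers: ['person3', 'person4', 'person5'] },
  { _id: 'activ124', actor: { id: 'person2' }, verb: 'post',
    object: { objecttype: 'note', id: 'note9' },
    updatedon: new Date('2012-01-01T11:00:00Z'),
    consumers: ['person3', 'person6'] },
];

// Keep activities whose consumers array contains personId, newest first
// (similar in spirit to find({ consumers: personId }).sort({ updatedon: -1 })).
function feedFor(personId, docs) {
  return docs
    .filter((a) => a.consumers.includes(personId))
    .sort((a, b) => b.updatedon - a.updatedon);
}

const person3Feed = feedFor('person3', activities).map((a) => a._id);
```

The weak point is that `consumers` grows with the actor's follower count, so activities from popular users become very large documents.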

2 - Second design, in a collection named activity_stream_fanout:


    {
      _id: 'activ_fanout_123',
      personId: 'person3',
      activities: [
        {
          _id: 'activ123',
          actor: {
            id: 'person1'
          },
          verb: 'follow',
          object: {
            objecttype: 'person',
            id: 'person2'
          },
          updatedon: new Date()
        },
        {
          // activity feed 2
        }
      ]
    }
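Design 2 keeps one document per reader, so writing an activity means appending it to the `activities` array of each follower's document. In MongoDB that append would typically be an `updateOne` using `$push`, optionally with `$each` and `$slice` to cap the array, plus `upsert: true`. The sketch below only simulates that behaviour in memory. The `MAX_FEED_ITEMS` cap and the `pushToFeed` and `fanOutActivity` helpers are illustrative assumptions, not part of the question.

```javascript
// Hypothetical cap on how many activities one feed document keeps,
// mirroring what $push with $each and $slice: -MAX_FEED_ITEMS would do.
const MAX_FEED_ITEMS = 3;

// In-memory stand-in for the activity_stream_fanout collection, keyed by personId.
const fanoutDocs = new Map();

// Append an activity to one reader's feed document, creating it if missing (upsert).
function pushToFeed(personId, activity) {
  let doc = fanoutDocs.get(personId);
  if (!doc) {
    doc = { _id: `activ_fanout_${personId}`, personId, activities: [] };
    fanoutDocs.set(personId, doc);
  }
  doc.activities.push(activity);
  // Keep only the newest MAX_FEED_ITEMS entries (like a negative $slice).
  if (doc.activities.length > MAX_FEED_ITEMS) {
    doc.activities = doc.activities.slice(-MAX_FEED_ITEMS);
  }
}

// Fan one activity out to every follower of its actor: one write per follower.
function fanOutActivity(activity, followers) {
  for (const personId of followers) pushToFeed(personId, activity);
}

for (let i = 1; i <= 4; i++) {
  fanOutActivity({ _id: `activ12${i}`, actor: { id: 'person1' }, verb: 'follow' },
                 ['person3', 'person4']);
}
```

This makes reading a feed a single document fetch by `personId`, at the cost of one write per follower for every new activity.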


3 - This approach would be to store the activity items in one collection, and the consumers in another. In activities, you might have a document like:


    { _id: "123",
      actor: { person: "UserABC" },
      verb: "follow",
      object: { person: "someone_else" },
      updatedOn: Date(...)

    } 

And then, for followers, I would have the following "notifications" documents:


    { activityId: "123", consumer: "someguy", updatedOn: Date(...) }
    { activityId: "123", consumer: "otherguy", updatedOn: Date(...) }
    { activityId: "123", consumer: "thirdguy", updatedOn: Date(...) } 
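With design 3, reading a feed is a two-step lookup. First, fetch the newest notification documents for the consumer. Second, load the activities they reference. In MongoDB this would be roughly a sorted `find` on the notifications collection followed by a `find` on activities with `_id: { $in: [...] }`. The in-memory sketch below uses a hypothetical `readFeed` helper and made-up sample data.

```javascript
// In-memory stand-ins for the two collections in design 3.
const activityDocs = [
  { _id: '123', actor: { person: 'UserABC' }, verb: 'follow',
    object: { person: 'someone_else' }, updatedOn: new Date('2012-01-01T10:00:00Z') },
  { _id: '124', actor: { person: 'UserXYZ' }, verb: 'post',
    object: { note: 'n1' }, updatedOn: new Date('2012-01-01T12:00:00Z') },
];
const notifications = [
  { activityId: '123', consumer: 'someguy', updatedOn: new Date('2012-01-01T10:00:00Z') },
  { activityId: '123', consumer: 'otherguy', updatedOn: new Date('2012-01-01T10:00:00Z') },
  { activityId: '124', consumer: 'someguy', updatedOn: new Date('2012-01-01T12:00:00Z') },
];

function readFeed(consumer, limit) {
  // Step 1: newest notifications for this consumer, one page of them.
  const ids = notifications
    .filter((n) => n.consumer === consumer)
    .sort((a, b) => b.updatedOn - a.updatedOn)
    .slice(0, limit)
    .map((n) => n.activityId);
  // Step 2: resolve the referenced activities, preserving notification order.
  const byId = new Map(activityDocs.map((a) => [a._id, a]));
  return ids.map((id) => byId.get(id)).filter(Boolean);
}
```

Step 1 would likely be served by a compound index on `{ consumer: 1, updatedOn: -1 }` on the notifications collection.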

Your answers are greatly appreciated.

Solution

I'd go with the following structure:

  1. Use one collection, Actions, for all actions that happened.

  2. Use another collection, Subscribers, for who follows whom.

  3. Use a third collection, Newsfeed, for each user's news feed; items are fanned out from the Actions collection.
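Here is a minimal in-memory sketch of those three collections and of the fan-out step. The answer does not specify document shapes. The field names (`actorId`, `followerId`, `followeeId`, `ownerId`, `actionId`, `time`) and the `fanOut` helper are illustrative assumptions. Storing a reference plus a timestamp, rather than a full copy of the action, is also an assumption.

```javascript
// Actions: every action that happened, stored once.
const Actions = [];
// Subscribers: one document per "who follows whom" edge.
const Subscribers = [
  { followerId: 'alice', followeeId: 'bob' },
  { followerId: 'carol', followeeId: 'bob' },
  { followerId: 'bob', followeeId: 'alice' },
];
// Newsfeed: one small document per (reader, action) pair, fanned out from Actions.
const Newsfeed = [];

// Add a reference to the action to the feed of everyone who follows its actor.
function fanOut(action) {
  for (const s of Subscribers) {
    if (s.followeeId === action.actorId) {
      Newsfeed.push({ ownerId: s.followerId, actionId: action._id, time: action.time });
    }
  }
}

const bobPost = { _id: 'a1', actorId: 'bob', verb: 'post', time: new Date('2012-01-01T10:00:00Z') };
Actions.push(bobPost);
fanOut(bobPost);
```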

The Newsfeed collection will be populated by a worker process that asynchronously processes new Actions. Therefore, news feeds won't be populated in real time. I disagree with Geert-Jan that real time is important: I believe most users won't mind even a minute of delay in most (not all) applications. (For real time, I'd choose a completely different architecture.)
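One way to picture that worker is as a queue that separates recording an action from fanning it out. The write path only appends to Actions and enqueues the action. A separate worker drains the queue later. The sketch below is an in-memory simulation. `recordAction`, `runWorker`, the `pending` queue and the `followersOf` map are all hypothetical; a real deployment would use whatever job queue or polling worker suits the stack.

```javascript
const followersOf = new Map([['bob', ['alice', 'carol']]]);
const actionLog = [];     // stands in for the Actions collection
const pending = [];       // actions not yet fanned out
const feeds = new Map();  // stands in for Newsfeed: ownerId -> action ids

// Write path: cheap, no per-follower work.
function recordAction(action) {
  actionLog.push(action);
  pending.push(action);
}

// Worker: drains the queue and does the (possibly slow) fan-out.
function runWorker() {
  while (pending.length > 0) {
    const action = pending.shift();
    for (const ownerId of followersOf.get(action.actorId) ?? []) {
      if (!feeds.has(ownerId)) feeds.set(ownerId, []);
      feeds.get(ownerId).push(action._id);
    }
  }
}

recordAction({ _id: 'a1', actorId: 'bob', verb: 'post' });
const seenBeforeWorker = (feeds.get('alice') ?? []).length; // 0: not fanned out yet
runWorker();
const seenAfterWorker = feeds.get('alice').length;          // 1
```

The delay between `recordAction` and `runWorker` is the delay the answer considers acceptable for most applications.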

If you have a very large number of consumers, the fan-out can take a while, true. On the other hand, embedding the consumers directly in the activity document won't work with very large follower counts either: it creates overly large documents that take up a lot of index space.
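Here is a back-of-the-envelope sketch that puts a number on "overly large". MongoDB limits a single BSON document to 16 MB (16,777,216 bytes). BSON encodes an array as an embedded document whose keys are the decimal indices "0", "1", "2", and so on. The sketch assumes, hypothetically, that consumers are stored as 12-byte ObjectIds rather than the strings used in the question. It computes the encoded size of the consumers array alone. The array's own field name and the rest of the document are ignored, so the real ceiling is somewhat lower.

```javascript
// Maximum BSON document size enforced by MongoDB.
const BSON_MAX_BYTES = 16 * 1024 * 1024;

// Encoded size of a BSON array holding n ObjectIds.
function bsonObjectIdArraySize(n) {
  let size = 4 + 1; // int32 total length + trailing 0x00
  for (let i = 0; i < n; i++) {
    // element type byte + key (decimal index) + key NUL + 12-byte ObjectId
    size += 1 + String(i).length + 1 + 12;
  }
  return size;
}

// Largest n whose array alone still fits under the document size limit.
function maxConsumersPerDocument() {
  let size = 5;
  let n = 0;
  while (size + 14 + String(n).length <= BSON_MAX_BYTES) {
    size += 14 + String(n).length;
    n++;
  }
  return n;
}
```

Under these assumptions the array tops out at 844,416 entries before anything else is stored in the document. An index on `consumers` would be multikey, holding one index key per array element, which is where the index-space cost comes from.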

Most importantly, however, the fan-out design is much more flexible and allows relevancy scoring, filtering, etc. I have just recently written a blog post about news feed schema design with MongoDB where I explain some of that flexibility in greater detail.
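One reading of "allows relevancy scoring, filtering" is this. Each fanned-out Newsfeed entry is a per-reader document, so the worker can attach reader-specific fields at fan-out time, and the feed query can filter and sort on them. The `score` field, the scoring rule and all helper names below are purely hypothetical. They illustrate the idea and are not taken from the blog post mentioned above.

```javascript
// Hypothetical per-reader relevance: close friends and posts rank higher.
function scoreFor(reader, action) {
  let score = 1;
  if (reader.closeFriends.includes(action.actorId)) score += 2;
  if (action.verb === 'post') score += 1;
  return score;
}

// Build one Newsfeed entry per reader, with the score computed at fan-out time.
function fanOutScored(action, readers) {
  return readers.map((r) => ({
    ownerId: r.id, actionId: action._id, verb: action.verb, score: scoreFor(r, action),
  }));
}

// Feed query: drop hidden verbs, then sort by score, highest first.
function scoredFeed(entries, ownerId, hiddenVerbs) {
  return entries
    .filter((e) => e.ownerId === ownerId && !hiddenVerbs.includes(e.verb))
    .sort((a, b) => b.score - a.score)
    .map((e) => e.actionId);
}

const aliceReader = { id: 'alice', closeFriends: ['bob'] };
const scoredEntries = [
  ...fanOutScored({ _id: 'a1', actorId: 'dave', verb: 'follow' }, [aliceReader]),
  ...fanOutScored({ _id: 'a2', actorId: 'bob', verb: 'post' }, [aliceReader]),
  ...fanOutScored({ _id: 'a3', actorId: 'dave', verb: 'post' }, [aliceReader]),
];
```

Embedding consumers in a single shared activity document has no per-reader place to keep such a score.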

Speaking of flexibility, I'd be careful about that activitystrea.ms spec. It seems to make sense as a specification for interoperability between different providers, but I wouldn't store all that verbose information in my database unless you intend to aggregate activities from various applications.
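Following that advice, one option is to store a compact internal shape and expand it to an Activity Streams-style JSON object only when exporting. The compact field names (`actorId`, `objectType`, `objectId`, `time`) and both converters are hypothetical. The expanded form uses the Activity Streams 1.0 JSON property names `published`, `actor`, `verb`, `object`, `objectType` and `id`.

```javascript
// Compact shape stored in the database: flat ids, no nested metadata.
function toStored(actorId, verb, objectType, objectId, time) {
  return { actorId, verb, objectType, objectId, time };
}

// Expand to an Activity Streams 1.0-style object only at the export boundary.
// Assumption: every actor in this application is a person.
function toActivityStreams(doc) {
  return {
    published: doc.time.toISOString(),
    actor: { objectType: 'person', id: doc.actorId },
    verb: doc.verb,
    object: { objectType: doc.objectType, id: doc.objectId },
  };
}

const stored = toStored('person1', 'follow', 'person', 'person2', new Date('2012-01-01T10:00:00Z'));
const exported = toActivityStreams(stored);
```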
