MongoDB database schema design

Question

I have a website with 500k users (running on SQL Server 2008). I now want to add activity streams for users and their friends. After testing a few things on SQL Server, it became apparent that an RDBMS is not a good choice for this kind of feature: it is slow, even when I heavily denormalize my data. After looking at NoSQL solutions, I've figured that I can use MongoDB for this. I'll be following a data structure based on the activitystrea.ms JSON specification for activity streams. So my question is: what would be the best schema design for an activity stream in MongoDB? With this many users, you can pretty much predict that it will be very write-heavy, hence my choice of MongoDB, which has great write performance. I've thought about 3 types of structures; please tell me whether they make sense or whether I should use other schema patterns.

1 - Store each activity with all friends/followers in this pattern:

 

    {
        _id: 'activ123',
        actor: {
            id: 'person1'
        },
        verb: 'follow',
        object: {
            objecttype: 'person',
            id: 'person2'
        },
        updatedon: new Date(),
        consumers: [
            'person3', 'person4', 'person5', 'person6' // ... and so on
        ]
    }

2 - Second design: Collection name - activity_stream_fanout



    {
        _id: 'activ_fanout_123',
        personId: 'person3',
        activities: [
            {
                _id: 'activ123',
                actor: {
                    id: 'person1'
                },
                verb: 'follow',
                object: {
                    objecttype: 'person',
                    id: 'person2'
                },
                updatedon: new Date()
            }
            // , { activity feed 2 }, ...
        ]
    }


3 - This approach would store the activity items in one collection and the consumers in another. In activities, you might have a document like:



    { _id: "123",
      actor: { person: "UserABC" },
      verb: "follow",
      object: { person: "someone_else" },
      updatedOn: Date(...)

    } 

And then, for followers, I would have the following "notifications" documents:



    { activityId: "123", consumer: "someguy", updatedOn: Date(...) }
    { activityId: "123", consumer: "otherguy", updatedOn: Date(...) }
    { activityId: "123", consumer: "thirdguy", updatedOn: Date(...) } 
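
To make the read path of this third design concrete, here is a minimal sketch that uses plain Python lists and dicts as stand-ins for the two collections. The function name `feed_for`, the `limit` parameter, and the sample timestamps are assumptions made for illustration; they are not part of the question. Against a real database, the list scan would be an indexed query on `consumer` sorted by `updatedOn`.

```python
from datetime import datetime, timezone

# In-memory stand-ins for the "activities" and "notifications" collections.
# The timestamps are made-up sample values.
activities = {
    "123": {
        "_id": "123",
        "actor": {"person": "UserABC"},
        "verb": "follow",
        "object": {"person": "someone_else"},
        "updatedOn": datetime(2012, 1, 1, tzinfo=timezone.utc),
    },
}

notifications = [
    {"activityId": "123", "consumer": "someguy",
     "updatedOn": datetime(2012, 1, 1, tzinfo=timezone.utc)},
    {"activityId": "123", "consumer": "otherguy",
     "updatedOn": datetime(2012, 1, 1, tzinfo=timezone.utc)},
]


def feed_for(consumer, limit=20):
    """Return the newest activities addressed to one consumer."""
    # Step 1: find this consumer's notification rows, newest first.
    mine = [n for n in notifications if n["consumer"] == consumer]
    mine.sort(key=lambda n: n["updatedOn"], reverse=True)
    # Step 2: resolve each notification to its activity document.
    return [activities[n["activityId"]] for n in mine[:limit]]
```

Reading a feed therefore costs two lookups (notifications, then activities), which is the price of storing each activity only once.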

Your answers are greatly appreciated.

Recommended answer

I would go with the following structure:

  1. Use one collection for all actions that happened, Actions

  2. Use another collection for who follows whom, Subscribers

  3. Use a third collection, Newsfeed, for a certain user's news feed; items are fanned out from the Actions collection.

The Newsfeed collection will be populated by a worker process that asynchronously processes new Actions. Therefore, news feeds won't populate in real time. I disagree with Geert-Jan's view that real-time is important; I believe most users don't care about even a minute of delay in most (not all) applications (for real time, I'd choose a completely different architecture).
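
The asynchronous fan-out can be sketched roughly as follows. This is an in-memory illustration, not a MongoDB implementation: the queue, the `subscribers` mapping, and the function names `record_action` and `run_worker_once` are all hypothetical, and a real worker would read unprocessed documents from the Actions collection and write into Newsfeed.

```python
from collections import deque

# In-memory stand-ins (assumptions for illustration only).
pending_actions = deque()                            # new, unprocessed Actions
subscribers = {"person1": ["person3", "person4"]}    # followee -> followers
newsfeed = []                                        # Newsfeed entries


def record_action(actor, verb, obj):
    """Write path: a single cheap insert, with no fan-out at write time."""
    action = {"actor": actor, "verb": verb, "object": obj}
    pending_actions.append(action)
    return action


def run_worker_once():
    """Drain pending actions and copy each one to the actor's followers.

    Returns the number of Newsfeed entries written.
    """
    written = 0
    while pending_actions:
        action = pending_actions.popleft()
        for follower in subscribers.get(action["actor"], []):
            newsfeed.append({"consumer": follower, "action": action})
            written += 1
    return written
```

Calling `record_action("person1", "follow", "person2")` leaves `newsfeed` empty until `run_worker_once()` runs, which is the delay the answer accepts in exchange for cheap writes.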

If you have a very large number of consumers, the fan-out can take a while, true. On the other hand, putting the consumers right into the object won't work with very large follower counts either, and it will create overly large objects that take up a lot of index space.

Most importantly, however, the fan-out design is much more flexible and allows relevancy scoring, filtering, etc. I have just recently written a blog post about news feed schema design with MongoDB where I explain some of that flexibility in greater detail.
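
As one possible illustration of that flexibility, the sketch below filters and scores entries per follower while fanning out. The `muted_verbs` and `affinity` inputs, the `score` field, and both function names are assumptions invented for this example; the answer does not specify how scoring or filtering would be done.

```python
def fan_out(action, followers, muted_verbs, affinity):
    """Build per-follower feed entries, applying a filter and a score.

    muted_verbs: follower -> set of verbs that follower does not want to see.
    affinity: (follower, actor) -> relevancy score; missing pairs score 0.0.
    """
    entries = []
    for follower in followers:
        if action["verb"] in muted_verbs.get(follower, set()):
            continue  # filtering: skip followers who muted this verb
        score = affinity.get((follower, action["actor"]), 0.0)
        entries.append(
            {"consumer": follower, "actionId": action["_id"], "score": score}
        )
    return entries


def top_feed(entries, consumer, limit=10):
    """Return one consumer's entries, most relevant first."""
    mine = [e for e in entries if e["consumer"] == consumer]
    return sorted(mine, key=lambda e: e["score"], reverse=True)[:limit]
```

Because every follower gets their own entry, each entry can carry follower-specific data such as the score, which is not possible when all consumers share one embedded array.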

Speaking of flexibility, I'd be careful about that activitystrea.ms spec. It seems to make sense as a specification for interop between different providers, but I wouldn't store all that verbose information in my database unless you intend to aggregate activities from various applications.
