mongo schema (embedding vs reference)

Question

Let's assume that I am designing a service like Foursquare that tracks user check-ins based on the user's location. I am using MongoDB as the backend.

The premise here is that a user can check-in to a location, so collections in the schema might look like this:

db.places.find()
{ "_id" : ObjectId("4e6a5a58a43a59e451d69351"), "address" : { "street" : "2020 Lombard St", "city" : "San Francisco", "state" : "CA" }, "latlong" : [ 37.800274, -122.434914 ], "name" : "Marina Sushi", "timezone" : "America/Los_Angeles" }
{ "_id" : ObjectId("4e6a59c3a43a59e451d69350"), "address" : { "street" : "246 Kearny St", "city" : "San Francisco", "state" : "CA" }, "latlong" : [ 37.79054, -122.40361 ], "name" : "Rickhouse", "timezone" : "America/Los_Angeles" }

db.users.find()
{ "_id" : ObjectId("4e936bc1da06d5e081544b8b"), "_class" : "com.gosociety.server.common.model.User", "email" : "goso@gosociety.com", "password" : "asdfasdf"}

In the collections above, we have places and users. A user can "check in" to a place, and each time a user checks in, we'll keep a record of it in the database. A check-in consists of the check-in time (UTC), a note (up to 150 characters), and a boolean indicating whether it was posted to the user's Facebook feed.
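A minimal sketch of what one check-in record could look like under these requirements, written as plain JavaScript objects so it runs without a database. The field names (`user_id`, `place_id`, `timestamp`, `note`, `posted_to_facebook`) and the `makeCheckin` helper are hypothetical; string ids stand in for ObjectIds.

```javascript
// Hypothetical helper: builds one check-in record with the three fields described above,
// plus references to the user and the place. String ids stand in for MongoDB ObjectIds.
const MAX_NOTE_LENGTH = 150;

function makeCheckin(userId, placeId, note, postedToFacebook, when = new Date()) {
  if (typeof note !== "string") {
    throw new TypeError("note must be a string");
  }
  // note.length counts UTF-16 code units, which equals the character count for most text.
  if (note.length > MAX_NOTE_LENGTH) {
    throw new RangeError(`note must be at most ${MAX_NOTE_LENGTH} characters`);
  }
  if (typeof postedToFacebook !== "boolean") {
    throw new TypeError("postedToFacebook must be a boolean");
  }
  return {
    user_id: userId,
    place_id: placeId,
    timestamp: when, // a Date is an absolute instant; MongoDB stores it as a UTC BSON date
    note: note,
    posted_to_facebook: postedToFacebook,
  };
}

const checkin = makeCheckin(
  "4e936bc1da06d5e081544b8b", // the user from db.users
  "4e6a59c3a43a59e451d69350", // Rickhouse from db.places
  "Great cocktails",
  false,
  new Date(Date.UTC(2011, 9, 10, 18, 30)) // months are 0-based: 9 = October
);
console.log(checkin.timestamp.toISOString()); // 2011-10-10T18:30:00.000Z
```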

Based on the description, I could think of two alternatives for schema design in Mongo:

  • Create a checkins collection, and store the MongoDB-generated id of each check-in in a checkins [] array on both the User document and the Place document. This way it would be easy to determine aggregate statistics per user and per venue.

  • Don't create a checkins collection; instead, update both the Place and the User documents with the same check-in information.
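To make the two alternatives concrete, here is an in-memory sketch of the writes each one performs for a single check-in. Plain arrays stand in for collections and a counter stands in for ObjectIds; all function names are hypothetical. With the first option, one record lives in `checkins` and both parent documents hold its id; with the second, the same data is copied into both parents.

```javascript
// In-memory stand-ins for MongoDB collections; string ids stand in for ObjectIds.
function makeDb() {
  return {
    users: [{ _id: "u1", email: "goso@gosociety.com" }],
    places: [{ _id: "p1", name: "Rickhouse" }],
    checkins: [], // only used by the first option
    nextId: 1,
  };
}

function findById(collection, id) {
  return collection.find((doc) => doc._id === id);
}

// First option: store each check-in once in `checkins`, then push its id
// into a checkins[] array on both the user and the place document.
function checkInByReference(db, userId, placeId, note) {
  const checkin = { _id: `c${db.nextId++}`, user_id: userId, place_id: placeId, note, timestamp: new Date() };
  db.checkins.push(checkin);
  for (const parent of [findById(db.users, userId), findById(db.places, placeId)]) {
    parent.checkins = parent.checkins || [];
    parent.checkins.push(checkin._id);
  }
  return checkin._id;
}

// Second option: no checkins collection; the same check-in data is copied
// into both parents (the user's copy names the place, and vice versa).
function checkInByEmbedding(db, userId, placeId, note) {
  const timestamp = new Date();
  const user = findById(db.users, userId);
  const place = findById(db.places, placeId);
  user.checkins = user.checkins || [];
  user.checkins.push({ place_id: placeId, note, timestamp });
  place.checkins = place.checkins || [];
  place.checkins.push({ user_id: userId, note, timestamp });
}

const refDb = makeDb();
checkInByReference(refDb, "u1", "p1", "First visit");
checkInByReference(refDb, "u1", "p1", "Back again");
console.log(refDb.checkins.length, refDb.users[0].checkins); // 2 [ 'c1', 'c2' ]

const embedDb = makeDb();
checkInByEmbedding(embedDb, "u1", "p1", "First visit");
console.log(embedDb.users[0].checkins.length, embedDb.places[0].checkins.length); // 1 1
```

Either way, every check-in appends an element to an array on both the user and the place document, so those documents keep growing as check-ins accumulate.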

I believe I read in the MongoDB documentation that aggregated data should be embedded directly in its parent object if it is almost never displayed without that parent. If we follow the approach the Foursquare app uses, a user's total check-ins are shown only when we view their profile, and a place's check-in stats only when we view the place details.

Any suggestions here would be much appreciated.

Thanks.

Solution

Personally, I would go with a separate collection, mainly to keep your user and place documents small, since a user or place can have an unbounded number of check-ins. If you put indexes on user_id/timestamp and place_id/timestamp in your checkins collection, queries for a particular user or place will be efficient. A second benefit of a separate collection is that MongoDB won't have to keep relocating your user or place documents as they outgrow the space allocated for them. Instead, it just keeps appending to the checkins collection, which should be quite efficient (tens of thousands of inserts per second per shard).
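The indexing argument can be illustrated with a toy in-memory model of a compound index: entries sorted by (user_id, timestamp), so one user's check-ins form a contiguous, time-ordered run that binary search finds without touching other users' entries. This is a hypothetical illustration of the idea, not how MongoDB implements its B-tree indexes, and the class and function names are made up. In the mongo shell, the real indexes would be created with something like `db.checkins.createIndex({ user_id: 1, timestamp: -1 })` and `db.checkins.createIndex({ place_id: 1, timestamp: -1 })` (older shells used `ensureIndex`).

```javascript
// Toy model of a compound index on { <field>: 1, timestamp: 1 }. Entries are kept sorted by
// (key, timestamp), so all check-ins for one key form a contiguous, time-ordered run.
class CompoundIndex {
  constructor(field) {
    this.field = field; // e.g. "user_id" or "place_id"
    this.entries = []; // [key, timestampMs, positionInCollection]
  }

  compare(a, b) {
    if (a[0] !== b[0]) return a[0] < b[0] ? -1 : 1;
    return a[1] - b[1];
  }

  // Binary search: index of the first entry that sorts at or after `probe`.
  lowerBound(probe) {
    let lo = 0;
    let hi = this.entries.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (this.compare(this.entries[mid], probe) < 0) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  }

  insert(doc, position) {
    const entry = [doc[this.field], doc.timestamp.getTime(), position];
    this.entries.splice(this.lowerBound(entry), 0, entry);
  }

  // Positions of the documents with this key, newest first.
  lookup(key, limit = Infinity) {
    const start = this.lowerBound([key, -Infinity]);
    const end = this.lowerBound([key, Infinity]);
    return this.entries.slice(start, end).reverse().slice(0, limit).map((e) => e[2]);
  }
}

const checkins = [];
const byUser = new CompoundIndex("user_id");
const byPlace = new CompoundIndex("place_id");

// Append-only write path: user and place documents are never modified.
function insertCheckin(doc) {
  checkins.push(doc);
  byUser.insert(doc, checkins.length - 1);
  byPlace.insert(doc, checkins.length - 1);
}

insertCheckin({ user_id: "u1", place_id: "p1", timestamp: new Date(Date.UTC(2011, 9, 1)) });
insertCheckin({ user_id: "u2", place_id: "p1", timestamp: new Date(Date.UTC(2011, 9, 2)) });
insertCheckin({ user_id: "u1", place_id: "p2", timestamp: new Date(Date.UTC(2011, 9, 3)) });

// Roughly what db.checkins.find({ user_id: "u1" }).sort({ timestamp: -1 }).limit(2) would return.
console.log(byUser.lookup("u1", 2).map((i) => checkins[i].place_id)); // [ 'p2', 'p1' ]
console.log(byPlace.lookup("p1").length); // 2
```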

I should also mention that I would not store the check-in IDs in either the place or the user document, since an index on place_id or user_id in the checkins collection gives you the same performance benefit.
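Without checkins[] arrays on the user and place documents, the numbers a profile page or place page shows are computed from the checkins collection on demand. The sketch below does this over a plain array; the function names and sample data are hypothetical. Unlike this full scan, MongoDB would use the user_id or place_id index. In recent shells the equivalent queries would be roughly `db.checkins.countDocuments({ user_id: id })` and an aggregation with a `$match` on `place_id` followed by a `$group` on `user_id`.

```javascript
// Stats for a place page, computed from check-in records rather than from ids stored on the place.
function placeStats(checkins, placeId) {
  const visitors = new Map(); // user_id -> number of check-ins at this place
  for (const c of checkins) {
    if (c.place_id !== placeId) continue;
    visitors.set(c.user_id, (visitors.get(c.user_id) || 0) + 1);
  }
  let total = 0;
  for (const n of visitors.values()) total += n;
  return { total, uniqueVisitors: visitors.size };
}

// Total check-ins for a profile page.
function userTotal(checkins, userId) {
  return checkins.filter((c) => c.user_id === userId).length;
}

const sample = [
  { user_id: "u1", place_id: "p1" },
  { user_id: "u2", place_id: "p1" },
  { user_id: "u1", place_id: "p1" },
  { user_id: "u1", place_id: "p2" },
];
console.log(placeStats(sample, "p1")); // { total: 3, uniqueVisitors: 2 }
console.log(userTotal(sample, "u1")); // 3
```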
