MongoDB: multiple collections or one big collection w/ index

Question

I need help modeling my data in mongo. Most of my experience has been in relational DBs, and I am just starting out w/ mongo. I am modeling data for different events.

  1. Each 'event' will have the same fields.
  2. Each 'event' will have hundreds to millions of documents/rows.
  3. Events are dynamic, i.e. new ones will be created as needed, e.g. a new 'Summer Olympics 2016' event.

Probably most important, when dealing with events (CRUD operations) users will have to specify an event name.

I can see a couple of ways to do this so far and I don't want to make a major mistake in setting up my data model the 'wrong' way.

1) One 'events' collection that has data for all events. Index on 'event' name. Query would look something like:

db.events.find({event: 'Summer Olympics 2012'});
{event: 'Summer Olympics 2012', attributes: [{name: 'joe smith', .... }
{event: 'Summer Olympics 2012', attributes: [{name: 'jane doe', .... }
{event: 'Summer Olympics 2012', attributes: [{name: 'john avery', .... }
{event: 'Summer Olympics 2012', attributes: [{name: 'ted williams', .... }

db.events.find({event: 'Summer Olympics 2013'})
{event: 'Summer Olympics 2013', attributes: [{name: 'steve smith', .... }
{event: 'Summer Olympics 2013', attributes: [{name: 'amy jones', .... }
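
As a rough illustration of what the index in option 1 is for, here is a toy in-memory model, not MongoDB itself. The `IndexedEvents` class, the `examined` counter, and the generated athlete names are all made up for this sketch, and a plain `Map` stands in for the index (MongoDB's own indexes are B-trees). The point it tries to show is that a lookup by `event` through an index only touches the documents of that event, however large the other events are. Real-world performance still depends on things this model ignores, such as index size and available memory.

```javascript
// Toy in-memory stand-in for one 'events' collection plus an index on `event`.
// This is NOT MongoDB code. In the mongo shell the index itself would be
// created with: db.events.createIndex({ event: 1 })
class IndexedEvents {
  constructor() {
    this.docs = [];            // the single big collection
    this.byEvent = new Map();  // the "index": event name -> positions in this.docs
  }

  insert(doc) {
    const position = this.docs.push(doc) - 1;
    if (!this.byEvent.has(doc.event)) this.byEvent.set(doc.event, []);
    this.byEvent.get(doc.event).push(position);
  }

  // Returns the matching docs and how many docs had to be examined to find them.
  find(eventName) {
    const positions = this.byEvent.get(eventName) || [];
    return { docs: positions.map((p) => this.docs[p]), examined: positions.length };
  }
}

const events = new IndexedEvents();
// One large event and one small event in the same collection (made-up data).
for (let i = 0; i < 100000; i++) {
  events.insert({ event: 'Summer Olympics 2012', attributes: [{ name: `athlete ${i}` }] });
}
for (let i = 0; i < 500; i++) {
  events.insert({ event: 'Summer Olympics 2013', attributes: [{ name: `athlete ${i}` }] });
}

const small = events.find('Summer Olympics 2013');
console.log(small.docs.length, small.examined); // prints: 500 500
```

In this model the small event's lookup examines 500 documents even though the collection holds 100,500.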

2) A collection for each new event that comes along, w/ an 'events' collection to keep track of all event names. No index on event name is needed, as each event is stored in a different collection.

// multiple collections, create new as needed
db.summer2012.find() // get summer 2012 docs

db.summer2016.find() // get summer 2016 docs

//'events' collection
db.events.find() // get all events that I would have collections for
{name: 'summer2012', title: 'Summer Olympics 2012'};
{name: 'summer2016', title: 'Summer Olympics 2016'};
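
Since option 2 creates collections on the fly from user-supplied event names, some step has to turn a title into a safe collection name and record it in the 'events' collection. A minimal sketch of that step follows, in plain JavaScript with an array standing in for the 'events' collection. The helper names `collectionNameFor` and `registerEvent`, the `event_` prefix, and the underscore naming scheme are assumptions for illustration (the question itself uses shorter names like 'summer2012').

```javascript
// Hypothetical helper for option 2: derive a collection name from an event title.
// Keeps only [a-z0-9_], which stays clear of characters MongoDB disallows in
// collection names (such as '$' and the null character).
function collectionNameFor(title) {
  const slug = title
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '_')  // runs of any other characters become '_'
    .replace(/^_+|_+$/g, '');     // drop leading/trailing '_'
  if (slug === '') {
    throw new Error(`cannot derive a collection name from "${title}"`);
  }
  return `event_${slug}`;         // the prefix is an assumed convention
}

// Hypothetical registry step: `registry` is a plain array standing in for the
// 'events' collection that tracks which per-event collections exist.
function registerEvent(registry, title) {
  const name = collectionNameFor(title);
  if (registry.some((entry) => entry.name === name)) {
    throw new Error(`event collection "${name}" already exists`);
  }
  const entry = { name, title };
  registry.push(entry);
  return entry;
}

const registry = [];
registerEvent(registry, 'Summer Olympics 2012');
registerEvent(registry, 'Summer Olympics 2016');
console.log(registry.map((entry) => entry.name));
// prints: [ 'event_summer_olympics_2012', 'event_summer_olympics_2016' ]
```

In the mongo shell, a name produced this way could then be used with db.getCollection(name) to reach the per-event collection.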

For #1, I am a little worried that once I reach 100 events, each with millions of records, lookups per 'event' will be slow even if one of the events has only 500 documents.

For #2, am I 'skirting' the mongo model here by creating a new collection each time an event comes along?

Any comments/ideas are welcome, as I really have no idea which one is going to end up performing better or whether one or the other would get me into more trouble down the road. I have looked around (mongo's site included) and I really cannot find a concrete answer.

Recommended answer

From the mongo docs on data modeling:

In certain situations, you might choose to store information in several collections rather than in a single collection.

Consider a sample collection logs that stores log documents for various environment and applications. The logs collection contains documents of the following form:

{ log: "dev", ts: ..., info: ... }
{ log: "debug", ts: ..., info: ... }

If the total number of documents is low you may group documents into collection by type. For logs, consider maintaining distinct log collections, such as logs.dev and logs.debug. The logs.dev collection would contain only the documents related to the dev environment.

Generally, having large number of collections has no significant performance penalty and results in very good performance. Distinct collections are very important for high-throughput batch processing.
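
The grouping described in the quoted docs can be sketched in plain JavaScript. This is only an illustration of the idea, not MongoDB code: `groupByLogType` is a made-up helper, a `Map` of arrays stands in for the separate collections, and the `ts` and `info` values are invented sample data (the docs elide them).

```javascript
// Hypothetical sketch: split documents of the form { log, ts, info } into
// per-type buckets named like the collections in the quoted docs
// (logs.dev, logs.debug). Each bucket stands in for one collection.
function groupByLogType(docs) {
  const collections = new Map();
  for (const doc of docs) {
    const name = `logs.${doc.log}`;
    if (!collections.has(name)) collections.set(name, []);
    // The type now lives in the collection name, so the `log` field is dropped.
    const { log, ...rest } = doc;
    collections.get(name).push(rest);
  }
  return collections;
}

// Invented sample values for ts and info.
const grouped = groupByLogType([
  { log: 'dev', ts: 1, info: 'started' },
  { log: 'debug', ts: 2, info: 'cache miss' },
  { log: 'dev', ts: 3, info: 'stopped' },
]);
console.log([...grouped.keys()]); // prints: [ 'logs.dev', 'logs.debug' ]
```

The same shape applies to the question's option 2, with the event name taking the place of the log type.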

I also spoke w/ a guy from 10gen. For really large collections, he listed multiple benefits of separating them out into smaller, more specific collections. His comment on using one collection for all the data and using an index was:

Just because you can do something does not mean you should. Model your data appropriately. It may be easy to store everything in one large collection and index it, but that is not always the best approach.