Having a large number of collections in MongoDB (need schema design suggestions)


Problem description

I am considering MongoDB to hold metadata of images recorded from 100 cameras; the records for each camera will be kept for 30 days. If one camera produces 100,000 images in a day, then I am going to store at most (100 x 30 x 100000) images (documents) in MongoDB. My web application will query this data as:

Select a Camera > Select a Date > Select an Hour > Fetch all images in that hour.

I plan to design the schema using one of the following three options, and need your expert opinion/suggestion on the best way forward:

1) Hour-wise Collections: Create 72000 MongoDB collections, i.e. 1 collection per hour for each camera (100 cameras X 30 days X 24 hours), using the --nssize 500 option to exceed the 24000 limit. I am not sure whether MongoDB will allow me to create this many collections, and secondly what performance benefits and losses to expect while reading from and writing to these collections. Reading the images for an hour does look tremendously easy with this schema, though, because I can fetch the data with a single query against one collection.

2) Day-wise Collections: Create 3000 MongoDB collections, i.e. 1 collection per day for each camera (100 cameras X 30 days). This is allowable and seems a reasonable number of collections, but my concern is reading the images for a particular hour from inside a particular day's collection.

3) Camera-wise Collections: Create 100 MongoDB collections, i.e. 1 collection for each camera (100 cameras/collections), then save snapshots with a unique 'id' in a format like (20141122061055000), which is a rewriting of the full timestamp (2014-11-22 06:10:55.000).
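
The figures behind the three options can be checked with a quick sketch. It uses only the numbers stated above (100 cameras, 30 days, 100,000 images per camera per day); the variable names are made up for illustration and nothing here talks to MongoDB.

```javascript
// Sizing figures taken from the question; variable names are illustrative only.
const cameras = 100;
const retentionDays = 30;
const imagesPerCameraPerDay = 100000;

// Total documents held at full retention.
const totalDocuments = cameras * retentionDays * imagesPerCameraPerDay;

// Number of collections each option would need.
const hourWiseCollections = cameras * retentionDays * 24; // option 1
const dayWiseCollections = cameras * retentionDays;       // option 2
const cameraWiseCollections = cameras;                     // option 3

// Average size of one "camera > date > hour" result set.
const avgImagesPerCameraHour = Math.round(imagesPerCameraPerDay / 24);

console.log({
  totalDocuments,
  hourWiseCollections,
  dayWiseCollections,
  cameraWiseCollections,
  avgImagesPerCameraHour,
});
```

Whatever the layout, each hourly query returns a few thousand documents on average; the options only differ in how many collections those documents are spread over.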

Ideally I would like to do (1), (2) or (3), but any other option is welcome.

Please also comment on my choice of MongoDB, considering my case.

Regards.

Solution

This continues from: Pros and Cons of using MongoDB instead of MS SQL Server.

I am unsure why you are trying to take the advice of using many collections.

Using many collections in this way in MongoDB is considered a bad idea (and you would most likely have to increase the ns size for this once your index overhead is counted); you should instead scale a single collection of common docs way out horizontally. It seems the other answerers agree.

I would use a single collection with a document structure maybe like this (quick, off the top of my head):

{
    _id: {},
    camera_id: ObjectId(),
    image: {},
    hour: ts_of_hour,
    day: ts_of_day
}

That way you have all the data you need to select images at whatever granularity you want.
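
A minimal sketch of how the hour and day fields could be filled in and then used for the "camera > date > hour" lookup. The helper names (startOfHour, startOfDay, makeImageDoc, hourFilter) are hypothetical, and truncating to UTC boundaries is an assumption; adjust if your cameras should be bucketed in local time.

```javascript
const HOUR_MS = 60 * 60 * 1000;
const DAY_MS = 24 * HOUR_MS;

// Truncate a Date to the start of its UTC hour.
function startOfHour(date) {
  return new Date(Math.floor(date.getTime() / HOUR_MS) * HOUR_MS);
}

// Truncate a Date to the start of its UTC day.
function startOfDay(date) {
  return new Date(Math.floor(date.getTime() / DAY_MS) * DAY_MS);
}

// Build one metadata document; _id is left out so MongoDB generates it.
function makeImageDoc(cameraId, takenAt, image) {
  return {
    camera_id: cameraId,
    image: image,
    hour: startOfHour(takenAt),
    day: startOfDay(takenAt),
  };
}

// Filter for "all images of this camera in the hour containing `moment`".
function hourFilter(cameraId, moment) {
  return { camera_id: cameraId, hour: startOfHour(moment) };
}

const doc = makeImageDoc("cam-1", new Date("2014-11-22T06:10:55.000Z"), {});
console.log(doc.hour.toISOString(), doc.day.toISOString());
```

In the shell the filter would be passed straight to find, e.g. db.images.find(hourFilter(cameraId, someDate)), so the hourly fetch stays a single query without needing one collection per hour.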

NB: Consider as well that MongoDB's lock is database level, not collection level (in the 2.x releases; from 3.0 on, locking is finer-grained). You won't gain anything useful here, only make your querying harder and more complex, and maybe make your data harder to maintain.

Edit

To answer some of your concerns:

NB: I have not designed your app and this is a late answer (late at night too) so basically this is me fleshing out basic concepts that immediately come to mind.

1 collection for each camera, i.e. 100 collections almost.

Again, I don't really see the point. If you were to do this for optimisation reasons then you would do it as one camera per DB, but that is officially overkill. Honestly, 30m records is nothing; I will resolve that concern right now. Whether you are talking about SQL or MongoDB, a 30m record collection is normally considered small, minute even, in terms of the database's potential (with MS SQL saying they can store petabytes per table).

  1. Select all images between FromDate and ToDate

You can use the answer above to accomplish that using a BSON date field on your document.
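
A rough sketch of such a range filter, assuming the document carries a BSON date under a field name you choose (taken_at below is a made-up name, not part of the structure above). The helper betweenDates is hypothetical; it only builds the filter object and treats the range as half-open, [from, to).

```javascript
// Build a filter matching documents whose date field lies in [from, to).
function betweenDates(field, from, to) {
  if (!(from < to)) {
    throw new RangeError("from must be earlier than to");
  }
  return { [field]: { $gte: from, $lt: to } };
}

const filter = betweenDates(
  "taken_at", // hypothetical field name
  new Date("2014-11-01T00:00:00Z"),
  new Date("2014-11-02T00:00:00Z")
);
console.log(JSON.stringify(filter));
```

In the shell this would be used as db.images.find(filter). Using $lt rather than $lte on the upper bound avoids counting an image twice when consecutive ranges share a boundary.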

  2. Select Top(COUNT) images between FromDate and ToDate

You can just count().

top() is not implemented in all DB systems, so this is MS SQL specific; however, in this particular query it does nothing useful, since that query will always return one row.

You can aggregate this particular data to another collection. That is fine, so in another collection you would have a set of days:

{
     count: 3,
     day: (date|ts)
}

And then you can just sum up over the days, since count() can get slow on a large working set. So the aim of the collection is to summarise your data to make your working set for queries more manageable.
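
To make the summing step concrete, here is a small sketch over per-day summary documents of the shape shown above. The function name totalImages and the sample data are invented for illustration; on the server the same sum could be done with an aggregation $group and $sum, but plain JavaScript keeps the idea visible.

```javascript
// Sum pre-aggregated per-day counts for days in [fromDay, toDay] (inclusive).
function totalImages(dailyCounts, fromDay, toDay) {
  return dailyCounts
    .filter((d) => d.day >= fromDay && d.day <= toDay)
    .reduce((sum, d) => sum + d.count, 0);
}

// Sample summary documents (made-up numbers).
const dailyCounts = [
  { day: new Date("2014-11-20T00:00:00Z"), count: 3 },
  { day: new Date("2014-11-21T00:00:00Z"), count: 5 },
  { day: new Date("2014-11-22T00:00:00Z"), count: 7 },
];

console.log(
  totalImages(
    dailyCounts,
    new Date("2014-11-20T00:00:00Z"),
    new Date("2014-11-21T00:00:00Z")
  )
);
```

A 30-day range then touches at most 30 small documents per camera instead of counting millions of image documents.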

So other collections are fine to use to hold a "cache" of aggregation functions which would be slow, or of course to hold other entities within your app (like a relational DB would).

Basically, like in SQL, common schemas or documents get grouped in collections. So really I would design your app as I would in SQL, with only one table: images, and maybe camera as well.

All others except for 5 have been covered loosely here, so:

  5. Select previous/next images from/to an Image with an ID

You can use the _id here like so:

db.images.find({_id: {$gt: last_id}}).sort({_id: 1}).limit(1)

And that should work pretty well.
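
The same idea covers the previous image too, with the comparison and the sort direction flipped. A sketch, assuming _id values increase with capture time (as ObjectIds roughly do, or as the timestamp-style ids from the question would); neighbourQuery is a hypothetical helper that only builds the filter and sort objects.

```javascript
// Build the filter and sort needed to fetch the image just after or
// just before the one with `currentId`.
function neighbourQuery(currentId, direction) {
  if (direction !== "next" && direction !== "previous") {
    throw new Error('direction must be "next" or "previous"');
  }
  const next = direction === "next";
  return {
    filter: { _id: next ? { $gt: currentId } : { $lt: currentId } },
    // Ascending for "next", descending for "previous", so limit(1)
    // picks the closest neighbour rather than an arbitrary match.
    sort: { _id: next ? 1 : -1 },
  };
}

const q = neighbourQuery(42, "previous");
console.log(JSON.stringify(q));
```

In the shell: db.images.find(q.filter).sort(q.sort).limit(1). If the lookup should stay within one camera, adding camera_id to the filter is the obvious extension.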

As for the comment you posted here as well:

Do you mean that in MongoDB, querying a collection with 30 documents is not different from querying a collection with 30,00,000 documents ?

Now that depends on how much you know about database design in general and how to scale database architecture. This is something that doesn't just apply to MongoDB but also to SQL. If set up right, SQL can query 30m records as easily as 30.

What it all comes down to is sharding. Whether it would be fast comes down to your indexes across those shards, the queries you run, and their working set size (how much data is needed in RAM, and is it in RAM?). By the looks of it, a shard index over image_id (ObjectId) and date might give you what you want. However, this will need more testing, and since I believe you are a little new to scaling databases, you should really do some searching on this subject via Google or something.

NB again: 30m documents might not need sharding so this could be just a case of making good indexes.

Hopefully this helps and I haven't gone round in circles here.
