MongoDB as a Time Series Database

Question

I'm trying to use MongoDB as a time series database and was wondering if anyone could suggest how best to set it up for that scenario.

The time series data is very similar to a stock price history. I have a collection of data from a variety of sensors taken from different machines. There are values at billions of timestamps, and I would like to ask the following questions (preferably at the database level rather than the application level):

  1. For a given set of sensors and time interval, I want all the timestamps and sensor values that lie within that interval, ordered by time. Assume all the sensors share the same timestamps (they were all sampled at the same time).

  2. For a given set of sensors and time interval, I want every kth item (timestamp and corresponding sensor values) that lies within the given interval, ordered by time.

Any recommendation on how to best set this up and achieve the queries?

Thanks for any suggestions.

Recommended answer

If you don't need to keep the data forever (i.e. you don't mind it 'ageing out'), you may want to consider a 'capped collection'. Capped collections have a number of restrictions that in turn provide some interesting benefits, which sound like they fit what you want quite well.

Basically, a capped collection has a specified size, and documents are written to it in insertion order until it fills up, at which point it wraps around and begins overwriting the oldest documents with the newest. You are slightly limited in what updates you can perform on the documents in a capped collection, i.e. you cannot perform an update that changes the size of a document (as it would then need to be moved on disk to find the extra space). I can't see this being a problem for what you describe.
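The wrap-around behaviour described above can be pictured with a small in-memory model. This is only an illustrative sketch in plain JavaScript, not MongoDB's implementation: the `CappedBuffer` class and its `maxDocs` limit are hypothetical names, and a real capped collection is sized in bytes (the `capped` and `size` options of `db.createCollection`) rather than purely by document count.

```javascript
// Illustrative model of capped-collection semantics (hypothetical class,
// not MongoDB code): fixed capacity, insertion order, oldest overwritten.
class CappedBuffer {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.slots = new Array(maxDocs);
    this.next = 0;   // index of the slot the next insert will write to
    this.count = 0;  // number of documents currently stored
  }

  insert(doc) {
    this.slots[this.next] = doc;                // overwrites the oldest once full
    this.next = (this.next + 1) % this.maxDocs;
    this.count = Math.min(this.count + 1, this.maxDocs);
  }

  // Documents in insertion ("natural") order, oldest first.
  naturalOrder() {
    const start = (this.next - this.count + this.maxDocs) % this.maxDocs;
    const out = [];
    for (let i = 0; i < this.count; i++) {
      out.push(this.slots[(start + i) % this.maxDocs]);
    }
    return out;
  }
}

const buf = new CappedBuffer(3);
for (let t = 1; t <= 5; t++) buf.insert({ ts: t });
console.log(buf.naturalOrder().map(d => d.ts)); // readings 1 and 2 were overwritten
```

With a capacity of three and five inserts, only the three newest readings remain, still in the order they were written.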

The upshot is that you are guaranteed that the data in your capped collection will be written to, and will stay on, disk in insertion order, which makes queries on insertion order very fast.

How different are the sensors and the data they produce, by the way? If they're relatively similar I would suggest storing them all in the same collection for ease of use - otherwise split them up.

Assuming you use a single collection, both your queries then sound very doable. One thing to bear in mind is that, to get the benefit of the capped collection, you would need to query according to the collection's 'natural' order, so querying by your timestamp key would not be as fast. If the readings are taken at regular intervals (so you know how many of them would be taken in a given time interval), I would suggest something like the following for query 1:

db.myCollection.find().limit(100000).sort({ $natural : -1 })

Assuming, for example, that you store 1,000 readings a second, the above will return the last 100 seconds' worth of data. If you wanted the previous 100 seconds you could add .skip(100000).
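The arithmetic behind those limit and skip values can be written out explicitly. The following is a minimal sketch in plain JavaScript; `windowParams` and its parameter names are hypothetical, and the 1,000 readings per second used in the example is an assumption chosen so that a 100-second window comes to 100,000 documents.

```javascript
// Hypothetical helper: turn a sampling rate and a window length into the
// limit/skip values used with a reverse $natural sort.
function windowParams(readingsPerSecond, windowSeconds, windowsBack = 0) {
  const limit = readingsPerSecond * windowSeconds; // documents per window
  return { limit, skip: limit * windowsBack };     // skip whole newer windows
}

const latest = windowParams(1000, 100);      // most recent 100 seconds
const previous = windowParams(1000, 100, 1); // the 100 seconds before that
console.log(latest, previous);

// In the mongo shell these would be applied roughly as (not run here):
//   db.myCollection.find().sort({ $natural: -1 })
//     .skip(previous.skip).limit(previous.limit)
```

Bear in mind that skip still has to walk past the skipped documents, so windows far back in the collection are likely to be slower to fetch than recent ones.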

For your second query, it sounds to me like you'll need MapReduce, but it doesn't sound particularly difficult. You can select the range of documents you're interested in with a similar query to the one above, then pick out only the ones at the intervals you're interested in with the map function.
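The selection step itself is simple. As a rough sketch, assuming the documents for the interval have already been fetched in time order, every kth item can be picked by position. `everyKth` is a hypothetical helper in plain JavaScript, shown here on an in-memory array rather than as an actual map function, and the `ts` and `value` field names are made up for illustration.

```javascript
// Hypothetical helper: keep every kth document (positions 0, k, 2k, ...)
// from an array that is already sorted by time.
function everyKth(docs, k) {
  if (!Number.isInteger(k) || k < 1) {
    throw new RangeError("k must be a positive integer");
  }
  return docs.filter((_, index) => index % k === 0);
}

// Ten made-up readings with timestamps 0..9.
const readings = Array.from({ length: 10 }, (_, i) => ({ ts: i, value: i * 0.5 }));
console.log(everyKth(readings, 3).map(d => d.ts)); // ts values 0, 3, 6, 9
```

The same positional test (index modulo k) is what a server-side map step would need to apply, provided each document carries or can derive its position within the interval.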

Here's the Mongo Docs on capped collections: http://www.mongodb.org/display/DOCS/Capped+Collections

Hope this helps!
