MongoDB as a Time Series Database

Question

I'm trying to use MongoDB as a time series database and was wondering if anyone could suggest how best to set it up for that scenario.

The time series data is very similar to a stock price history. I have a collection of data from a variety of sensors on different machines. There are values at billions of timestamps, and I would like to answer the following questions (preferably in the database rather than at the application level):

  1. For a given set of sensors and time interval, I want all the timestamps and sensor values that lie within that interval in order by time. Assume all the sensors share the same timestamps (they were all sampled at the same time).

  2. For a given set of sensors and time interval, I want every kth item (timestamp and corresponding sensor values) that lies within the given interval, in order by time.

Any recommendations on how best to set this up and implement these queries?

Thanks for your suggestions.

Answer

If you don't need to keep the data forever (i.e., you don't mind it 'ageing out'), you may want to consider a 'capped collection'. Capped collections have a number of restrictions that in turn provide some interesting benefits, which sound like a good fit for what you want.

Basically, a capped collection has a specified size, and documents are written to it in insertion order until it fills up, at which point it wraps around and begins overwriting the oldest documents with the newest. You are slightly limited in what updates you can perform on the documents in a capped collection: you cannot perform an update that would change the size of a document (as this would mean it would need to be moved on disk to find the extra space). I can't see this being a problem for what you describe.
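
To make the wrap-around behaviour concrete, here is a minimal in-memory model in plain JavaScript. It is not the MongoDB API: the `CappedBuffer` class is hypothetical, it caps by document count (a real capped collection is capped by a byte `size` and optionally a `max` document count), and it approximates a document's size by the length of its JSON encoding.

```javascript
// In-memory model of capped-collection behaviour (NOT the MongoDB API).
// Hypothetical: capacity is a document count, and "size" is JSON length.
class CappedBuffer {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.slots = new Array(maxDocs);
    this.next = 0;  // slot the next insert will write to
    this.count = 0; // number of occupied slots
  }

  // Map a position in insertion order (0 = oldest) to a physical slot.
  slotOf(i) {
    const start = this.count < this.maxDocs ? 0 : this.next;
    return (start + i) % this.maxDocs;
  }

  // Append a document; once full, this overwrites the oldest one.
  insert(doc) {
    this.slots[this.next] = doc;
    this.next = (this.next + 1) % this.maxDocs;
    this.count = Math.min(this.count + 1, this.maxDocs);
  }

  // All documents, oldest first ("natural" order); no sort required.
  natural() {
    return Array.from({ length: this.count }, (_, i) => this.slots[this.slotOf(i)]);
  }

  // Replace a document in place, refusing any change in size.
  update(i, newDoc) {
    if (i < 0 || i >= this.count) throw new RangeError("no such document");
    const slot = this.slotOf(i);
    const oldSize = JSON.stringify(this.slots[slot]).length;
    if (JSON.stringify(newDoc).length !== oldSize) {
      throw new Error("update would change the document size");
    }
    this.slots[slot] = newDoc;
  }
}

const buf = new CappedBuffer(3);
for (let t = 1; t <= 5; t++) buf.insert({ t, v: t * 10 });
console.log(buf.natural().map((d) => d.t)); // [ 3, 4, 5 ]
```

Inserting five documents into three slots leaves only the three newest, still in insertion order, so reading them back in natural order needs no sorting step.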

The upshot is that a capped collection is guaranteed to preserve insertion order, which makes queries that return documents in insertion (natural) order very fast, with no index needed.

How different are the sensors and the data they produce, by the way? If they're relatively similar I would suggest storing them all in the same collection for ease of use - otherwise split them up.

Assuming you use a single collection, both of your queries sound very doable. One thing to bear in mind is that to get the benefit of the capped collection you need to query according to the collection's 'natural' order, so querying by your timestamp key would not be as fast. If the readings are taken at regular intervals (so you know how many of them are taken in a given time interval), I would suggest something like the following for query 1:

db.myCollection.find().sort({ $natural: -1 }).limit(100000)

Assuming, for example, that you store 100 readings a second, the above will return the last 1,000 seconds' worth of data (100,000 readings at 100 per second). If you wanted the previous 1,000 seconds, you could add .skip(100000).
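
The window arithmetic can be checked with a small sketch, assuming a fixed sampling rate as above. `windowParams` and `newestFirst` are hypothetical helpers; `newestFirst` only imitates what `find().sort({ $natural: -1 }).skip(s).limit(l)` returns, using an array kept in insertion order.

```javascript
// limit = readings per window; skip = whole windows to step further back.
// Assumes readings arrive at a fixed rate (hypothetical helper).
function windowParams(readingsPerSecond, seconds, windowsBack = 0) {
  const limit = readingsPerSecond * seconds;
  return { limit, skip: limit * windowsBack };
}

// In-memory stand-in for sort({ $natural: -1 }).skip(skip).limit(limit):
// `docs` is in insertion order, so a reversed copy is newest first.
function newestFirst(docs, { skip, limit }) {
  return docs.slice().reverse().slice(skip, skip + limit);
}

console.log(windowParams(100, 1000));    // { limit: 100000, skip: 0 }
console.log(windowParams(100, 1000, 1)); // { limit: 100000, skip: 100000 }
```

At 100 readings a second, `.limit(100000)` therefore covers 1,000 seconds, and each additional `.skip(100000)` steps one such window further into the past.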

For your second query, it sounds to me like you'll need MapReduce, but it doesn't sound particularly difficult. You can select the range of documents you're interested in with a similar query to the one above, then pick out only the ones at the intervals you're interested in with the map function.
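
The selection step for query 2 might look like the following once the candidate documents are in hand. It is a sketch in plain JavaScript rather than a map function, and it assumes hypothetical field names (`ts` for the timestamp, `values` for a map of sensor readings) and a half-open interval [start, end). Note that map-reduce is deprecated as of MongoDB 5.0 in favour of the aggregation pipeline, where a `$setWindowFields` stage with `$documentNumber` could number documents so that every kth one can be matched; that is an alternative worth evaluating, not a tested recipe.

```javascript
// Keep documents with start <= ts < end, order them by time, take every
// kth one, and project only the requested sensors (hypothetical fields).
function everyKth(docs, sensors, start, end, k) {
  if (!Number.isInteger(k) || k < 1) {
    throw new RangeError("k must be a positive integer");
  }
  return docs
    .filter((d) => d.ts >= start && d.ts < end) // new array; input untouched
    .sort((a, b) => a.ts - b.ts)                // ascending time order
    .filter((_, i) => i % k === 0)              // 1st, (k+1)th, (2k+1)th, ...
    .map((d) => ({
      ts: d.ts,
      values: Object.fromEntries(sensors.map((s) => [s, d.values[s]])),
    }));
}

const readings = [0, 1, 2, 3, 4, 5, 6].map((t) => ({ ts: t, values: { a: t, b: -t } }));
console.log(everyKth(readings, ["a"], 1, 6, 2).map((d) => d.ts)); // [ 1, 3, 5 ]
```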

Here are the Mongo docs on capped collections: http://www.mongodb.org/display/DOCS/Capped+Collections

Hope this helps!
