Key-value store for time series data?

Question

I've been using SQL Server to store historical time series data for a couple hundred thousand objects, observed about 100 times per day. I'm finding that queries (give me all values for object XYZ between time t1 and time t2) are too slow (for my needs, slow is more than a second). I'm indexing by timestamp and object ID.

I've entertained the thought of using something like a key-value store such as MongoDB instead, but I'm not sure if this is an "appropriate" use of this sort of thing, and I couldn't find any mentions of using such a database for time series data. Ideally, I'd be able to do the following queries:


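  • retrieve all the data for object XYZ between time t1 and time t2

  • do the above, but return one data point per day (first, last, closest to time t...)

  • retrieve all data for all objects for a particular timestamp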

The data should be ordered, and ideally it should be fast to write new data as well as to update existing data.

It seems like my desire to query by object ID as well as by timestamp might necessitate having two copies of the database indexed in different ways to get optimal performance. Does anyone have any experience building a system like this, with a key-value store, or HDF5, or something else? Or is this totally doable in SQL Server and I'm just not doing it right?

Recommended answer

It sounds like MongoDB would be a very good fit. Inserts and updates are very fast, so you might want to create one document for every event (observation), such as:

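{
   object : "XYZ",    // object ID
   ts : new Date(),   // observation time
   value : 1.23       // observed value (field name and number are illustrative)
}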
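
Because this design stores one document per observation, it's worth estimating how big the collection gets. The sketch below just multiplies out the figures from the question; the inputs (200,000 objects, 100 observations per object per day) are assumptions standing in for "a couple hundred thousand" and "about 100".

```javascript
// Rough sizing for a one-document-per-event design.
// Inputs are approximations taken from the question, not measured values.
const objects = 200000;          // "a couple hundred thousand objects"
const observationsPerDay = 100;  // "observed about 100 times per day"

const docsPerDay = objects * observationsPerDay; // new documents written per day
const docsPerYear = docsPerDay * 365;            // ignoring leap years

console.log("documents per day:", docsPerDay);   // 20000000
console.log("documents per year:", docsPerYear); // 7300000000
```

At roughly 20 million new documents a day, the queries below only stay fast if they can be answered from an index rather than by scanning the collection.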

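Then you can index the ts field (or, for per-object queries, object and ts together) and queries will also be fast. (By the way, you can create multiple indexes on a single collection, so you don't need two copies of the data to index it two different ways.)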
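
To see why the order of fields in an index matters, here is a simplified in-memory model: an index on (object, ts) is treated as an array of keys sorted by object, then by ts, so "all values for XYZ between t1 and t2" becomes a binary search plus a contiguous slice. A second array sorted by (ts, object) plays the role of a second index on the same data. This only illustrates the idea; MongoDB's real indexes are B-trees, and the helpers (compareKeys, lowerBound, rangeScan) and sample events are hypothetical.

```javascript
// Simplified model of a compound index: entries sorted lexicographically by key.
// Real MongoDB indexes are B-trees; this only illustrates why key order matters.

// Compare two [first, second] keys lexicographically.
function compareKeys(a, b) {
  if (a[0] < b[0]) return -1;
  if (a[0] > b[0]) return 1;
  if (a[1] < b[1]) return -1;
  if (a[1] > b[1]) return 1;
  return 0;
}

// Binary search: index of the first entry whose key is >= target.
function lowerBound(entries, target) {
  let lo = 0, hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (compareKeys(entries[mid].key, target) < 0) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Entries with fromKey <= key < toKey, read as one contiguous slice.
function rangeScan(entries, fromKey, toKey) {
  return entries.slice(lowerBound(entries, fromKey), lowerBound(entries, toKey));
}

// Hypothetical sample events (ts as plain numbers for simplicity).
const events = [
  { object: "ABC", ts: 100, value: 1 },
  { object: "XYZ", ts: 300, value: 2 },
  { object: "XYZ", ts: 100, value: 3 },
  { object: "ABC", ts: 300, value: 4 },
  { object: "XYZ", ts: 200, value: 5 },
];

// "Index" 1: sorted by (object, ts); serves per-object time ranges.
const byObjectTs = events
  .map((e) => ({ key: [e.object, e.ts], doc: e }))
  .sort((a, b) => compareKeys(a.key, b.key));

// "Index" 2: sorted by (ts, object); serves "all objects at time t".
const byTsObject = events
  .map((e) => ({ key: [e.ts, e.object], doc: e }))
  .sort((a, b) => compareKeys(a.key, b.key));

// All XYZ values with 100 <= ts < 300.
const xyzRange = rangeScan(byObjectTs, ["XYZ", 100], ["XYZ", 300]).map((x) => x.doc.value);

// All objects at ts = 300: "" sorts before any object ID, so start there
// and read forward while the timestamp still matches.
const atTs300 = [];
for (let i = lowerBound(byTsObject, [300, ""]); i < byTsObject.length && byTsObject[i].key[0] === 300; i++) {
  atTs300.push(byTsObject[i].doc.object);
}

console.log(xyzRange); // [ 3, 5 ]
console.log(atTs300);  // [ 'ABC', 'XYZ' ]
```

In the MongoDB shell the two orderings would correspond to two indexes on the same collection, for example db.data.createIndex({object: 1, ts: 1}) and db.data.createIndex({ts: 1, object: 1}).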

How to do the three queries:


retrieve all the data for object XYZ between time t1 and time t2



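// t1 and t2 are Date objects; use $gte/$lte instead if the endpoints should be included
db.data.find({object : "XYZ", ts : {$gt : t1, $lt : t2}}).sort({ts : 1})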




do the above, but return one data point per day (first, last, closest to time t...)



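// startOfDay / endOfDay are Dates bounding one day, e.g. 00:00 and the next day's 00:00
// first point of the day
db.data.find({object : "XYZ", ts : {$gte : startOfDay, $lt : endOfDay}}).sort({ts : 1}).limit(1)
// last point of the day
db.data.find({object : "XYZ", ts : {$gte : startOfDay, $lt : endOfDay}}).sort({ts : -1}).limit(1)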
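
Running the first/last queries once per day works, but over a long range it means one round trip per day. An alternative, assuming the whole range fits comfortably in memory, is to fetch the range once sorted by ts and keep the first or last point of each day on the client. onePerDay is a hypothetical helper, "day" here means a UTC calendar day, and the sample data is made up.

```javascript
// Downsample a time-ordered series to one point per UTC day.
// Input: [{ ts: Date, value }] sorted by ts ascending, e.g. the result of a
// range query with .sort({ts: 1}). `pick` chooses the first or last point.
function onePerDay(points, pick = "first") {
  const byDay = new Map(); // "YYYY-MM-DD" -> chosen point (insertion-ordered)
  for (const p of points) {
    const day = p.ts.toISOString().slice(0, 10); // UTC calendar day
    if (pick === "first") {
      if (!byDay.has(day)) byDay.set(day, p); // keep the earliest point seen
    } else {
      byDay.set(day, p); // overwrite, so the latest point of the day wins
    }
  }
  return [...byDay.values()];
}

// Hypothetical sample data spanning two days.
const series = [
  { ts: new Date("2011-03-01T01:00:00Z"), value: 10 },
  { ts: new Date("2011-03-01T13:00:00Z"), value: 11 },
  { ts: new Date("2011-03-02T02:00:00Z"), value: 20 },
  { ts: new Date("2011-03-02T22:00:00Z"), value: 21 },
];

console.log(onePerDay(series, "first").map((p) => p.value)); // [ 10, 20 ]
console.log(onePerDay(series, "last").map((p) => p.value));  // [ 11, 21 ]
```

For large ranges, MongoDB's aggregation framework (a $group stage with the $first or $last accumulators) can do a similar grouping on the server instead of shipping every point to the client.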

For the point closest to some time t, you'd probably need a custom JavaScript function, but it's doable.
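
One possible way to get the closest point without a custom server-side function is to combine two indexed lookups: the latest point before t and the earliest point at or after t, then keep whichever is nearer. The sketch below models those two lookups with a binary search over an array sorted by ts; in the shell they would be roughly find({object: "XYZ", ts: {$lt: t}}).sort({ts: -1}).limit(1) and find({object: "XYZ", ts: {$gte: t}}).sort({ts: 1}).limit(1). closestTo is a hypothetical helper, and breaking ties toward the earlier point is an arbitrary assumption.

```javascript
// Find the point nearest to time t in an array sorted by ts ascending.
// Mirrors two indexed queries: latest ts < t, and earliest ts >= t.
function closestTo(points, t) {
  // Binary search for the first index with ts >= t.
  let lo = 0, hi = points.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (points[mid].ts < t) lo = mid + 1;
    else hi = mid;
  }
  const after = lo < points.length ? points[lo] : null; // earliest ts >= t
  const before = lo > 0 ? points[lo - 1] : null;        // latest ts < t
  if (!before) return after; // null when points is empty
  if (!after) return before;
  // Tie-break toward the earlier point (an arbitrary choice).
  return t - before.ts <= after.ts - t ? before : after;
}

// Hypothetical sample points (ts as plain numbers for simplicity).
const points = [{ ts: 100 }, { ts: 200 }, { ts: 400 }];

console.log(closestTo(points, 260).ts); // 200
console.log(closestTo(points, 350).ts); // 400
```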


retrieve all data for all objects for a particular timestamp



db.data.find({ts : timestamp})

Feel free to ask on the MongoDB user list if you have any questions; someone else might be able to think of an easier way of getting closest-to-a-time events.
