使用Javascript和Mongodb重新采样时间序列数据 [英] Resample Time Series Data using Javascript and Mongodb
问题描述
时间序列数据的数据集需要从具有不规则时间间隔的数据集转换为常规时间序列,可能使用插值和重采样。
A data set of time series data needs to be turned from one with irregular time intervals to a regular time series, probably using interpolation and and resampling.
Python < a href =http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.resample.html =nofollow> pandas.Dataframe.resample
是一个可以执行此操作的功能。 Javascript可以这样做吗?时间序列数据集存储在Mongodb中。
Python's pandas.Dataframe.resample
is a function that will do this. Can Javascript do the same? The time series data set is stored in Mongodb.
推荐答案
这是有可能的。请记住,Pandas是一个为这类任务明确构建的库,也是一个野兽,而MongoDB则是一个数据库。但是如果忽略了使用插值的可能需要,以下内容很可能会满足您的需求:
It is kind of possible. Keep in mind that Pandas is a library built explicitly for those kind of tasks, and a beast at it, while MongoDB is meant to be a database. But chances are high that the following will suit your needs, if one ignores your probable need for using interpolation:
假设您将以下数据存储在名为MongoDB的集合中设备
Assuming that you have the following data stored in a MongoDB collection named devices
/* 0 */
{
"_id" : ObjectId("543fc08ccf1e8c06c0288802"),
"t" : ISODate("2014-10-20T14:56:44.097+02:00"),
"a" : "192.168.0.16",
"i" : 0,
"o" : 32
}
/* 1 */
{
"_id" : ObjectId("543fc08ccf1e8c06c0288803"),
"t" : ISODate("2014-10-20T14:56:59.107+02:00"),
"a" : "192.168.0.16",
"i" : 14243,
"o" : 8430
}
and so on...
在这种情况下,大约每15秒采样一次,但也可能是不规则的。如果您想将其重新采样到某一天的5分钟边界,那么您应该执行以下操作:
which, in this case, is sampled at around every 15 seconds, but it could as well be irregularly. If you want to resample it to a 5 minute boundary for a certain day, then you should do the following:
var low = ISODate("2014-10-23T00:00:00.000+02:00")
var high = ISODate("2014-10-24T00:00:00.000+02:00")
var interval = 5*60*1000;
db.devices.aggregate([
{$match: {t:{$gte: low, $lt: high}, a:"192.168.0.16"}},
{$group: {
_id:{
$subtract: ["$t", {
$mod: [{
$subtract: ["$t", low]
}, interval]
}]
},
total: {$sum: 1},
incoming: {$sum: "$i"},
outgoing: {$sum: "$o"},
}
},
{
$project: {
total: true,
incoming: true,
outgoing: true,
incoming_avg: {$divide: ["$incoming", "$total"]},
outgoing_avg: {$divide: ["$outgoing", "$total"]},
},
},
{$sort: {_id : 1}}
])
这将导致类似这样的事情
This will result in something like this
{
"result" : [
{
"_id" : ISODate("2014-10-23T07:25:00.000+02:00"),
"total" : 8,
"incoming" : 11039108,
"outgoing" : 404983,
"incoming_avg" : 1379888.5,
"outgoing_avg" : 50622.875
},
{
"_id" : ISODate("2014-10-23T07:30:00.000+02:00"),
"total" : 19,
"incoming" : 187241,
"outgoing" : 239912,
"incoming_avg" : 9854.78947368421,
"outgoing_avg" : 12626.94736842105
},
{
"_id" : ISODate("2014-10-23T07:35:00.000+02:00"),
"total" : 17,
"incoming" : 22420099,
"outgoing" : 1018766,
"incoming_avg" : 1318829.352941176,
"outgoing_avg" : 59927.41176470588
},
...
如果你想丢弃总收入,那么只需将该行留在$ project中阶段。 incoming_average只是如何计算平均值的一个例子,以防你的存储数据类似于rrdtool命名一个标尺(温度,cpu,传感器数据)。如果你只是在那个时间inverval聚合的总和之后,那就是传入和传出字段,那么你可以将整个$ project阶段排除在外。它仅用于计算时间间隔的平均值。
If you want to discard the total incoming, then just leave the line out in the $project stage. The incoming_average is just an example of how to compute the average, in case your stored data is something like what rrdtool names a gauge (temperature, cpu, sensor data). If you're only after the sum aggregated in that time inverval, that is the incoming and outgoing field, then you can leave the entire $project stage out. It is only there to compute the average of the time interval.
这篇关于使用Javascript和Mongodb重新采样时间序列数据的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!