从传感器汇总时间序列 [英] Aggregating timeseries from sensors

查看:151
本文介绍了从传感器汇总时间序列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有大约500个传感器,每个传感器每分钟发出一次值.可以假设传感器的值保持恒定,直到发出下一个值,从而创建一个时间序列.传感器在发出数据时并没有同步(因此观察时间戳会有所不同),但是它们全部集中收集并存储在每个传感器中(以允许按传感器子集进行过滤).

如何生成一个汇总的时间序列,以给出来自传感器的数据总和? ñ (需要创建超过1天的观测值的时间序列-因此需要考虑每天24x60x500的观测值).计算还需要快速,最好以< 1秒

示例-原始输入:

q)n:10
q)tbl:([]time:n?.z.t;sensor:n?3;val:n?100.0)
q)select from tbl
time         sensor val
----------------------------    
01:43:58.525 0      33.32978
04:35:12.181 0      78.75249
04:35:31.388 0      1.898088    
02:31:11.594 1      16.63539
07:16:40.320 1      52.34027
00:49:55.557 2      45.47007
01:18:57.918 2      42.46532
02:37:14.070 2      91.98683
03:48:43.055 2      41.855
06:34:32.414 2      9.840246

我正在寻找的输出应该显示相同的时间戳,以及传感器之间的总和.如果传感器没有在匹配的时间戳记中定义记录,则应使用其先前的值(记录仅表示传感器输出更改的时间).

预期的输出,按时间排序

time         aggregatedvalue
----------------------------    
00:49:55.557   45.47007  / 0 (sensor 0) + 0 (sensor 1) + 45.47007 (sensor 2)
01:18:57.918   42.46532  / 0 (sensor 0) + 0 (sensor 1) + 42.46532 (new value on sensor 2)
01:43:58.525   75.7951   / 33.32978 + 0 + 42.46532
02:31:11.594   92.43049  / 33.32978 + 16.63539 + 42.46532
02:37:14.070   141.952   / 33.32978 + 16.63539 + 91.98683
03:48:43.055   91.82017  / 33.32978 + 16.63539 + 41.855
04:35:12.181   137.24288 / 78.75249 + 16.63539 + 41.855
04:35:31.388   60.388478 / 1.898088 + 16.63539 + 41.855   
06:34:32.414   28.373724 / 1.898088 + 16.63539 + 9.840246 
07:16:40.320   64.078604 / 1.898088 + 52.34027 + 9.840246

解决方案

我假设记录按时间顺序排列,因此tbl将按时间排序.如果不是这种情况,请先按时间对表格进行排序.

d是每次传感器最新价格的字典.下面的解决方案可能不是最高级的,我可以想象有一个性能更高的方法可以使用,而无需每个方法.

q)d:(`long$())!`float$()
q)f:{d[x]::y;sum d} 
q)update agg:f'[sensor;val] from tbl
time         sensor val      agg     
-------------------------------------
00:34:28.887 2      53.47096 53.47096
01:05:42.696 2      40.66642 40.66642
01:26:21.548 1      41.1597  81.82612
01:53:10.321 1      51.70911 92.37553
03:42:39.320 1      17.80839 58.47481
05:15:26.418 2      51.59796 69.40635
05:47:49.777 0      30.17723 99.58358
11:32:19.305 0      39.27524 108.6816
11:37:56.091 0      71.11716 140.5235
12:09:18.458 1      78.5033  201.2184

您的72万条记录的数据集将相对较小,因此任何聚合都应该在一秒钟之内.如果您存储许多天的数据,则可能需要考虑概述的某些技术(显示,分区等) 解决方案

I'm assuming the records are coming in in time order, therefore tbl will be sorted by time. If this is not the case, sort the table by time first.

d is a dictionary of last price by sensor at each time. The solution below is probably not the most elegent and I can imagine a more performant method is available that would not require the each.

q)d:(`long$())!`float$()
q)f:{d[x]::y;sum d} 
q)update agg:f'[sensor;val] from tbl
time         sensor val      agg     
-------------------------------------
00:34:28.887 2      53.47096 53.47096
01:05:42.696 2      40.66642 40.66642
01:26:21.548 1      41.1597  81.82612
01:53:10.321 1      51.70911 92.37553
03:42:39.320 1      17.80839 58.47481
05:15:26.418 2      51.59796 69.40635
05:47:49.777 0      30.17723 99.58358
11:32:19.305 0      39.27524 108.6816
11:37:56.091 0      71.11716 140.5235
12:09:18.458 1      78.5033  201.2184

Your data set of 720k records will be relatively small, so any aggregations should be well under a second. If you storing many days of data you may want to consider some of the techniques (splaying, partitioning etc) outlined here .

这篇关于从传感器汇总时间序列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆