Converting irregularly time-stamped measurements into equally spaced, time-weighted averages
Question
I have series of measurements that are time-stamped and irregularly spaced. Values in these series always represent changes of the measurement, i.e. a new value is only recorded when the measurement changes. A simple example of such a series would be:
23:00:00.100 10
23:00:01.200 8
23:00:01.600 0
23:00:06.300 4
What I want to obtain is an equally spaced series of time-weighted averages. For the given example I might aim at a one-second frequency and hence a result like the following:
23:00:01 NaN ( the first 100ms are missing )
23:00:02 5.2 ( 10*0.2 + 8*0.4 + 0*0.4 )
23:00:03 0
23:00:04 0
23:00:05 0
23:00:06 2.8 ( 0*0.3 + 4*0.7 )
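The two non-trivial entries in the expected result above can be checked with a weighted mean, where each weight is the fraction of the second during which a value was in effect. This is only an arithmetic check using `numpy.average`; the variable names are illustrative.

```python
import numpy as np

# Second containing 23:00:01.x: 10 holds for 0.2 s, 8 for 0.4 s, 0 for 0.4 s.
avg_a = np.average([10, 8, 0], weights=[0.2, 0.4, 0.4])

# Second containing 23:00:06.x: 0 holds for 0.3 s, 4 for 0.7 s.
avg_b = np.average([0, 4], weights=[0.3, 0.7])

print(round(avg_a, 6), round(avg_b, 6))
```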
I am looking for a Python library that solves this problem. To me this seems to be a standard problem, but so far I could not find such functionality in standard libraries like pandas.
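The algorithm needs to take two things into account:
- time-weighted averaging
- considering the values that precede the current interval (possibly from even further back) when forming the average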
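The two requirements above can be sketched in plain Python, assuming the series is a step function (each value holds until the next sample) and that times are given as seconds since some reference. The function name `time_weighted_averages` and its parameters are hypothetical, and the nested loop is written for clarity rather than speed.

```python
import math

def time_weighted_averages(samples, start, end, step=1.0):
    """samples: sorted list of (time_in_seconds, value) pairs.

    Returns a list of (bucket_start, average) for the buckets
    [start, start + step), [start + step, start + 2 * step), ... up to end.
    """
    results = []
    n_buckets = int(round((end - start) / step))
    for k in range(n_buckets):
        lo = start + k * step
        hi = lo + step
        if not samples or samples[0][0] > lo:
            # No value is known at the start of this bucket.
            results.append((lo, math.nan))
            continue
        total = 0.0
        for i, (t, v) in enumerate(samples):
            # A value holds until the next sample (or forever for the last one).
            next_t = samples[i + 1][0] if i + 1 < len(samples) else math.inf
            seg_start = max(t, lo)
            seg_end = min(next_t, hi)
            if seg_end > seg_start:
                total += v * (seg_end - seg_start)
        results.append((lo, total / step))
    return results

# The example series, with times as seconds after 23:00:00.
samples = [(0.1, 10), (1.2, 8), (1.6, 0), (6.3, 4)]
result = time_weighted_averages(samples, start=0.0, end=7.0)
for bucket_start, avg in result:
    print(bucket_start, avg)
```

Because the value in effect at the start of a bucket comes from a sample that may lie several buckets back, the seconds in which nothing changes come out as the held value rather than as gaps.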
Using pandas,
data.resample('s').mean().ffill() # forming a series of seconds
does part of the work. Providing a user-defined aggregation function makes it possible to form time-weighted averages, but because the beginning of each interval is ignored, this average will be incorrect too. Even worse: the gaps in the series are filled with the averaged values, so in the example above the values for seconds 3, 4 and 5 come out non-zero.
data = data.resample('ms').mean().ffill() # forming a series of milliseconds
data.resample('s').mean()
does the trick up to a certain accuracy, but is, depending on that accuracy, very expensive. In my case, too expensive.
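One way to avoid the millisecond grid, sketched here under the assumption that the series is a step function, is to integrate it once and read the integral off at the bucket boundaries. The integral of a step function is piecewise linear, so `numpy.interp` evaluates it exactly between samples. The function name `twa_by_integral` and its arguments are hypothetical, and times are plain seconds rather than timestamps.

```python
import numpy as np

def twa_by_integral(t, v, edges):
    """t: sorted sample times in seconds, v: values held until the next sample,
    edges: increasing bucket boundaries. Returns one average per bucket."""
    t = np.asarray(t, dtype=float)
    v = np.asarray(v, dtype=float)
    edges = np.asarray(edges, dtype=float)

    # Integral of the step function evaluated at every sample time.
    integral = np.concatenate(([0.0], np.cumsum(v[:-1] * np.diff(t))))

    # Add one point beyond the last edge so the final value is held until there.
    t_ext = np.append(t, max(edges[-1], t[-1]) + 1.0)
    i_ext = np.append(integral, integral[-1] + v[-1] * (t_ext[-1] - t[-1]))

    # Linear interpolation is exact because the integral is piecewise linear.
    at_edges = np.interp(edges, t_ext, i_ext)
    avg = np.diff(at_edges) / np.diff(edges)

    # Buckets that start before the first sample have no defined average.
    avg[edges[:-1] < t[0]] = np.nan
    return avg

edges = np.arange(0, 8)  # boundaries at 0 s, 1 s, ..., 7 s after 23:00:00
avg = twa_by_integral([0.1, 1.2, 1.6, 6.3], [10, 8, 0, 4], edges)
print(avg)
```

The cost depends on the number of samples and buckets, not on the time resolution, which is the property the millisecond upsampling lacks.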
import numpy as np
import pandas as pd
from datetime import datetime, timedelta

time_stamps = [datetime(2013, 4, 11, 23, 0, 0, 100000),
               datetime(2013, 4, 11, 23, 0, 1, 200000),
               datetime(2013, 4, 11, 23, 0, 1, 600000),
               datetime(2013, 4, 11, 23, 0, 6, 300000)]
values = [10, 8, 0, 4]
# a Series with a DatetimeIndex holds the raw measurements
raw = pd.Series(data=values, index=pd.DatetimeIndex(time_stamps), dtype=float)

def round_down_to_second(dt):
    return datetime(year=dt.year, month=dt.month, day=dt.day,
                    hour=dt.hour, minute=dt.minute, second=dt.second)

def round_up_to_second(dt):
    return round_down_to_second(dt) + timedelta(seconds=1)

def time_weighted_average(data):
    # weight every value by the time until the next sample or the end of the second
    end = pd.DatetimeIndex([round_up_to_second(data.index[-1])])
    return np.average(data, weights=np.diff(data.index.append(end).asi8))

start = round_down_to_second(time_stamps[0])
end = round_down_to_second(time_stamps[-1])
seconds = pd.date_range(start, end, freq='s')
# add the full-second boundaries, then carry the last known value forward
data = raw.reindex(raw.index.union(seconds))
data = data.ffill()
data = data.resample('s').apply(time_weighted_average)
Recommended answer
You can do this with the traces library.
from datetime import datetime
import traces
ts = traces.TimeSeries(data=[
    (datetime(2016, 9, 27, 23, 0, 0, 100000), 10),
    (datetime(2016, 9, 27, 23, 0, 1, 200000), 8),
    (datetime(2016, 9, 27, 23, 0, 1, 600000), 0),
    (datetime(2016, 9, 27, 23, 0, 6, 300000), 4),
])
regularized = ts.moving_average(
    start=datetime(2016, 9, 27, 23, 0, 1),
    sampling_period=1,
    placement='left',
)
This results in:
[(datetime(2016, 9, 27, 23, 0, 1), 5.2),
(datetime(2016, 9, 27, 23, 0, 2), 0.0),
(datetime(2016, 9, 27, 23, 0, 3), 0.0),
(datetime(2016, 9, 27, 23, 0, 4), 0.0),
(datetime(2016, 9, 27, 23, 0, 5), 0.0),
(datetime(2016, 9, 27, 23, 0, 6), 2.8)]