Pandas timeseries resampling and interpolating together

Question

I have timestamped sensor data. Because of technical details, I get data from the sensors at approximately one minute intervals. The data may look like this:

   tstamp               val
0  2016-09-01 00:00:00  57
1  2016-09-01 00:01:00  57
2  2016-09-01 00:02:23  57
3  2016-09-01 00:03:04  57
4  2016-09-01 00:03:58  58
5  2016-09-01 00:05:00  60

Now, essentially, I would be extremely happy if I got all data at the exact minute, but I don't. The only way to conserve the distribution and have data at each minute is to interpolate. For example, between row indexes 1 and 2 there are 83 seconds, and the natural way to get a value at the exact minute is to interpolate between the two rows of data (in this case, it is 57, but that is not the case everywhere).
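To make "interpolate between the two rows" concrete, here is a small sketch of the linear interpolation for 00:04:00, which falls between rows 4 (00:03:58, value 58) and 5 (00:05:00, value 60) of the sample. Measuring time in seconds since midnight is just a choice made for this illustration, and numpy.interp is used only as a cross-check of the hand calculation.

```python
import numpy as np

# Rows 4 and 5 of the sample, times in seconds since midnight
t0, v0 = 3 * 60 + 58, 58   # 00:03:58
t1, v1 = 5 * 60, 60        # 00:05:00
target = 4 * 60            # 00:04:00, the whole minute we want a value for

# Linear interpolation: move (target - t0) / (t1 - t0) of the way from v0 to v1
by_hand = v0 + (v1 - v0) * (target - t0) / (t1 - t0)

# numpy.interp computes the same thing from the known points
by_numpy = np.interp(target, [t0, t1], [v0, v1])

print(by_hand, by_numpy)
```

Both come out at about 58.06: 00:04:00 is 2 s into a 62 s gap, so the value moves 2/62 of the way from 58 to 60.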

Right now, my approach is to do the following:

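import numpy as np
import pandas as pd
import scipy.interpolate

# Midnight of the day being processed
date = pd.to_datetime(df['measurement_tstamp'].iloc[0].date())
# Seconds since midnight for every reading
ts_d = df['measurement_tstamp'].dt.hour * 60 * 60 +\
       df['measurement_tstamp'].dt.minute * 60 +\
       df['measurement_tstamp'].dt.second
# Target points: every whole minute of the day (0, 60, ..., 86340 s)
ts_r = np.arange(0, 24*60*60, 60)
# interp1d raises ValueError if a target lies outside the measured range
data = scipy.interpolate.interp1d(x=ts_d, y=df['speed'].values)(ts_r)
req = pd.Series(data, index=pd.to_timedelta(ts_r, unit='s'))
req.index = date + req.index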
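For comparison, here is a somewhat shorter variant of the same idea, sketched on the sample table (columns tstamp and val as shown above, rather than measurement_tstamp/speed). It derives seconds since midnight by subtracting timestamps and uses numpy.interp instead of scipy.interpolate.interp1d; that swap is my own choice for the sketch, not part of the question. periods=6 only covers the sample; a full day would use 24*60.

```python
import numpy as np
import pandas as pd

# Sample readings from the question
df = pd.DataFrame({
    'tstamp': pd.to_datetime(['2016-09-01 00:00:00', '2016-09-01 00:01:00',
                              '2016-09-01 00:02:23', '2016-09-01 00:03:04',
                              '2016-09-01 00:03:58', '2016-09-01 00:05:00']),
    'val': [57, 57, 57, 57, 58, 60],
})

# Midnight of the first reading, and one grid point per whole minute
start = df['tstamp'].iloc[0].normalize()
grid = pd.date_range(start, periods=6, freq='min')

# Seconds since midnight, computed with timedelta arithmetic
x_known = (df['tstamp'] - start).dt.total_seconds().to_numpy()
x_grid = (grid - start).total_seconds().to_numpy()

# Linear interpolation; outside the data range np.interp holds the
# first/last value instead of raising
req = pd.Series(np.interp(x_grid, x_known, df['val'].to_numpy()), index=grid)
print(req)
```

One behavioural difference worth knowing: interp1d raises a ValueError by default when a target point lies outside the measured range, whereas np.interp silently repeats the first or last value there.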

But this feels rather drawn out and long to me. There are excellent pandas methods that do resampling, rounding, etc. I have been reading about them all day, but it turns out that none of them interpolates quite the way I want. resample works like a groupby: it averages all readings that fall into the same time bin. fillna and interpolate can fill the gaps afterwards, but only once resample has already altered the data by averaging.
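The averaging problem can be reproduced on the sample data. This sketch assumes the tstamp/val columns from the table; resample('min').mean() followed by interpolate() is shown to illustrate the issue, not as a recommendation.

```python
import pandas as pd

# Sample readings from the question
df = pd.DataFrame({
    'tstamp': pd.to_datetime(['2016-09-01 00:00:00', '2016-09-01 00:01:00',
                              '2016-09-01 00:02:23', '2016-09-01 00:03:04',
                              '2016-09-01 00:03:58', '2016-09-01 00:05:00']),
    'val': [57, 57, 57, 57, 58, 60],
})
s = df.set_index('tstamp')['val']

# resample('min') buckets readings by minute and averages each bucket:
# 00:03:04 (57) and 00:03:58 (58) share the 00:03 bucket, and the
# 00:04 bucket is empty, so it becomes NaN
binned = s.resample('min').mean()

# Filling afterwards only interpolates between the bucket averages
filled = binned.interpolate()
print(filled)
```

The 00:03 bin becomes 57.5 (the mean of 57 and 58), and the empty 00:04 bin is then filled as 58.75, halfway between 57.5 and 60. Interpolating the raw readings in time instead puts 00:04:00 at about 58.06, because it is only 2 s after the 58 reading at 00:03:58.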

Am I missing something, or is my approach the best there is?

For simplicity, assume that I group the data by day and by sensor, so only a 24-hour period from one sensor is interpolated at a time.

Recommended answer

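import pandas as pd

d = df.set_index('tstamp')  # readings indexed by timestamp
t = d.index
# One-minute grid starting at midnight of the first reading
r = pd.date_range(t.min().normalize(), periods=24*60, freq='min')

# Add the grid points, interpolate by elapsed time, keep only the grid
d.reindex(t.union(r)).interpolate('index').loc[r]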
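Applied to the six sample rows from the question (rebuilt below as a DataFrame with tstamp/val columns), the answer's recipe yields one value per whole minute. This is a sketch for checking the idea; periods=6 is used because the sample only covers 00:00 to 00:05.

```python
import pandas as pd

# Sample readings from the question
df = pd.DataFrame({
    'tstamp': pd.to_datetime(['2016-09-01 00:00:00', '2016-09-01 00:01:00',
                              '2016-09-01 00:02:23', '2016-09-01 00:03:04',
                              '2016-09-01 00:03:58', '2016-09-01 00:05:00']),
    'val': [57, 57, 57, 57, 58, 60],
})

d = df.set_index('tstamp')
t = d.index
# periods=6: the sample spans only six whole minutes
r = pd.date_range(t.min().normalize(), periods=6, freq='min')

# Union of measured and grid times, so interpolation sees the real timestamps
result = d.reindex(t.union(r)).interpolate(method='index').loc[r]
print(result)
```

Only 00:04 differs from its neighbouring readings: it lands at about 58.06, 2 s into the 62 s gap between 58 (00:03:58) and 60 (00:05:00). The key step is reindexing on the union of the original and grid timestamps, so that interpolate(method='index') weights by actual elapsed time before .loc keeps only the grid points.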

Note that periods=24*60 assumes a full day of data, not the short sample in the question; for that sample, periods=6 is enough.
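The question assumes the data is split by day and by sensor before interpolating. One way to apply the answer's recipe per group is a groupby over the sensor and the calendar date. This is only a sketch: the sensor column, the helper name to_minute_grid and the two-sensor readings are all hypothetical, and periods is exposed as a parameter so the small example can use a 4-minute grid (real daily data would keep the 24*60 default).

```python
import pandas as pd

def to_minute_grid(g, periods=24 * 60):
    """Hypothetical helper: interpolate one sensor-day onto a 1-minute grid."""
    d = g.set_index('tstamp')[['val']]
    t = d.index
    r = pd.date_range(t.min().normalize(), periods=periods, freq='min')
    return d.reindex(t.union(r)).interpolate(method='index').loc[r]

# Hypothetical readings from two sensors ('sensor' column is an assumption)
df = pd.DataFrame({
    'sensor': ['a', 'a', 'a', 'b', 'b'],
    'tstamp': pd.to_datetime(['2016-09-01 00:00:00', '2016-09-01 00:01:30',
                              '2016-09-01 00:03:00', '2016-09-01 00:00:00',
                              '2016-09-01 00:03:00']),
    'val': [10, 13, 16, 0, 30],
})

# One group per (sensor, calendar day); each group is interpolated on its own
day = df['tstamp'].dt.date.rename('day')
out = (df.groupby(['sensor', day])[['tstamp', 'val']]
         .apply(to_minute_grid, periods=4))
print(out)
```

The result is indexed by sensor, day and grid timestamp. Because each (sensor, day) group is reindexed and interpolated separately, readings from one sensor or day never feed into another group's interpolation.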
