Python: regularise an irregular time series with linear interpolation

Question

I have a time series in pandas that looks like this:

                     Values
1992-08-27 07:46:48    28.0  
1992-08-27 08:00:48    28.2  
1992-08-27 08:33:48    28.4  
1992-08-27 08:43:48    28.8  
1992-08-27 08:48:48    29.0  
1992-08-27 08:51:48    29.2  
1992-08-27 08:53:48    29.6  
1992-08-27 08:56:48    29.8  
1992-08-27 09:03:48    30.0

I would like to resample it to a regular time series with 15-minute time steps where the values are linearly interpolated. Basically, I would like to get:

                     Values
1992-08-27 08:00:00    28.2  
1992-08-27 08:15:00    28.3  
1992-08-27 08:30:00    28.4  
1992-08-27 08:45:00    28.8  
1992-08-27 09:00:00    29.9

However, using the resample method from pandas (df.resample('15Min')), I get:

                     Values
1992-08-27 08:00:00   28.20  
1992-08-27 08:15:00     NaN  
1992-08-27 08:30:00   28.60  
1992-08-27 08:45:00   29.40  
1992-08-27 09:00:00   30.00  
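
The NaN at 08:15 comes from how resample works. It groups the rows into 15-minute bins and aggregates each bin on its own. Older pandas, which this question predates, took the mean by default. In current pandas you call an aggregation such as .mean() explicitly on the returned Resampler. Because resample never looks at neighbouring bins, an empty bin stays NaN instead of being interpolated. The minimal sketch below rebuilds the sample data to show this. It assumes a current pandas, and the helper make_sample is hypothetical. It also prints a 07:45:00 bin (28.0), which the table above leaves out.

```python
import pandas as pd


def make_sample() -> pd.DataFrame:
    """Rebuild the irregular series from the question (hypothetical helper)."""
    times = pd.to_datetime([
        "1992-08-27 07:46:48", "1992-08-27 08:00:48", "1992-08-27 08:33:48",
        "1992-08-27 08:43:48", "1992-08-27 08:48:48", "1992-08-27 08:51:48",
        "1992-08-27 08:53:48", "1992-08-27 08:56:48", "1992-08-27 09:03:48",
    ])
    values = [28.0, 28.2, 28.4, 28.8, 29.0, 29.2, 29.6, 29.8, 30.0]
    return pd.DataFrame({"Values": values}, index=times)


df = make_sample()

# resample() puts rows into [t, t + 15min) bins and aggregates each bin
# independently, so the empty 08:15 bin becomes NaN, and the 08:30 bin
# is the mean of 28.4 and 28.8 rather than an interpolated value
binned = df.resample("15min").mean()
print(binned)
```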

I have tried the resample method with different 'how' and 'fill_method' parameters but never got exactly the results I wanted. Am I using the wrong method?

I figure this is a fairly simple query, but I have searched the web for a while and couldn't find an answer.

Thanks in advance for any help I can get.

Answer

It takes a bit of work, but try this out. The basic idea is to find the two observations that bracket each resample point and interpolate linearly between them. For each resample point, np.searchsorted returns the position of the first observation at or after it. The observation just before that position is the other neighbour.
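
Suppose a resample point t falls between the observations (t_b, v_b) before it and (t_a, v_a) after it. Standard linear interpolation weights each neighbour by its distance to the *other* neighbour, so the closer observation counts more. The second line works this through for the 08:00:00 point of the question's data, where t_b = 07:46:48 and t_a = 08:00:48.

```latex
v(t) = v_b \, \frac{t_a - t}{t_a - t_b} + v_a \, \frac{t - t_b}{t_a - t_b}

v(\text{08:00:00}) = 28.0 \cdot \frac{48\,\text{s}}{840\,\text{s}} + 28.2 \cdot \frac{792\,\text{s}}{840\,\text{s}} = 1.6 + 26.588571\ldots \approx 28.188571
```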

import numpy as np
import pandas as pd

# df: the irregular frame from the question, indexed by timestamp
# empty frame with the desired 15-minute index; the first bin (07:45)
# is dropped because it lies before the first observation
rs = pd.DataFrame(index=df.resample('15min').mean().iloc[1:].index)

# position of the first observation at or after each resample point
# (assumes every resample point lies inside the range of df.index)
idx_after = np.searchsorted(df.index.values, rs.index.values)

# values and timestamps of the observations before/after each resample point
rs['after'] = df['Values'].iloc[idx_after].values
rs['before'] = df['Values'].iloc[idx_after - 1].values
rs['after_time'] = df.index[idx_after]
rs['before_time'] = df.index[idx_after - 1]

# calculate the new weighted value: each neighbour is weighted by the
# distance to the *other* neighbour, so the closer observation counts more
rs['span'] = rs['after_time'] - rs['before_time']
t = rs.index.to_series()  # index as a Series so the subtraction aligns
rs['before_weight'] = (rs['after_time'] - t) / rs['span']
rs['after_weight'] = (t - rs['before_time']) / rs['span']

rs['Values'] = rs['before'] * rs['before_weight'] + rs['after'] * rs['after_weight']

After all that, the interpolated values (they round to the output expected in the question):

In [161]: rs['Values']
Out[161]:
1992-08-27 08:00:00    28.188571
1992-08-27 08:15:00    28.286061
1992-08-27 08:30:00    28.376970
1992-08-27 08:45:00    28.848000
1992-08-27 09:00:00    29.891429
Freq: 15T, Name: Values, dtype: float64
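
A shorter alternative is sketched below. It is not part of the answer above, and it assumes a pandas version whose DataFrame.interpolate supports method="time". The idea is to add the regular timestamps to the index, let pandas fill the gaps linearly in time, and then keep only the regular rows. The function name regularise is hypothetical.

```python
import pandas as pd


def regularise(df: pd.DataFrame, freq: str = "15min") -> pd.DataFrame:
    """Linearly interpolate an irregular series onto a regular grid (sketch)."""
    # regular grid inside the data range: round the first timestamp up and
    # the last one down, so no grid point needs extrapolation
    grid = pd.date_range(df.index[0].ceil(freq), df.index[-1].floor(freq), freq=freq)
    # sorted union of original and grid timestamps; grid-only rows are NaN
    merged = df.reindex(df.index.union(grid))
    # method="time" weights by the actual time gaps, not by row position
    return merged.interpolate(method="time").loc[grid]


# sample data from the question
times = pd.to_datetime([
    "1992-08-27 07:46:48", "1992-08-27 08:00:48", "1992-08-27 08:33:48",
    "1992-08-27 08:43:48", "1992-08-27 08:48:48", "1992-08-27 08:51:48",
    "1992-08-27 08:53:48", "1992-08-27 08:56:48", "1992-08-27 09:03:48",
])
df = pd.DataFrame(
    {"Values": [28.0, 28.2, 28.4, 28.8, 29.0, 29.2, 29.6, 29.8, 30.0]},
    index=times,
)
out = regularise(df)
print(out)
```

Because interpolate(method="time") applies the same before/after weighting internally, this should reproduce the values of the manual version while leaving the bracketing and edge handling to pandas.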
