Pandas data frame: resample with linear interpolation


Problem description

I am trying to get a fairly basic resampling method to work with a pandas data frame. My data frame df is indexed by datetime entries and contains prices:

                               price
datetime                            
2000-08-16 09:29:55.755000  7.302786
2000-08-16 09:30:10.642000  7.304059
2000-08-16 09:30:26.598000  7.304435
2000-08-16 09:30:41.372000  7.304314
2000-08-16 09:30:56.718000  7.304334

I would like to downsample this to 5-minute intervals. Using

df.resample(rule='5Min',how='last',closed='left')

takes the closest point in my data to the left of each multiple of 5min; similarly

df.resample(rule='5Min',how='first',closed='left')

takes the closest point to the right. However, I would like to interpolate linearly between the points to the left and right instead. For example, if my df contains the two consecutive entries

time t1, price p1
time t2, price p2

and

t1<t<t2 where t is a multiple of 5min

then the resampled dataframe should have the entry

time t, price p1+(t-t1)/(t2-t1)*(p2-p1)

Solution

Try creating two separate dataframes, call reset_index on them (so they share the same numerical index), apply fillna to each, and then do the math on df1 and df2. For example:

df1 = df.resample(rule='5Min',how='last',closed='left').reset_index().fillna(method='ffill')
df2 = df.resample(rule='5Min',how='first',closed='left').reset_index().fillna(method='ffill')

dt = df1.datetime - df2.datetime
px_fld = df1.price + ...

Something like that should do the trick.
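
For comparison, here is a minimal sketch of a different technique than the two-frame approach described in the answer: insert the 5-minute grid timestamps into the index, let pandas interpolate by elapsed time, and keep only the grid rows. It assumes a recent pandas release (the how= argument to resample has since been removed), a DatetimeIndex with no duplicate timestamps, and numeric columns only. The helper name resample_linear is hypothetical, chosen for illustration.

```python
import pandas as pd


def resample_linear(df: pd.DataFrame, rule: str = "5min") -> pd.DataFrame:
    """Linearly interpolate df onto a regular time grid (hypothetical helper).

    Assumes df has a DatetimeIndex without duplicate timestamps.
    """
    # Regular grid restricted to the observed span, so nothing is extrapolated.
    start = df.index.min().ceil(rule)
    end = df.index.max().floor(rule)
    grid = pd.date_range(start, end, freq=rule)

    # Add the grid timestamps as NaN rows among the real observations.
    combined = df.reindex(df.index.union(grid))

    # method="time" weights by elapsed time between index values, i.e.
    # p1 + (t - t1) / (t2 - t1) * (p2 - p1) for t1 < t < t2.
    filled = combined.interpolate(method="time")

    # Keep only the grid rows.
    return filled.reindex(grid)


# The sample data from the question.
df = pd.DataFrame(
    {"price": [7.302786, 7.304059, 7.304435, 7.304314, 7.304334]},
    index=pd.to_datetime(
        [
            "2000-08-16 09:29:55.755000",
            "2000-08-16 09:30:10.642000",
            "2000-08-16 09:30:26.598000",
            "2000-08-16 09:30:41.372000",
            "2000-08-16 09:30:56.718000",
        ]
    ),
)
df.index.name = "datetime"

out = resample_linear(df, "5min")
print(out)
```

With the sample data, the only multiple of 5 minutes inside the observed span is 09:30:00, so the result has a single row whose price lies between the first two observations. Grid points outside the observed span are deliberately left out rather than extrapolated.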
