Resample in a rolling window using pandas


Problem description

Assume I have daily data (not regularly spaced). For each month, I want to compute the moving standard deviation (or an arbitrary non-linear function) over the past 5 months. For example, for May 2012 I would compute the stddev over the period from Jan 2012 to May 2012 (5 months). For June 2012 the period starts in Feb 2012, and so on. The final result is a time series with monthly values.

I cannot apply a rolling window because, first, it would produce a value for each day, and second, I would need to specify the number of values. (A rolling window does not aggregate by time frame. Some posts address this issue, but they are not relevant to my problem because the rolling would still be computed for each new day.)

I cannot apply resampling either, because then there would be one sample every 5 months, e.g. I would only have values for May 2012, Oct 2012, March 2013, and so on. Finally, because the function is not linear, I cannot reconstruct it by first resampling to monthly values and then applying a 5-period rolling window to the result.

So I need some kind of resampling functionality applied to a rolling window that is defined by a time interval (not by a number of values).

How can I do this in pandas? One approach could be to combine several (5 in this example) resampled (5-month) time series, each offset by one month, and then align all of them into a single series, but I do not know how to implement this.

Recommended answer

I had a similar issue with a timedelta series where I wanted to take a moving average and then resample. Here is an example with 100 seconds of data. I take a rolling average over 10-second windows and then resample every 5 seconds, taking the first entry in each resample bin. The result should be the average of the previous 10 seconds, at 5-second increments. You could do something similar with months instead of seconds:

import pandas as pd
df = pd.DataFrame(range(0, 100), index=pd.to_timedelta(range(0, 100), unit='s'))
# 10-second rolling mean, then the first value in each 5-second bin
df.rolling('10s').mean().resample('5s').first()

Result:

             0
00:00:00   0.0
00:00:05   2.5
00:00:10   5.5
00:00:15  10.5
00:00:20  15.5
00:00:25  20.5
00:00:30  25.5
00:00:35  30.5
00:00:40  35.5
00:00:45  40.5
00:00:50  45.5
00:00:55  50.5
00:01:00  55.5
00:01:05  60.5
00:01:10  65.5
00:01:15  70.5
00:01:20  75.5
00:01:25  80.5
00:01:30  85.5
00:01:35  90.5
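For the monthly case in the question, the same idea needs one adjustment: time-based `rolling` windows accept only fixed-length offsets such as '10s' or '150D', and a calendar month is not a fixed length. The following is a minimal sketch of one way around that, not the answer author's method: it loops over monthly periods and slices the trailing five calendar months explicitly. The sample data and the helper name `trailing_months_apply` are hypothetical and exist only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical, irregularly spaced daily data for 2012 (every third day is missing).
days = pd.date_range("2012-01-01", "2012-12-31", freq="D")
idx = days[np.arange(len(days)) % 3 != 0]
s = pd.Series(np.sin(np.arange(len(idx)) / 7.0), index=idx)


def trailing_months_apply(series, months, func):
    """Apply func to the trailing `months` calendar months, once per month.

    Hypothetical helper, not part of pandas. Assumes a sorted DatetimeIndex.
    """
    periods = pd.period_range(series.index.min(), series.index.max(), freq="M")
    values = []
    for p in periods:
        start = (p - (months - 1)).start_time  # first instant of the earliest month
        end = p.end_time                       # last instant of the current month
        values.append(func(series.loc[start:end]))
    return pd.Series(values, index=periods)


# One value per month: the standard deviation of the trailing 5 calendar months.
monthly = trailing_months_apply(s, 5, lambda w: w.std())
print(monthly)
```

Because `func` receives the raw daily values of each window, this works for any non-linear function, not only the standard deviation. The first four months are computed from fewer than five months of data, so you may want to drop or mask them.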
