Pandas monthly rolling window

Problem description

I am looking to do a 'monthly' rolling window on daily data grouped by a category. The code below does not work as is; it raises the following error:

ValueError: <DateOffset: months=1> is a non-fixed frequency

I know that I could use a '30D' offset, but then the window boundaries would drift away from the calendar dates over time.
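To make the drift concrete (this snippet is an illustration, not part of the original question): the start of a fixed 30-day look-back lands on a different day of the month depending on how long the month in between is, while `pd.DateOffset(months=1)` keeps the day of the month.

```python
import pandas as pd

ends = pd.to_datetime(["2014-08-04", "2014-03-04"])
fixed_starts = [t - pd.Timedelta(days=30) for t in ends]       # where a '30D' window starts
calendar_starts = [t - pd.DateOffset(months=1) for t in ends]  # same day, one calendar month earlier

for end, fixed, cal in zip(ends, fixed_starts, calendar_starts):
    print(end.date(), fixed.date(), cal.date())
# 2014-08-04 2014-07-05 2014-07-04
# 2014-03-04 2014-02-02 2014-02-04
```

After July (31 days) the 30-day window stops one day short; after February (28 days) it reaches two days too far back.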

I'm looking for the sum over a window that spans from the x-th day of one month to the same x-th day J months later. E.g. with J=1: 4th of July to 4th of August, 5th of July to 5th of August, 6th of July to 6th of August, etc.

I've been trying to figure this out for a few days now. Any suggestions or tips would be very much appreciated. Happy New Year.

MRE:

import pandas as pd
from io import StringIO

data = StringIO(
"""\
date          logret       category
2014-03-25    -0.01           A
2014-04-05    -0.02           A
2014-04-15    -0.03           A
2014-04-25    0.01            B
2014-05-05    0.03            B
2014-05-15    -0.01           A
2014-05-25    0.04            B
"""
)

df = pd.read_csv(data, sep=r"\s+", parse_dates=True, index_col="date")

J=1

df.groupby(['category'])['logret'].rolling(pd.DateOffset(months=J),min_periods=J*20).sum() 
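For reference, a straightforward (if slow, roughly quadratic per category) way to get the per-category calendar-month sums is to loop over each category's rows and build the window with `pd.DateOffset` directly. This is a sketch added here, not from the original post: the helper name `calendar_window_sum` is hypothetical, it assumes a half-open window (t - J months, t], and it ignores `min_periods`, so early rows get partial sums instead of NaN.

```python
import pandas as pd
from io import StringIO

data = StringIO("""\
date          logret       category
2014-03-25    -0.01           A
2014-04-05    -0.02           A
2014-04-15    -0.03           A
2014-04-25    0.01            B
2014-05-05    0.03            B
2014-05-15    -0.01           A
2014-05-25    0.04            B
""")
df = pd.read_csv(data, sep=r"\s+", parse_dates=True, index_col="date")
J = 1  # window length in calendar months


def calendar_window_sum(s: pd.Series, months: int) -> pd.Series:
    """Hypothetical helper: sum of `s` over (t - `months` calendar months, t] for each t."""
    dates = s.index
    sums = []
    for t in dates:
        start = t - pd.DateOffset(months=months)  # calendar-aware, unlike '30D'
        in_window = (dates > start) & (dates <= t)
        sums.append(s[in_window].sum())
    return pd.Series(sums, index=dates)


out = df.assign(
    rolling_aggregate=df.groupby("category")["logret"].transform(calendar_window_sum, months=J)
)
print(out)
```

This is mainly useful as a correctness check for a faster approach on small samples.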

Recommended answer

As an intermediate step, 'normalize' your timestamps so that every month has 31 days, then aggregate, and finally drop the inserted rows from the result.

This works as long as your aggregation has a neutral element (e.g. 0 for a sum) that the inserted rows can be filled with.

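  1. create an index from the original df with all timestamps as strings
  2. create another index of strings representing timestamps in a calendar where all months have 31 days
  3. join the original data onto that padded index, fill the inserted rows with the neutral element, and aggregate with a 31-row rolling window
  4. select from the aggregation by the index derived from the original df
  5. add the result as a new column to the original df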
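Why a fixed 31-row window on the padded index behaves like a calendar month: if each date is mapped to its row position in the padded calendar, the same day of the month one month later is always exactly 31 rows further on, even though the real number of days in between varies. A small illustration (the helper `padded_position` is hypothetical, not part of the answer):

```python
from datetime import date


def padded_position(d: date) -> int:
    """Hypothetical helper: row index of `d` in a calendar where every month has 31 days."""
    return (d.year * 12 + (d.month - 1)) * 31 + (d.day - 1)


pairs = [
    (date(2014, 7, 4), date(2014, 8, 4)),   # July has 31 days
    (date(2014, 2, 5), date(2014, 3, 5)),   # February 2014 has 28 days
    (date(2014, 12, 6), date(2015, 1, 6)),  # across a year boundary
]
# distance in padded rows vs. distance in real days
gaps = [padded_position(b) - padded_position(a) for a, b in pairs]
real_gaps = [(b - a).days for a, b in pairs]
print(gaps)       # [31, 31, 31]
print(real_gaps)  # [31, 28, 31]
```

So a window of 31·J padded rows always reaches back to the same day of the month J months earlier.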

import pandas as pd
from io import StringIO

data = StringIO(
"""\
date          logret       category
2014-03-25    -0.01           A
2014-04-05    -0.02           A
2014-04-15    -0.03           A
2014-04-25    0.01            B
2014-05-05    0.03            B
2014-05-15    -0.01           A
2014-05-25    0.04            B
"""
)

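df = pd.read_csv(data, sep=r"\s+", parse_dates=True, index_col="date")
# original timestamps as strings, e.g. '2014-04-05'
idx = df.index.strftime('%Y-%m-%d')

y0 = df.index[0].year
y1 = df.index[-1].year

# one row per day of a calendar in which every month has 31 days
# (this includes placeholder dates such as '2014-02-30')
padded = pd.DataFrame(index=[f'{y}-{m:02}-{d:02}'
                             for y in range(y0, y1 + 1)
                             for m in range(1, 13)
                             for d in range(1, 32)]).loc[idx[0]:idx[-1]]

# Note that the rolling interval is exclusive at start
rolled = (padded.join(df.set_index(idx)['logret'])  # numeric column only
                .fillna(0)                          # 0 is the neutral element of sum
                .rolling(31)                        # 31 padded rows = 1 calendar month
                .sum()
                .loc[idx, 'logret'])
# assign positionally: rolled has string labels, df has a DatetimeIndex
df.assign(rolling_aggregate=rolled.to_numpy())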

Output:

            logret category  rolling_aggregate
date                                          
2014-03-25   -0.01        A                NaN
2014-04-05   -0.02        A                NaN
2014-04-15   -0.03        A                NaN
2014-04-25    0.01        B              -0.04
2014-05-05    0.03        B               0.01
2014-05-15   -0.01        A               0.03
2014-05-25    0.04        B               0.06
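Note that the output above sums across both categories (e.g. the B row on 2014-04-25 includes A's returns), while the question groups by category. One possible way to apply the same padding trick per category is sketched below. It is not from the original answer: the helper name `padded_rolling_sum` is hypothetical, and it assumes each category's dates are sorted and contain no duplicates.

```python
import pandas as pd
from io import StringIO

data = StringIO("""\
date          logret       category
2014-03-25    -0.01           A
2014-04-05    -0.02           A
2014-04-15    -0.03           A
2014-04-25    0.01            B
2014-05-05    0.03            B
2014-05-15    -0.01           A
2014-05-25    0.04            B
""")
df = pd.read_csv(data, sep=r"\s+", parse_dates=True, index_col="date")
J = 1


def padded_rolling_sum(s: pd.Series, months: int = 1) -> pd.Series:
    """Hypothetical helper: calendar-month rolling sum for one category.

    Assumes `s` has a sorted DatetimeIndex without duplicate dates.
    """
    labels = s.index.strftime("%Y-%m-%d")
    # every month gets 31 "days", including placeholders such as '2014-02-30'
    calendar = [f"{y}-{m:02}-{d:02}"
                for y in range(s.index[0].year, s.index[-1].year + 1)
                for m in range(1, 13)
                for d in range(1, 32)]
    padded = (pd.Series(s.to_numpy(), index=labels)
                .reindex(calendar, fill_value=0.0)  # 0 is neutral for sum
                .loc[labels[0]:labels[-1]])
    # 31 * months padded rows == `months` calendar months (exclusive at start)
    rolled = padded.rolling(31 * months).sum()
    return pd.Series(rolled.loc[labels].to_numpy(), index=s.index)


out = df.assign(
    rolling_aggregate=df.groupby("category")["logret"].transform(padded_rolling_sum, months=J)
)
print(out)
```

With J=1 this yields NaN until a category has a full padded month of history, then -0.01 for A on 2014-05-15 and 0.07 for B on 2014-05-25. Because the filler rows hold 0 rather than NaN, the default `min_periods` counts padded calendar days, not actual observations, so it does not reproduce the question's `min_periods=J*20`.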
