Using rolling functions on a multi-index DataFrame in pandas
Question
I have a multi-index DataFrame in pandas, indexed on ID and timestamp. I want to compute a rolling sum over each ID's time series separately, but I can't figure out how to do it without loops.
import io
import pandas as pd

# StringIO, not BytesIO: the literal below is a str in Python 3
content = io.StringIO("""\
IDs timestamp value
0 2010-10-30 1
0 2010-11-30 2
0 2011-11-30 3
1 2000-01-01 300
1 2007-01-01 33
1 2010-01-01 400
2 2000-01-01 11""")
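df = pd.read_table(content, header=0, sep=r'\s+', parse_dates=[1])
df.set_index(['IDs', 'timestamp'], inplace=True)
# Rolls over the whole frame, ignoring the IDs level
# (older pandas: pd.stats.moments.rolling_sum(df, window=2))
df.rolling(window=2).sum()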
The output is:
                value
IDs timestamp
0   2010-10-30    NaN
    2010-11-30      3
    2011-11-30      5
1   2000-01-01    303
    2007-01-01    333
    2010-01-01    433
2   2000-01-01    411
Notice the overlap at the boundaries between IDs 0 and 1 and between IDs 1 and 2: for example, 303 is the last value of ID 0 plus the first value of ID 1. I don't want that, because it messes up my calculations. One possible workaround is to group by IDs, loop through the groups, and apply a rolling sum to each one.
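For reference, the loop-based workaround described above might look like the following minimal sketch. It assumes a current pandas version (using the .rolling(window).sum() method rather than the old rolling_sum function); the names pieces and result are illustrative only.

```python
import io

import pandas as pd

# Rebuild the sample data from the question, indexed on (IDs, timestamp).
content = io.StringIO("""\
IDs timestamp value
0 2010-10-30 1
0 2010-11-30 2
0 2011-11-30 3
1 2000-01-01 300
1 2007-01-01 33
1 2010-01-01 400
2 2000-01-01 11""")
df = pd.read_csv(content, sep=r"\s+", parse_dates=["timestamp"])
df = df.set_index(["IDs", "timestamp"])

# Loop over each ID's sub-frame, roll within it, then stitch the pieces back together.
pieces = []
for _, group in df.groupby(level="IDs"):
    pieces.append(group.rolling(window=2).sum())
result = pd.concat(pieces)
print(result)
```

Because each window only ever sees rows from a single ID, the first row of every ID comes out as NaN instead of borrowing a value from the previous ID.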
I am sure there is a function to help me do this without using loops.
Answer
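Group first, then apply the rolling sum within each group, so the window never crosses from one ID into the next. (The original answer called pd.rolling_sum, which was available in the top-level namespace at the time; it was deprecated in pandas 0.18 in favor of the .rolling(window).sum() method and has since been removed.)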
In [18]: df.groupby(level='IDs', group_keys=False).apply(lambda x: x.rolling(2).sum())
Out[18]:
                value
IDs timestamp
0   2010-10-30    NaN
    2010-11-30      3
    2011-11-30      5
1   2000-01-01    NaN
    2007-01-01    333
    2010-01-01    433
2   2000-01-01    NaN
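An alternative that avoids apply entirely is GroupBy.transform on the value column. The sketch below assumes a current pandas version; transform returns a Series aligned to the original (IDs, timestamp) index, so it can be assigned straight back as a new column if desired. The name by_transform is illustrative only.

```python
import io

import pandas as pd

# Same sample data as in the question.
content = io.StringIO("""\
IDs timestamp value
0 2010-10-30 1
0 2010-11-30 2
0 2011-11-30 3
1 2000-01-01 300
1 2007-01-01 33
1 2010-01-01 400
2 2000-01-01 11""")
df = pd.read_csv(content, sep=r"\s+", parse_dates=["timestamp"])
df = df.set_index(["IDs", "timestamp"])

# transform runs the rolling sum separately on each ID's values and returns
# a result with exactly the same index as df, so IDs never bleed together.
by_transform = df.groupby(level="IDs")["value"].transform(
    lambda s: s.rolling(window=2).sum()
)
print(by_transform)
```

Compared with apply, transform makes the "same shape as the input" intent explicit, which is why the output needs no extra index handling.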