Rolling sum by group
Question
Consider this simple example:
import pandas as pd

df = pd.DataFrame({'date' : [pd.to_datetime('2018-01-01'),
                             pd.to_datetime('2018-01-01'),
                             pd.to_datetime('2018-01-01'),
                             pd.to_datetime('2018-01-01')],
                   'group' : ['a','a','b','b'],
                   'value' : [1,2,3,4],
                   'value_useless' : [2,2,2,2]})
df
Out[78]:
date group value value_useless
0 2018-01-01 a 1 2
1 2018-01-01 a 2 2
2 2018-01-01 b 3 2
3 2018-01-01 b 4 2
Here I want to compute the rolling sum of value by group. I tried the simple approach:
df['rolling_sum'] = df.groupby('group').value.rolling(2).sum()
TypeError: incompatible index of inserted column with frame index
A variant with apply does not seem to work either:
df['rolling_sum'] = df.groupby('group').apply(lambda x: x.value.rolling(2).sum())
TypeError: incompatible index of inserted column with frame index
What am I missing here? Thanks!
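Recommended answer
The groupby is adding an index level that is getting in your way. The result of the grouped rolling sum is indexed by (group, original row label), so it cannot be aligned with the frame's own index. Drop the group level before assigning: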
rs = df.groupby('group').value.rolling(2).sum()
df.assign(rolling_sum=rs.reset_index(level=0, drop=True))
date group value value_useless rolling_sum
0 2018-01-01 a 1 2 NaN
1 2018-01-01 a 2 2 3.0
2 2018-01-01 b 3 2 NaN
3 2018-01-01 b 4 2 7.0
Details
rs
# Annoying Index Level
# |
# v
# group
# a 0 NaN
# 1 3.0
# b 2 NaN
# 3 7.0
# Name: value, dtype: float64
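The extra level can be checked directly. The following is a minimal sketch on a trimmed copy of the frame above (only the group and value columns are kept, which is a simplification made here). The droplevel call is an alternative spelling that the original answer does not mention, and the variable names aligned and same are hypothetical.

```python
import pandas as pd

# Trimmed copy of the example frame (assumption: the other columns do not matter here)
df = pd.DataFrame({'group': ['a', 'a', 'b', 'b'],
                   'value': [1, 2, 3, 4]})

rs = df.groupby('group')['value'].rolling(2).sum()

# The result has two index levels: the group key and the original row label
n_levels = rs.index.nlevels

# Dropping the group level leaves only the original row labels,
# which align with df.index again
aligned = rs.reset_index(level=0, drop=True)
same = rs.droplevel(0)  # alternative spelling, not from the original answer

df['rolling_sum'] = aligned
print(df)
```

Once the group level is gone, a plain column assignment works because both indexes carry the same row labels.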
Alternatively, you can use pd.concat:
df.assign(rolling_sum=pd.concat(s.rolling(2).sum() for _, s in df.groupby('group').value))
date group value value_useless rolling_sum
0 2018-01-01 a 1 2 NaN
1 2018-01-01 a 2 2 3.0
2 2018-01-01 b 3 2 NaN
3 2018-01-01 b 4 2 7.0
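Another option, which is not part of the original answer, is groupby(...).transform. It returns a result indexed like the input, so no index level has to be dropped. This is a sketch on a trimmed copy of the example frame, assuming a lambda-based rolling sum is acceptable for the data size at hand.

```python
import pandas as pd

# Trimmed copy of the example frame (assumption: the other columns do not matter here)
df = pd.DataFrame({'group': ['a', 'a', 'b', 'b'],
                   'value': [1, 2, 3, 4]})

# transform applies the function per group and returns a Series
# that keeps the original index, so it can be assigned directly
df['rolling_sum'] = (
    df.groupby('group')['value']
      .transform(lambda s: s.rolling(2).sum())
)
print(df)
```

The lambda is called once per group, so on frames with very many groups the reset_index approach above may be faster.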