Applying cumulative mean function to a grouped object
Question
I have a DataFrame df where each record represents a soccer game. Teams will appear more than once. I need to compute some sort of rolling mean of each team's scores (well, not exactly a rolling mean to the letter).
           date     home     away  score_h  score_a
166  2013-09-01   Fulham  Chelsea        0        0
167  2013-09-03  Arsenal  Everton        0        2
164  2013-09-05  Arsenal  Swansea        5        1
165  2013-09-06   Fulham  Norwich        0        1
163  2013-09-18  Arsenal  Swansea        0        0
What I need to calculate is the mean score for each team (home and away).
For brevity, let's just do the home column:
grouped = df.groupby('home')
grouped = grouped.sort_index(by='date') # rows inside groups must be in asc order
This results in:
                   date     home     away  score_h  score_a
home
Arsenal 167  2013-09-03  Arsenal  Everton        0        2
        164  2013-09-05  Arsenal  Swansea        5        1
        163  2013-09-18  Arsenal  Swansea        0        0
Fulham  166  2013-09-01   Fulham  Chelsea        0        0
        165  2013-09-06   Fulham  Norwich        0        1
This is where my question starts.
Now, I need to compute a "rolling mean" for teams. Let's do it by hand for the group named Arsenal. At the end of this we should wind up with 2 extra columns; let's call them rmean_h and rmean_a. The first record in the group (167) has scores of 0 and 2. The rmeans of these are simply 0 and 2 respectively. For the second record in the group (164), the rmeans will be (0+5)/2 = 2.5 and (2+1)/2 = 1.5, and for the third record, (0+5+0)/3 = 1.66 and (2+1+0)/3 = 1.
Our DataFrame should now look like this:
                   date     home     away  score_h  score_a  rmean_h  rmean_a
home
Arsenal 167  2013-09-03  Arsenal  Everton        0        2        0        2
        164  2013-09-05  Arsenal  Swansea        5        1      2.5      1.5
        163  2013-09-18  Arsenal  Swansea        0        0     1.66        1
Fulham  166  2013-09-01   Fulham  Chelsea        0        0
        165  2013-09-06   Fulham  Norwich        0        1
I want to carry out these calculations for my data. Do you have any suggestions, please?
Recommended answer
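You can apply an expanding_mean (see docs) to each group:
# sort by date first so the rows inside each group are in ascending order
grouped = df.sort(columns='date').groupby('home')
# expanding (cumulative) mean of the home score within each team
grouped['score_h'].apply(pd.expanding_mean)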
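On recent pandas releases, `DataFrame.sort` and the module-level `pd.expanding_mean` are no longer available. The sketch below shows one way the same idea might be written today, assuming `sort_values` and the `.expanding().mean()` method as replacements. The sample frame is rebuilt from the rows in the question, and using `transform` to write the result back into `rmean_h` and `rmean_a` is a choice made for this illustration, not part of the original answer.

```python
import pandas as pd

# Sample rows copied from the question (index values are the original row labels)
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2013-09-01", "2013-09-03", "2013-09-05", "2013-09-06", "2013-09-18"]
        ),
        "home": ["Fulham", "Arsenal", "Arsenal", "Fulham", "Arsenal"],
        "away": ["Chelsea", "Everton", "Swansea", "Norwich", "Swansea"],
        "score_h": [0, 0, 5, 0, 0],
        "score_a": [0, 2, 1, 1, 0],
    },
    index=[166, 167, 164, 165, 163],
)

# Sort by date so each team's rows are in ascending order within its group
df = df.sort_values("date")

# transform returns a result aligned to df's index, so it can be assigned back
grouped = df.groupby("home")
df["rmean_h"] = grouped["score_h"].transform(lambda s: s.expanding().mean())
df["rmean_a"] = grouped["score_a"].transform(lambda s: s.expanding().mean())

print(df)
```

For Arsenal this reproduces the hand calculation in the question: 0, 2.5 and 5/3 for `rmean_h`, and 2, 1.5 and 1 for `rmean_a`.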