Pandas groupby, sum rows, and divide sum by number of rows in group
Question
I have a DataFrame:
>>> import pandas as pd
>>>
>>> df = pd.DataFrame({
...     'P': ['P1', 'P1', 'P2', 'P2', 'P2'],
...     'A1': [0, 1, 2, 1, 2],
...     'A2': [5, 4, 1, 3, 2],
...     'A3': [5, 1, 3, 8, 4],
...     'A4': [2, 1, 3, 4, 4],
... })
>>> df
    P  A1  A2  A3  A4
0  P1   0   5   5   2
1  P1   1   4   1   1
2  P2   2   1   3   3
3  P2   1   3   8   4
4  P2   2   2   4   4
>>>
For each P, I have to sum columns A1-A4 and then divide this sum by the number of rows in that P group. For example, the number of rows in each P is:
>>> df.groupby('P').size()
P
P1 2
P2 3
dtype: int64
>>>
The per-group sum of each column is:
>>> df.groupby('P').sum()
    A1  A2  A3  A4
P
P1   1   9   6   3
P2   5   6  15  11
>>>
But since I also need to sum across the columns of each row, I will use:
>>> df.groupby('P').sum().sum(axis=1)
P
P1 19
P2 37
dtype: int64
>>>
Now I have to divide 19 by 2 (the group size) and 37 by 3 to get the results that I need. To do that, I would prepare the data like this:
>>> pd.concat([df.groupby('P').sum().sum(axis=1), df.groupby('P').size()], axis=1)
     0  1
P
P1  19  2
P2  37  3
>>>
and then I can use apply to get the result:
>>> pd.concat([df.groupby('P').sum().sum(axis=1), df.groupby('P').size()], axis=1).apply(lambda row: row[0]/row[1], axis=1)
P
P1 9.500000
P2 12.333333
dtype: float64
>>>
It works, but I have a feeling that I have overcomplicated the calculation of the total of all rows divided by the number of rows for each P.
If someone knows a better approach, I will be glad to hear it. I would like to get rid of at least the concat.
Recommended answer
This should work:
df.groupby('P').sum().sum(axis=1) / df.groupby('P').size()
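Dividing the two Series works because both are indexed by P, so pandas aligns them label by label. As a self-contained sketch (the variable names `grouped`, `result`, `row_totals`, and `alt` are only illustrative and not part of the original answer), the snippet below rebuilds the question's DataFrame, applies the division, and also shows an alternative that should be equivalent: sum each row first, then take the per-group mean of those row totals.

```python
import pandas as pd

df = pd.DataFrame({
    'P': ['P1', 'P1', 'P2', 'P2', 'P2'],
    'A1': [0, 1, 2, 1, 2],
    'A2': [5, 4, 1, 3, 2],
    'A3': [5, 1, 3, 8, 4],
    'A4': [2, 1, 3, 4, 4],
})

# Approach from the answer: group total divided by group size.
# Both Series share the index P, so the division aligns on the group labels.
grouped = df.groupby('P')
result = grouped.sum().sum(axis=1) / grouped.size()

# Alternative sketch: sum A1-A4 within each row, then average the
# row totals per group (mean = sum of row totals / number of rows).
row_totals = df[['A1', 'A2', 'A3', 'A4']].sum(axis=1)
alt = row_totals.groupby(df['P']).mean()

print(result)
print(alt)
```

Both `result` and `alt` should give 9.5 for P1 and roughly 12.333333 for P2, matching the output of the concat/apply version in the question.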