python pandas groupby() 结果 [英] python pandas groupby() result
问题描述
我有以下 python pandas 数据框:
I have the following python pandas data frame:
df = pd.DataFrame( {
'A': [1,1,1,1,2,2,2,3,3,4,4,4],
'B': [5,5,6,7,5,6,6,7,7,6,7,7],
'C': [1,1,1,1,1,1,1,1,1,1,1,1]
} );
df
A B C
0 1 5 1
1 1 5 1
2 1 6 1
3 1 7 1
4 2 5 1
5 2 6 1
6 2 6 1
7 3 7 1
8 3 7 1
9 4 6 1
10 4 7 1
11 4 7 1
我想要另一列存储固定(包括)A 和 B 的 C 值的总和值.也就是说,类似于:
I would like to have another column storing a value of a sum over C values for fixed (both) A and B. That is, something like:
A B C D
0 1 5 1 2
1 1 5 1 2
2 1 6 1 1
3 1 7 1 1
4 2 5 1 1
5 2 6 1 2
6 2 6 1 2
7 3 7 1 2
8 3 7 1 2
9 4 6 1 1
10 4 7 1 2
11 4 7 1 2
我已经尝试过使用熊猫 groupby
并且它有点工作:
I have tried with pandas groupby
and it kind of works:
res = {}
for a, group_by_A in df.groupby('A'):
group_by_B = group_by_A.groupby('B', as_index = False)
res[a] = group_by_B['C'].sum()
但我不知道如何以有序的方式将 res
的结果获取"到 df
中.对这方面的任何建议都会非常满意.谢谢你.
but I don't know how to 'get' the results from res
into df
in the orderly fashion. Would be very happy with any advice on this. Thank you.
推荐答案
这是一种方法(虽然感觉这应该与申请一起使用,但我无法理解).
Here's one way (though it feels this should work in one go with an apply, I can't get it).
In [11]: g = df.groupby(['A', 'B'])
In [12]: df1 = df.set_index(['A', 'B'])
size
groupby 函数就是你想要,我们必须将它与作为索引的 'A' 和 'B' 匹配:
The size
groupby function is the one you want, we have to match it to the 'A' and 'B' as the index:
In [13]: df1['D'] = g.size() # unfortunately this doesn't play nice with as_index=False
# Same would work with g['C'].sum()
In [14]: df1.reset_index()
Out[14]:
A B C D
0 1 5 1 2
1 1 5 1 2
2 1 6 1 1
3 1 7 1 1
4 2 5 1 1
5 2 6 1 2
6 2 6 1 2
7 3 7 1 2
8 3 7 1 2
9 4 6 1 1
10 4 7 1 2
11 4 7 1 2
这篇关于python pandas groupby() 结果的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!