GroupBy two columns with margins for first level
Question
I am grouping a dataframe by two columns and aggregating the other columns by their sum. How can I also get a total for each value of the first grouping column in the same dataframe?
For example, my dataframe is:
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C': np.random.randn(8),
                   'D': np.random.randn(8)})
The result of:
grouped = df.groupby(by=['A', 'B']).sum()
is:
                  C         D
A   B
bar one    0.400157  0.410599
    three  2.240893  1.454274
    two   -0.977278  0.121675
foo one    2.714141  0.340644
    three -0.151357  0.333674
    two    2.846296  0.905081
What I would like to get:
                  C         D
A   B
bar one    0.400157  0.410599
    two   -0.977278  0.121675
    three  2.240893  1.454274
    total  1.663773  1.986547
foo one    2.714141  0.340644
    two    2.846296  0.905081
    three -0.151357  0.333674
    total  5.409080  1.579400
How can I do this?
UPDATE: I found a similar question, "Pandas groupby and sum total of group", which has two more answers to this question.
Recommended answer
You can get clever with pd.Categorical to create a placeholder for "total" in the groupby output. This makes it easy to compute the totals and assign them back to the result.
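# Add 'total' as an extra, initially unobserved category of B.
df['B'] = pd.Categorical(
    df['B'], categories=np.append(df['B'].unique(), 'total'))
# observed=False keeps the unobserved 'total' category as a placeholder row in
# every group (the default in older pandas; newer versions may need it explicit).
v = df.groupby(by=['A', 'B'], observed=False).sum()
# Overwrite each placeholder row with the sum over its first-level group.
v.loc(axis=0)[pd.IndexSlice[:, 'total']] = v.groupby(level=0).sum().values
print(v)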
                  C         D
A   B
bar one    0.400157  0.410599
    two   -0.977278  0.121675
    three  2.240893  1.454274
    total  1.663773  1.986547
foo one    2.714141  0.340644
    two    2.846296  0.905081
    three -0.151357  0.333674
    total  5.409080  1.579400
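To see why the placeholder category matters, here is a minimal sketch on a made-up three-row frame (the `demo` frame, its values, and the variable names are illustrative assumptions, not part of the question). It assumes a pandas version that accepts the `observed` argument of `groupby`. With `observed=False` the unobserved 'total' category gets a row in every group; with `observed=True` it does not, so there would be nothing to assign the totals into.

```python
import pandas as pd

# Hypothetical toy data: B is categorical with an extra, never-used 'total' category.
demo = pd.DataFrame({
    'A': ['x', 'x', 'y'],
    'B': pd.Categorical(['one', 'two', 'one'],
                        categories=['one', 'two', 'total']),
    'C': [1.0, 2.0, 3.0],
})

# Every (A, category) combination appears, including the unused 'total'.
with_placeholder = demo.groupby(['A', 'B'], observed=False).sum()

# Only the combinations that actually occur in the data appear.
only_observed = demo.groupby(['A', 'B'], observed=True).sum()

print(with_placeholder)
print(only_observed)
```

Whatever value the placeholder row initially holds can vary between pandas versions, which is harmless here because the row is overwritten with the group total anyway.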
If you need to aggregate on different metrics:
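df['B'] = pd.Categorical(
    df['B'], categories=np.append(df['B'].unique(), 'total'))
# Full (A, B) grid, including a 'total' entry for every value of A.
idx = pd.MultiIndex.from_product([df['A'].unique(), df['B'].cat.categories])
# reindex inserts the missing 'total' rows (as NaN); they are overwritten below.
v = df.groupby(by=['A', 'B'], observed=True).agg(['sum', 'count']).reindex(idx)
v.loc(axis=0)[pd.IndexSlice[:, 'total']] = v.groupby(level=0, sort=False).sum().values
print(v)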
                  C                D
                sum count        sum count
foo one    2.714141   2.0   0.340644   2.0
    two    2.846296   2.0   0.905081   2.0
    three -0.151357   1.0   0.333674   1.0
    total  5.409080   5.0   1.579400   5.0
bar one    0.400157   1.0   0.410599   1.0
    two   -0.977278   1.0   0.121675   1.0
    three  2.240893   1.0   1.454274   1.0
    total  1.663773   3.0   1.986547   3.0
Another alternative is pivot_table, which makes margin generation easier (although it does not provide sub-level margins):
df.pivot_table(index=['A', 'B'],
               values=['C', 'D'],
               aggfunc=['sum', 'count'],
               margins=True)
                sum               count
                  C         D         C    D
A   B
bar one    0.400157  0.410599       1.0  1.0
    two   -0.977278  0.121675       1.0  1.0
    three  2.240893  1.454274       1.0  1.0
foo one    2.714141  0.340644       2.0  2.0
    two    2.846296  0.905081       2.0  2.0
    three -0.151357  0.333674       1.0  1.0
All        7.072852  3.565947       8.0  8.0
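Since pivot_table only yields a grand total, a plainer route to per-group totals is to build one 'total' row per first-level group and concatenate the pieces. This is a different technique from the Categorical placeholder above, offered only as a sketch; the names `pieces`, `total_row`, and `result` are illustrative. It assumes B is a plain string column, so the rows within each group come out in alphabetical order rather than the one/two/three order shown in the question.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': np.random.randn(8),
    'D': np.random.randn(8),
})

# Per-(A, B) sums, as in the question.
grouped = df.groupby(['A', 'B']).sum()

pieces = []
for _, block in grouped.groupby(level='A'):
    label = block.index[0][0]  # the value of A shared by this block
    # One-row frame holding the column sums of the block.
    total_row = block.sum().to_frame().T
    total_row.index = pd.MultiIndex.from_tuples(
        [(label, 'total')], names=['A', 'B'])
    # Append the total directly after the rows it summarises.
    pieces.append(pd.concat([block, total_row]))

result = pd.concat(pieces)
print(result)
```

Building the result block by block keeps each 'total' row directly under its own group without relying on any sort order.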