GroupBy two columns with margins for first level


Question

I am grouping a DataFrame by two columns and aggregating the remaining columns by their sum. How can I add a total for each value of the first grouping column to the same DataFrame?

For example, my DataFrame is:

import numpy as np
import pandas as pd

np.random.seed(0)  # fixed seed so the numbers below are reproducible
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C': np.random.randn(8),
                   'D': np.random.randn(8)})

The result of:

grouped = df.groupby(by=['A', 'B']).sum()

is:

                  C         D
A   B                        
bar one    0.400157  0.410599
    three  2.240893  1.454274
    two   -0.977278  0.121675
foo one    2.714141  0.340644
    three -0.151357  0.333674
    two    2.846296  0.905081

What I would like to get:

                  C         D
A   B                        
bar one    0.400157  0.410599
    two   -0.977278  0.121675
    three  2.240893  1.454274
    total  1.663773  1.986547
foo one    2.714141  0.340644
    two    2.846296  0.905081
    three -0.151357  0.333674
    total  5.409080  1.579400

How can this be done?

UPDATE: I found a similar question, "Pandas groupby and sum total of group", which has two more answers to this problem.

Recommended answer

You can get clever with pd.Categorical to create a placeholder for "total" in the groupby output. This makes it easy to compute the totals and assign them back to the result.

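# Make B categorical, with an extra 'total' category as a placeholder
df['B'] = pd.Categorical(
    df['B'], categories=np.append(df['B'].unique(), 'total'))
# observed=False keeps the unobserved (A, 'total') rows in the result
v = df.groupby(by=['A', 'B'], observed=False).sum()
# Overwrite the placeholder rows with the per-A sums
v.loc(axis=0)[pd.IndexSlice[:, 'total']] = v.groupby(level=0).sum().values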

print(v)
                  C         D
A   B                        
bar one    0.400157  0.410599
    two   -0.977278  0.121675
    three  2.240893  1.454274
    total  1.663773  1.986547
foo one    2.714141  0.340644
    two    2.846296  0.905081
    three -0.151357  0.333674
    total  5.409080  1.579400
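
For comparison, here is a minimal sketch of a different route to per-group totals that does not rely on pd.Categorical: it sums each first-level block and appends the sum as an extra row. This is not the approach from the answer above; the names `pieces`, `block`, and `result` are only illustrative, and the rows within each group come out in sorted order (one, three, two) rather than in order of appearance.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': np.random.randn(8),
    'D': np.random.randn(8),
})

grouped = df.groupby(['A', 'B']).sum()

pieces = []
for key, block in grouped.groupby(level='A'):
    # One-row frame holding the column sums of this first-level block
    total = block.sum().to_frame().T
    total.index = pd.MultiIndex.from_tuples([(key, 'total')], names=['A', 'B'])
    # Append the total row directly below the block it summarizes
    pieces.append(pd.concat([block, total]))

result = pd.concat(pieces)
print(result)
```

Because each total row is appended to its own block before the blocks are concatenated, no sorting step is needed to keep the totals in place.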


If you need to aggregate with several different metrics:

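df['B'] = pd.Categorical(
    df['B'], categories=np.append(df['B'].unique(), 'total'))
# Full (A, B) index, including one 'total' row per value of A
idx = pd.MultiIndex.from_product([df['A'].unique(), df['B'].cat.categories])

# observed=True keeps only the combinations present in the data;
# reindex then adds the empty 'total' rows (filled with NaN)
v = df.groupby(by=['A', 'B'], observed=True).agg(['sum', 'count']).reindex(idx)
v.loc(axis=0)[pd.IndexSlice[:, 'total']] = v.groupby(level=0, sort=False).sum().values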

print(v)
                  C               D      
                sum count       sum count
foo one    2.714141   2.0  0.340644   2.0
    two    2.846296   2.0  0.905081   2.0
    three -0.151357   1.0  0.333674   1.0
    total  5.409080   5.0  1.579400   5.0
bar one    0.400157   1.0  0.410599   1.0
    two   -0.977278   1.0  0.121675   1.0
    three  2.240893   1.0  1.454274   1.0
    total  1.663773   3.0  1.986547   3.0
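
One way to sanity-check the "total" rows is to aggregate by A alone and compare. The sketch below rebuilds the example data so it runs on its own; `per_a` is just an illustrative name.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': np.random.randn(8),
    'D': np.random.randn(8),
})

# Sum and count of C and D per value of A, ignoring B entirely
per_a = df.groupby('A')[['C', 'D']].agg(['sum', 'count'])
print(per_a)
```

The foo and bar rows of `per_a` should agree with the corresponding "total" rows in the table above.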


Another alternative is pivot_table, which makes margin generation easier (although it does not provide sub-level margins):

df.pivot_table(index=['A', 'B'], 
               values=['C', 'D'], 
               aggfunc=['sum', 'count'], 
               margins=True)

                sum           count     
                  C         D     C    D
A   B                                   
bar one    0.400157  0.410599   1.0  1.0
    two   -0.977278  0.121675   1.0  1.0
    three  2.240893  1.454274   1.0  1.0
foo one    2.714141  0.340644   2.0  2.0
    two    2.846296  0.905081   2.0  2.0
    three -0.151357  0.333674   1.0  1.0
All        7.072852  3.565947   8.0  8.0
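
If a wide layout is acceptable, pivot_table can still produce per-A totals: putting B on the columns turns the margin into one extra column. The following is a sketch under that assumption, restricted to column C for brevity; `wide` is an illustrative name, and `margins_name='total'` simply renames the default "All" label.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': np.random.randn(8),
    'D': np.random.randn(8),
})

# Rows are values of A, columns are values of B; margins=True adds a
# 'total' column (sum per A) and a 'total' row (sum per B)
wide = df.pivot_table(index='A', columns='B', values='C',
                      aggfunc='sum', margins=True, margins_name='total')
print(wide)
```

Here `wide.loc['bar', 'total']` holds the sum of C over all rows with A equal to "bar", which is the same number as the bar/total entry in the long-format result.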
