pandas groupby和组的总和 [英] Pandas groupby and sum total of group

查看:89
本文介绍了 pandas groupby和组的总和的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个带有退款原因的Pandas DataFrame. 它包含以下示例数据行:

I have a Pandas DataFrame with customer refund reasons. It contains these example data rows:

    **case_type**       **claim_type**
1   service             service
2   service             service
3   chargeback          service
4   chargeback          local_charges
5   service             supplier_service
6   chargeback          service
7   chargeback          service
8   chargeback          service
9   chargeback          service
10  chargeback          service
11  service             service_not_used
12  service             service_not_used

我想将客户的原因与某种标记的原因进行比较.没问题,但是我也想查看特定组中的记录总数(客户原因).

I would like to compare the customer's reason with some sort of labeled reason. This is no problem, but I would also like to see the total number of records in a specific group (customer reason).

case_claim_type = df[["case_type", "claim_type"]]
case_claim_type.groupby(by=("case_type", "claim_type"))["case_type"].count()

哪个给我这个输出,例如:

Which gives me this output, for example:

**case_type**     **claim_type**                 
service           service                         2
                  supplier_service                1
                  service_not_used                2
chargeback        service                         6
                  local_charges                   1

我还希望每个case_type的输出总和.像这样:

I would also like to have have the sum of the output per case_type. Something like:

**case_type**     **claim_type**                 
service           service                         2
                  supplier_service                1
                  service_not_used                2
                  total:                          5
chargeback        service                         6
                  local_charges                   1
                  total:                          7

不一定必须采用最后一种输出格式,每case_type总计(汇总)的列也可以.

It doesn't necessarily has to be in this last output format, a column with the (aggregated) totals per case_type is also fine.

推荐答案

其中:

df = pd.DataFrame({'case_type':['Service']*20+['chargeback']*9,'claim_type':['service']*5+['local_charges']*5+['service_not_used']*5+['supplier_service']*5+['service']*8+['local_charges']})

df_out = df.groupby(by=("case_type", "claim_type"))["case_type"].count()

让我们使用pd.concat,带有级别参数的sumassign:

Let use pd.concat, sum with level parameter, and assign:

(pd.concat([df_out.to_frame(),
           df_out.sum(level=0).to_frame()
                 .assign(claim_type= "total")
                 .set_index('claim_type', append=True)])
  .sort_index())

输出:

                             case_type
case_type  claim_type                 
Service    local_charges             5
           service                   5
           service_not_used          5
           supplier_service          5
           total                    20
chargeback local_charges             1
           service                   8
           total                     9

这篇关于 pandas groupby和组的总和的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆