Pandas sorting by group aggregate

Question

I've already seen this question, but the desired outcome there is slightly different from mine.

Imagine a dataframe grouped like this:

df.groupby(['product_name', 'usage_type']).total_cost.sum()

product_name   usage_type
Lorem          A               30.694665
               B                0.000634
               C                1.659360
               D                0.000031
               E             3339.140042
               F                0.074340
Ipsum          G                9.627360
               A               19.053377
               D               14.492155
Dolor          B                9.698245
               H             6993.792163
               C            31947.955679
               D             2150.400001
               E               26.337789
Name: total_cost, dtype: float64

The output I want has the same structure, but with two properties:

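  1. Order the product names by the sum of their costs
  2. Sort the usage types lexicographically (another acceptable option: sort these usage types in descending order)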

The highest-cost products should show up first, while the breakdown by usage type is preserved.

If it is significantly simpler, I'm okay with dropping the secondary sorting by usage type.

Recommended answer

Starting with your grouped DataFrame:

import pandas as pd
df2 = pd.read_table('data', sep=r'\s+').set_index(['product_name', 'usage_type'])
#                                   val
# product_name usage_type              
# Lorem        A              30.694665
#              B               0.000634
#              C               1.659360
#              D               0.000031
#              E            3339.140042
#              F               0.074340
# Ipsum        G               9.627360
#              A              19.053377
#              D              14.492155
# Dolor        B               9.698245
#              H            6993.792163
#              C           31947.955679
#              D            2150.400001
#              E              26.337789

You could store the key values in new columns:

df2['key1'] = df2.groupby(level='product_name')['val'].transform('sum')
df2['key2'] = df2.index.get_level_values('usage_type')

and then sort by those key columns:

# >>> df2.sort_values(['key1', 'key2'], ascending=[False, True])
#                                   val          key1 key2
# product_name usage_type                                 
# Dolor        B               9.698245  41128.183877    B
#              C           31947.955679  41128.183877    C
#              D            2150.400001  41128.183877    D
#              E              26.337789  41128.183877    E
#              H            6993.792163  41128.183877    H
# Lorem        A              30.694665   3371.569072    A
#              B               0.000634   3371.569072    B
#              C               1.659360   3371.569072    C
#              D               0.000031   3371.569072    D
#              E            3339.140042   3371.569072    E
#              F               0.074340   3371.569072    F
# Ipsum        A              19.053377     43.172892    A
#              D              14.492155     43.172892    D
#              G               9.627360     43.172892    G

result = df2.sort_values(['key1', 'key2'], ascending=[False, True])['val']
print(result)

This yields:

product_name  usage_type
Dolor         B                 9.698245
              C             31947.955679
              D              2150.400001
              E                26.337789
              H              6993.792163
Lorem         A                30.694665
              B                 0.000634
              C                 1.659360
              D                 0.000031
              E              3339.140042
              F                 0.074340
Ipsum         A                19.053377
              D                14.492155
              G                 9.627360
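
For anyone who wants to try this without a data file, here is a self-contained sketch of the same idea for recent pandas versions, which provide `sort_values`. The function name `sort_by_group_total`, the `usage_by_cost` flag, and the `group_total` helper column are made up for illustration and are not part of pandas. The data is copied from the question. The sketch also adds `product_name` as a tie-breaker, on the assumption that two products could have equal totals and should still stay contiguous.

```python
import pandas as pd

# Sample data copied from the question; the variable names are illustrative.
rows = [
    ("Lorem", "A", 30.694665), ("Lorem", "B", 0.000634),
    ("Lorem", "C", 1.659360), ("Lorem", "D", 0.000031),
    ("Lorem", "E", 3339.140042), ("Lorem", "F", 0.074340),
    ("Ipsum", "G", 9.627360), ("Ipsum", "A", 19.053377),
    ("Ipsum", "D", 14.492155),
    ("Dolor", "B", 9.698245), ("Dolor", "H", 6993.792163),
    ("Dolor", "C", 31947.955679), ("Dolor", "D", 2150.400001),
    ("Dolor", "E", 26.337789),
]
costs = (
    pd.DataFrame(rows, columns=["product_name", "usage_type", "val"])
    .set_index(["product_name", "usage_type"])["val"]
)


def sort_by_group_total(s, usage_by_cost=False):
    """Hypothetical helper: products by total cost (descending), then usage types.

    Assumes the index levels are named 'product_name' and 'usage_type'.
    """
    frame = s.rename("val").reset_index()
    # Broadcast each product's total cost onto every row of that product.
    frame["group_total"] = frame.groupby("product_name")["val"].transform("sum")
    if usage_by_cost:
        # Alternative from the question: usage types by cost, highest first.
        by = ["group_total", "product_name", "val"]
        ascending = [False, True, False]
    else:
        # Default: usage types in lexicographic order.
        by = ["group_total", "product_name", "usage_type"]
        ascending = [False, True, True]
    frame = frame.sort_values(by, ascending=ascending)
    return frame.set_index(["product_name", "usage_type"])["val"]


result = sort_by_group_total(costs)
print(result)
```

Calling `sort_by_group_total(costs, usage_by_cost=True)` keeps the same product order but lists each product's usage types from most to least expensive. Working on a reset index and restoring it at the end avoids leaving the helper column in the result.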
