Pandas sorting by group aggregate
Question
I've already seen this question, but the desired outcome there is slightly different from mine.
Imagine a dataframe grouped thusly:
df.groupby(['product_name', 'usage_type']).total_cost.sum()
product_name  usage_type
Lorem         A              30.694665
              B               0.000634
              C               1.659360
              D               0.000031
              E            3339.140042
              F               0.074340
Ipsum         G               9.627360
              A              19.053377
              D              14.492155
Dolor         B               9.698245
              H            6993.792163
              C           31947.955679
              D            2150.400001
              E              26.337789
Name: total_cost, dtype: float64
The output I want is the same structure, but with two properties:
- Product names ordered by their summed cost
- Usage types sorted lexicographically (another acceptable option: sort the usage types in descending order)
Such that the highest-cost products show up first, but still preserving the breakdown.
If it is significantly simpler, I'm okay with dropping the secondary sorting by usage type.
Recommended answer
Starting with your grouped DataFrame:
import pandas as pd
df2 = pd.read_table('data', sep=r'\s+').set_index(['product_name', 'usage_type'])
#                                   val
# product_name usage_type
# Lorem        A              30.694665
#              B               0.000634
#              C               1.659360
#              D               0.000031
#              E            3339.140042
#              F               0.074340
# Ipsum        G               9.627360
#              A              19.053377
#              D              14.492155
# Dolor        B               9.698245
#              H            6993.792163
#              C           31947.955679
#              D            2150.400001
#              E              26.337789
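The `data` file read above is not included in the post. As a minimal sketch, assuming it is a whitespace-separated table with the columns `product_name`, `usage_type` and `val`, the same frame can be built from an inline string (the `RAW` name is only illustrative):

```python
import io

import pandas as pd

# Hypothetical stand-in for the 'data' file: one row per (product, usage type).
RAW = """\
product_name usage_type val
Lorem A 30.694665
Lorem B 0.000634
Lorem C 1.659360
Lorem D 0.000031
Lorem E 3339.140042
Lorem F 0.074340
Ipsum G 9.627360
Ipsum A 19.053377
Ipsum D 14.492155
Dolor B 9.698245
Dolor H 6993.792163
Dolor C 31947.955679
Dolor D 2150.400001
Dolor E 26.337789
"""

# Parse on runs of whitespace, then move both key columns into the index.
df2 = (
    pd.read_csv(io.StringIO(RAW), sep=r"\s+")
    .set_index(["product_name", "usage_type"])
)
print(df2)
```
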
You could store the key values in new columns:
df2['key1'] = df2.groupby(level='product_name')['val'].transform('sum')
df2['key2'] = df2.index.get_level_values('usage_type')
and then sort by those key columns:
# >>> df2.sort_values(['key1', 'key2'], ascending=[False, True])
#                                   val          key1 key2
# product_name usage_type
# Dolor        B               9.698245  41128.183877    B
#              C           31947.955679  41128.183877    C
#              D            2150.400001  41128.183877    D
#              E              26.337789  41128.183877    E
#              H            6993.792163  41128.183877    H
# Lorem        A              30.694665   3371.569072    A
#              B               0.000634   3371.569072    B
#              C               1.659360   3371.569072    C
#              D               0.000031   3371.569072    D
#              E            3339.140042   3371.569072    E
#              F               0.074340   3371.569072    F
# Ipsum        A              19.053377     43.172892    A
#              D              14.492155     43.172892    D
#              G               9.627360     43.172892    G
result = df2.sort_values(['key1', 'key2'], ascending=[False, True])['val']
print(result)
yields
product_name  usage_type
Dolor         B               9.698245
              C           31947.955679
              D            2150.400001
              E              26.337789
              H            6993.792163
Lorem         A              30.694665
              B               0.000634
              C               1.659360
              D               0.000031
              E            3339.140042
              F               0.074340
Ipsum         A              19.053377
              D              14.492155
              G               9.627360
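The key-column approach can be wrapped in a small function. The sketch below is one possible packaging, not part of the original answer: `sort_by_group_total` and its parameters are hypothetical names, the sample data is a subset of the question's values, and `DataFrame.sort_values` is used because `DataFrame.sort` is no longer available in current pandas. The `inner_by_value` flag assumes the question's alternative ordering means largest costs first within each product.

```python
import pandas as pd


def sort_by_group_total(series, outer, inner, inner_by_value=False):
    """Hypothetical helper: outer groups by summed value (largest first),
    then rows inside each group by inner label or by descending value."""
    frame = series.to_frame("_val")
    # Broadcast each group's total back onto every row of that group.
    frame["_total"] = frame.groupby(level=outer)["_val"].transform("sum")
    # The outer label breaks ties between groups with equal totals.
    frame["_outer"] = frame.index.get_level_values(outer)
    if inner_by_value:
        keys, ascending = ["_total", "_outer", "_val"], [False, True, False]
    else:
        frame["_inner"] = frame.index.get_level_values(inner)
        keys, ascending = ["_total", "_outer", "_inner"], [False, True, True]
    ordered = frame.sort_values(keys, ascending=ascending)
    # Return only the values, under the original Series name.
    return ordered["_val"].rename(series.name)


# Sample data: a subset of the rows from the question.
index = pd.MultiIndex.from_tuples(
    [("Lorem", "A"), ("Lorem", "E"), ("Ipsum", "G"),
     ("Ipsum", "A"), ("Dolor", "H"), ("Dolor", "C")],
    names=["product_name", "usage_type"],
)
costs = pd.Series(
    [30.694665, 3339.140042, 9.627360, 19.053377, 6993.792163, 31947.955679],
    index=index,
    name="total_cost",
)

lexicographic = sort_by_group_total(costs, "product_name", "usage_type")
by_value = sort_by_group_total(costs, "product_name", "usage_type", inner_by_value=True)
print(lexicographic)
print(by_value)
```

Returning only the value column keeps the result in the same shape as the grouped input, so the helper columns never leak into later steps.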