With Pandas in Python, select the highest value row for each group
Question
With Pandas, for the following data set:
author1,category1,10.00
author1,category2,15.00
author1,category3,12.00
author2,category1,5.00
author2,category2,6.00
author2,category3,4.00
author2,category4,9.00
author3,category1,7.00
author3,category2,4.00
author3,category3,7.00
I would like to get the row(s) with the highest value for each author:
author1,category2,15.00
author2,category4,9.00
author3,category1,7.00
author3,category3,7.00
(Apologies, I'm a pandas noob.)
Recommended answer
Since you want to retrieve the category column as well, a standard .agg on the val column won't give you what you want. (Also, since author3 has two rows with the value 7, the approach by @Padraic Cunningham using .max() will return only one instance instead of both.) You can define a custom function and pass it to apply to accomplish your task.
import pandas as pd
# your data; assume the column names are: author, cat, val
# ===============================
print(df)
    author        cat  val
0  author1  category1   10
1  author1  category2   15
2  author1  category3   12
3  author2  category1    5
4  author2  category2    6
5  author2  category3    4
6  author2  category4    9
7  author3  category1    7
8  author3  category2    4
9  author3  category3    7
# processing
# ====================================
def func(group):
    # keep every row whose val equals the group's maximum (ties included)
    return group.loc[group['val'] == group['val'].max()]
df.groupby('author', as_index=False).apply(func).reset_index(drop=True)
    author        cat  val
0  author1  category2   15
1  author2  category4    9
2  author3  category1    7
3  author3  category3    7
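The same result can probably also be reached without a custom apply function, by comparing each row's val against its group's maximum via transform. This is a minimal sketch of that alternative, not the answer's own method. The DataFrame is rebuilt here from the question's data, and the names group_max and result are illustrative. As far as I know, recent pandas releases warn when apply operates on the grouping columns, which is one reason this variant may be preferable.

```python
import pandas as pd

# Rebuild the sample data from the question; the column names
# author, cat, val follow the answer's assumption.
df = pd.DataFrame({
    "author": ["author1"] * 3 + ["author2"] * 4 + ["author3"] * 3,
    "cat": ["category1", "category2", "category3",
            "category1", "category2", "category3", "category4",
            "category1", "category2", "category3"],
    "val": [10.0, 15.0, 12.0, 5.0, 6.0, 4.0, 9.0, 7.0, 4.0, 7.0],
})

# Broadcast each author's maximum back onto every row of that author,
# then keep the rows whose value equals that maximum (ties included).
group_max = df.groupby("author")["val"].transform("max")
result = df[df["val"] == group_max].reset_index(drop=True)
print(result)
```

Because the comparison is done with a boolean mask rather than an aggregation, both of author3's rows with value 7 are kept, and the cat column comes along unchanged.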