Group by two columns and take the max value of a third in pandas (Python)
Question
I have a DataFrame with PERIOD_START_TIME, ID, a few more columns, and a VALUE column. I need to group by PERIOD_START_TIME and ID (because there are duplicate rows for the same time and ID) and take the max value of VALUE. df:
PERIOD_START_TIME ID ... VALUE
06.01.2017 02:00:00 55 ... 35
06.01.2017 02:00:00 55 ... 22
06.01.2017 03:00:00 55 ... 63
06.01.2017 03:00:00 55 ... 33
06.01.2017 04:00:00 55 ... 63
06.01.2017 04:00:00 55 ... 45
06.01.2017 02:00:00 65 ... 10
06.01.2017 02:00:00 65 ... 5
06.01.2017 03:00:00 65 ... 22
06.01.2017 03:00:00 65 ... 5
06.01.2017 04:00:00 65 ... 12
06.01.2017 04:00:00 65 ... 15
Desired output:
PERIOD_START_TIME ID ... VALUE
06.01.2017 02:00:00 55 ... 35
06.01.2017 03:00:00 55 ... 63
06.01.2017 04:00:00 55 ... 63
06.01.2017 02:00:00 65 ... 10
06.01.2017 03:00:00 65 ... 22
06.01.2017 04:00:00 65 ... 15
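To reproduce the example, the sample frame can be built directly. This is a minimal sketch under one assumption: the extra column `A` (constant 8) is borrowed from the answer below and only stands in for the question's "few more columns". PERIOD_START_TIME is kept as plain strings, exactly as shown.

```python
import pandas as pd

# Sample data from the question; column "A" is a placeholder for the other columns.
df = pd.DataFrame({
    'PERIOD_START_TIME': ['06.01.2017 02:00:00', '06.01.2017 02:00:00',
                          '06.01.2017 03:00:00', '06.01.2017 03:00:00',
                          '06.01.2017 04:00:00', '06.01.2017 04:00:00'] * 2,
    'ID': [55] * 6 + [65] * 6,
    'A': 8,  # scalar is broadcast to every row
    'VALUE': [35, 22, 63, 33, 63, 45, 10, 5, 22, 5, 12, 15],
})
print(df)
```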
Recommended answer
print (df)
PERIOD_START_TIME ID A VALUE
0 06.01.2017 02:00:00 55 8 35
1 06.01.2017 02:00:00 55 8 22
2 06.01.2017 03:00:00 55 8 63
3 06.01.2017 03:00:00 55 8 33
4 06.01.2017 04:00:00 55 8 63
5 06.01.2017 04:00:00 55 8 45
6 06.01.2017 02:00:00 65 8 10
7 06.01.2017 02:00:00 65 8 5
8 06.01.2017 03:00:00 65 8 22
9 06.01.2017 03:00:00 65 8 5
10 06.01.2017 04:00:00 65 8 12
11 06.01.2017 04:00:00 65 8 15
df = df.groupby(['PERIOD_START_TIME','ID'], as_index=False)['VALUE'].max()
Or:
df = df.groupby(['PERIOD_START_TIME','ID'])['VALUE'].max().reset_index()
print (df)
PERIOD_START_TIME ID VALUE
0 06.01.2017 02:00:00 55 35
1 06.01.2017 02:00:00 65 10
2 06.01.2017 03:00:00 55 63
3 06.01.2017 03:00:00 65 22
4 06.01.2017 04:00:00 55 63
5 06.01.2017 04:00:00 65 15
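The two groupby forms above should produce identical frames. A quick check on the sample data (rebuilt here so the snippet runs on its own). Note that only the group keys and VALUE survive, so column `A` is dropped, which is why keeping the other columns needs a different approach.

```python
import pandas as pd

df = pd.DataFrame({
    'PERIOD_START_TIME': ['06.01.2017 02:00:00', '06.01.2017 02:00:00',
                          '06.01.2017 03:00:00', '06.01.2017 03:00:00',
                          '06.01.2017 04:00:00', '06.01.2017 04:00:00'] * 2,
    'ID': [55] * 6 + [65] * 6,
    'A': 8,
    'VALUE': [35, 22, 63, 33, 63, 45, 10, 5, 22, 5, 12, 15],
})
cols = ['PERIOD_START_TIME', 'ID']

# as_index=False keeps the group keys as ordinary columns.
a = df.groupby(cols, as_index=False)['VALUE'].max()
# Default groupby puts the keys in a MultiIndex; reset_index turns them back into columns.
b = df.groupby(cols)['VALUE'].max().reset_index()

pd.testing.assert_frame_equal(a, b)  # raises if the two results differ
print(a)
```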
If you need to keep the other columns too, use idxmax and select the matching rows with loc:
df = df.loc[df.groupby(['PERIOD_START_TIME','ID'])['VALUE'].idxmax()]
print (df)
PERIOD_START_TIME ID A VALUE
0 06.01.2017 02:00:00 55 8 35
6 06.01.2017 02:00:00 65 8 10
2 06.01.2017 03:00:00 55 8 63
8 06.01.2017 03:00:00 65 8 22
4 06.01.2017 04:00:00 55 8 63
11 06.01.2017 04:00:00 65 8 15
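Two details of the idxmax approach are worth knowing: on a tie it keeps only the first row holding the maximum, and because it returns index labels, a non-unique index makes loc return extra rows. A small sketch using hypothetical toy data (the keys `t1`, `t2` and the values in `A` are made up for illustration):

```python
import pandas as pd

# Hypothetical toy data: group (t1, 55) has a tie for the maximum VALUE (9 and 9).
df = pd.DataFrame({
    'PERIOD_START_TIME': ['t1', 't1', 't1', 't2'],
    'ID': [55, 55, 55, 55],
    'A': ['x', 'y', 'z', 'w'],
    'VALUE': [7, 9, 9, 3],
})
cols = ['PERIOD_START_TIME', 'ID']

# idxmax returns the label of the first maximum per group, so row 'z' is not kept.
best = df.loc[df.groupby(cols)['VALUE'].idxmax()]
print(best)

# With duplicate index labels, loc returns every row that shares a selected label.
dup = df.set_index(pd.Index([0, 0, 1, 1]))
too_many = dup.loc[dup.groupby(cols)['VALUE'].idxmax()]
print(len(too_many))  # all 4 rows come back instead of 2

# Resetting to a fresh RangeIndex first restores one row per group.
fixed = dup.reset_index(drop=True)
best2 = fixed.loc[fixed.groupby(cols)['VALUE'].idxmax()]
print(best2)
```

If every row of a tie is needed instead, a boolean mask against `groupby(...)['VALUE'].transform('max')` is a common alternative.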
Alternative: sort so that the largest VALUE comes first within each group, then take the first row of every group:
cols = ['PERIOD_START_TIME','ID']
df = df.sort_values(cols + ['VALUE'], ascending=[True, True, False]).groupby(cols, as_index=False).first()
print (df)
PERIOD_START_TIME ID A VALUE
0 06.01.2017 02:00:00 55 8 35
1 06.01.2017 02:00:00 65 8 10
2 06.01.2017 03:00:00 55 8 63
3 06.01.2017 03:00:00 65 8 22
4 06.01.2017 04:00:00 55 8 63
5 06.01.2017 04:00:00 65 8 15
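The sort-then-first() pattern only picks the maximum if VALUE is part of the sort: sorting by the keys alone leaves each group's rows in their original order, so first() returns whichever row came first. A sketch comparing the two on the sample data, plus a drop_duplicates variant (a swapped-in technique, not part of the answer) that keeps the max row per key:

```python
import pandas as pd

df = pd.DataFrame({
    'PERIOD_START_TIME': ['06.01.2017 02:00:00', '06.01.2017 02:00:00',
                          '06.01.2017 03:00:00', '06.01.2017 03:00:00',
                          '06.01.2017 04:00:00', '06.01.2017 04:00:00'] * 2,
    'ID': [55] * 6 + [65] * 6,
    'A': 8,
    'VALUE': [35, 22, 63, 33, 63, 45, 10, 5, 22, 5, 12, 15],
})
cols = ['PERIOD_START_TIME', 'ID']

# Keys-only sort: VALUE keeps its original order inside each group.
keys_only = df.sort_values(cols).groupby(cols, as_index=False).first()

# Keys ascending, VALUE descending: the maximum is first in every group.
by_value = (df.sort_values(cols + ['VALUE'], ascending=[True, True, False])
              .groupby(cols, as_index=False)
              .first())

# Swapped-in variant: sort by VALUE descending, keep the first row per key.
dedup = (df.sort_values('VALUE', ascending=False)
           .drop_duplicates(cols)
           .sort_values(cols)
           .reset_index(drop=True))

print(keys_only['VALUE'].tolist())
print(by_value['VALUE'].tolist())
print(dedup['VALUE'].tolist())
```

One design difference: GroupBy.first() takes the first non-null value of each column separately, so with missing values it can combine values from different rows, whereas drop_duplicates always keeps whole rows.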