min() operation on nested groupby in pandas

Question

I am just getting to know pandas and I can't get over a conceptual problem. My dataframe is as follows:

import pandas as pd

df = pd.DataFrame({'ANIMAL': [1,1,1,1,1,2,2,2],
                   'AGE_D':  [3,6,47,377,698,1,9,241],
                   'AGE_Y':  [1,1,1,2,2,1,1,1]})

I would like to group by ANIMAL and, within each animal, by AGE_Y, and then select the row with the minimum AGE_D in each subgroup. The desired output would then be:

ANIMAL  AGE_Y   AGE_D
1       1       3
1       2       377
2       1       1

I can do this without nesting within ANIMAL. For example, if df2 is the subset where ANIMAL == 1, then

df2.loc[df2.groupby('AGE_Y')['AGE_D'].idxmin()]
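For reference, here is a minimal runnable sketch of that single-animal approach. It assumes df2 is built with a boolean filter on ANIMAL (the question does not show how the subset was made). It shows that idxmin returns the index labels of the minimum AGE_D per group, which .loc then uses to pull out whole rows.

```python
import pandas as pd

df = pd.DataFrame({'ANIMAL': [1, 1, 1, 1, 1, 2, 2, 2],
                   'AGE_D': [3, 6, 47, 377, 698, 1, 9, 241],
                   'AGE_Y': [1, 1, 1, 2, 2, 1, 1, 1]})

# Assumed construction of the single-animal subset
df2 = df[df['ANIMAL'] == 1]

# idxmin gives the index label of the smallest AGE_D within each AGE_Y group
labels = df2.groupby('AGE_Y')['AGE_D'].idxmin()
print(labels.tolist())  # [0, 3]

# .loc looks those labels up and returns the complete rows
result = df2.loc[labels]
print(result)
```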

But everything I tried with ANIMAL nested in the groupby was unsuccessful. I am guessing that my order of operations is wrong... How should I go about this?

Solution

I think you need to pass both columns to groupby, i.e. group by ANIMAL and AGE_Y:

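# idxmin per (ANIMAL, AGE_Y) pair, then select those rows from the full frame
df = df.loc[df.groupby(['ANIMAL','AGE_Y'])['AGE_D'].idxmin()]
# reorder the columns to match the desired output
df = df[['ANIMAL','AGE_Y','AGE_D']]
print(df)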
   ANIMAL  AGE_Y  AGE_D
0       1      1      3
3       1      2    377
5       2      1      1
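Two alternatives, sketched here as options rather than as part of the original answer. If only the minimum values are needed (not the original row labels), aggregating with min() and calling reset_index() gives the same three rows with a fresh 0..n-1 index. If whole rows are wanted without idxmin, sorting by AGE_D and keeping the first row of each (ANIMAL, AGE_Y) pair with drop_duplicates also works; a stable sort (kind='mergesort') is used so ties keep their original order.

```python
import pandas as pd

df = pd.DataFrame({'ANIMAL': [1, 1, 1, 1, 1, 2, 2, 2],
                   'AGE_D': [3, 6, 47, 377, 698, 1, 9, 241],
                   'AGE_Y': [1, 1, 1, 2, 2, 1, 1, 1]})

# Option 1: minimum AGE_D per (ANIMAL, AGE_Y), group keys turned back into columns
mins = df.groupby(['ANIMAL', 'AGE_Y'])['AGE_D'].min().reset_index()
print(mins)

# Option 2: smallest AGE_D first, keep the first row of each pair,
# then restore the original row order
rows = (df.sort_values('AGE_D', kind='mergesort')
          .drop_duplicates(subset=['ANIMAL', 'AGE_Y'])
          .sort_index())
print(rows[['ANIMAL', 'AGE_Y', 'AGE_D']])
```

Option 1 loses the original index labels (0, 3, 5), so use it only when the values themselves are what matters.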
