How to apply different functions to a groupby object?

Problem description

I have a dataframe like this:

import pandas as pd

df = pd.DataFrame({'id': [1, 2, 1, 1, 2, 1, 2, 2],
                   'min_max': ['max_val', 'max_val', 'min_val', 'min_val', 'max_val', 'max_val', 'min_val', 'min_val'],
                   'value': [1, 20, 20, 10, 12, 3, -10, -5]})

   id  min_max  value
0   1  max_val      1
1   2  max_val     20
2   1  min_val     20
3   1  min_val     10
4   2  max_val     12
5   1  max_val      3
6   2  min_val    -10
7   2  min_val     -5

Each id has several maximal and minimal values associated with it. My desired output looks like this:

    max  min
id          
1     3   10
2    20  -10

It contains the maximal max_val and the minimal min_val for each id.

Currently I implement that as follows:

gdf = df.groupby(by=['id', 'min_max'])['value']

max_max = gdf.max().loc[:, 'max_val']
min_min = gdf.min().loc[:, 'min_val']

final_df = pd.concat([max_max, min_min], axis=1)
final_df.columns = ['max', 'min']

What I don't like is that I have to call .max() and .min() separately on the grouped dataframe gdf, and each time I throw away 50% of the information (since I am not interested in the maximal min_val or the minimal max_val).

Is there a way to do this in a more straightforward manner by e.g. passing the function that should be applied to a group directly to the groupby call?

EDIT:

df.groupby('id')['value'].agg(['max','min'])

is not sufficient, because a group can have a min_val that is higher than all of its max_val entries, or a max_val that is lower than all of its min_val entries (id 1 above, for example, has a min_val of 20 while its max_val entries are only 1 and 3). Thus, one also has to group based on the column min_max.

Result for

df.groupby('id')['value'].agg(['max','min'])

    max  min
id          
1    20    1
2    20  -10

Result for the code from above:

    max  min
id          
1     3   10
2    20  -10

Solution

Here's a slightly tongue-in-cheek solution:

>>> df.groupby(['id', 'min_max'])['value'].apply(lambda g: getattr(g, g.name[1][:3])()).unstack()
min_max  max_val  min_val
id                       
1              3       10
2             20      -10

This applies a function that grabs the name of the real function to apply from the group key.
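To make the mechanics of that one-liner easier to follow, here is a small sketch that does the same lookup step by step. It assumes the example dataframe from the question; the names `grouped` and `picked` are introduced here only for illustration. Iterating over the groupby yields the same (id, min_max) key that apply() exposes as g.name.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 1, 1, 2, 1, 2, 2],
                   'min_max': ['max_val', 'max_val', 'min_val', 'min_val',
                               'max_val', 'max_val', 'min_val', 'min_val'],
                   'value': [1, 20, 20, 10, 12, 3, -10, -5]})

grouped = df.groupby(['id', 'min_max'])['value']

picked = {}  # hypothetical name: maps each group key to its reduced value
for key, values in grouped:
    # key is a tuple such as (1, 'max_val'); its second element is the label
    method_name = key[1][:3]                 # 'max_val' -> 'max', 'min_val' -> 'min'
    reducer = getattr(values, method_name)   # bound Series method, e.g. values.max
    picked[key] = reducer()

print(picked)
```

The apply(...).unstack() version in the answer performs the same per-group lookup and then pivots the min_max level of the resulting index into columns.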

Obviously this wouldn't work so simply if there weren't such a simple relationship between the string "max_val" and the function name "max". It could be generalized by having a dict mapping column values to functions to apply, something like this:

func_map = {'min_val': min, 'max_val': max}
df.groupby(['id', 'min_max'])['value'].apply(lambda g: func_map[g.name[1]](g)).unstack()

Note that this is slightly less efficient than the version above, since it calls the plain Python max/min rather than the optimized pandas versions. But if you want a more generalizable solution, that's what you have to do, because there aren't optimized pandas versions of everything. (This is also more or less why there's no built-in way to do this: for most data, you can't assume a priori that your values can be mapped to meaningful functions, so it doesn't make sense to try to determine the function to apply based on the values themselves.)
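If the functions you need do exist as pandas aggregations, one possible variation (not part of the original answer) is to map each label to an aggregation name and filter the rows per label before grouping. The sketch below assumes the example dataframe from the question; `agg_map` and `result` are hypothetical names. It trades the single groupby call for one filtered groupby per label, but each group is reduced only by the function that is actually wanted.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 1, 1, 2, 1, 2, 2],
                   'min_max': ['max_val', 'max_val', 'min_val', 'min_val',
                               'max_val', 'max_val', 'min_val', 'min_val'],
                   'value': [1, 20, 20, 10, 12, 3, -10, -5]})

# Hypothetical mapping from label to aggregation; strings select the
# pandas implementations, and a custom callable could be used instead.
agg_map = {'max_val': 'max', 'min_val': 'min'}

# One column per label: keep only the rows carrying that label,
# then reduce each id with the aggregation assigned to the label.
result = pd.DataFrame({
    label: df.loc[df['min_max'] == label].groupby('id')['value'].agg(how)
    for label, how in agg_map.items()
})

print(result)
```

Whether this is faster than the apply-based version depends on the number of labels and groups, so it is worth timing on real data before preferring one over the other.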
