How to apply different functions to a groupby object?

Views: 136 | Published: 2018/5/30 14:10:45 | Tags: python, pandas, dataframe, group-by


Problem description

I have a dataframe like this:

    import pandas as pd

    df = pd.DataFrame({'id': [1, 2, 1, 1, 2, 1, 2, 2],
                       'min_max': ['max_val', 'max_val', 'min_val', 'min_val',
                                   'max_val', 'max_val', 'min_val', 'min_val'],
                       'value': [1, 20, 20, 10, 12, 3, -10, -5]})

       id  min_max  value
    0   1  max_val      1
    1   2  max_val     20
    2   1  min_val     20
    3   1  min_val     10
    4   2  max_val     12
    5   1  max_val      3
    6   2  min_val    -10
    7   2  min_val     -5

Each id has several maximal and minimal values associated with it. My desired output looks like this:

        max  min
    id
    1     3   10
    2    20  -10

It contains the maximal max_val and the minimal min_val for each id.

Currently I implement that as follows:

    gdf = df.groupby(by=['id', 'min_max'])['value']
    max_max = gdf.max().loc[:, 'max_val']
    min_min = gdf.min().loc[:, 'min_val']
    final_df = pd.concat([max_max, min_min], axis=1)
    final_df.columns = ['max', 'min']

What I don't like is that I have to call .max() and .min() separately on the grouped object gdf, and each time I throw away 50% of the result (since I am not interested in the maximal min_val or the minimal max_val).

Is there a way to do this in a more straightforward manner by e.g. passing the function that should be applied to a group directly to the groupby call?

EDIT:

    df.groupby('id')['value'].agg(['max', 'min'])

is not sufficient, because a group can have a min_val that is higher than all of its max_val entries, or a max_val that is lower than all of its min_val entries. Thus, one also has to group by the column min_max.

Result for

    df.groupby('id')['value'].agg(['max', 'min'])

        max  min
    id
    1    20    1
    2    20  -10

Result for the code from above:

        max  min
    id
    1     3   10
    2    20  -10

Solution
Here's a slightly tongue-in-cheek solution:

    >>> df.groupby(['id', 'min_max'])['value'].apply(lambda g: getattr(g, g.name[1][:3])()).unstack()
    min_max  max_val  min_val
    id
    1              3       10
    2             20      -10

This applies a function that grabs the name of the real function to apply from the group key.
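
For readers who find the one-liner dense, here is a minimal sketch that spells out the same steps with a named helper. The helper name apply_named_method is made up for illustration. It relies on pandas attaching the group key to each group as .name, which the one-liner above also depends on.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 1, 1, 2, 1, 2, 2],
                   'min_max': ['max_val', 'max_val', 'min_val', 'min_val',
                               'max_val', 'max_val', 'min_val', 'min_val'],
                   'value': [1, 20, 20, 10, 12, 3, -10, -5]})


def apply_named_method(group):
    """Hypothetical helper: pick the Series method from the group key."""
    # group.name is the group key, here a tuple such as (1, 'max_val').
    label = group.name[1]
    # The first three characters of the label ('max' or 'min') happen to
    # match the name of the Series method we want to call.
    method_name = label[:3]
    return getattr(group, method_name)()


result = (
    df.groupby(['id', 'min_max'])['value']
      .apply(apply_named_method)
      .unstack()  # move the min_max level into columns
)
print(result)
```

For the sample data this yields 3 and 10 for id 1 and 20 and -10 for id 2, matching the desired output in the question.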

Obviously this wouldn't work so simply if there weren't such a simple relationship between the string "max_val" and the function name "max". It could be generalized by having a dict mapping column values to functions to apply, something like this:

    func_map = {'min_val': min, 'max_val': max}
    df.groupby(['id', 'min_max'])['value'].apply(lambda g: func_map[g.name[1]](g)).unstack()

Note that this is slightly less efficient than the version above, since it calls the plain Python max/min rather than the optimized pandas versions. But if you want a more generalizable solution, that's what you have to do, because there aren't optimized pandas versions of everything. (This is also more or less why there's no built-in way to do this: for most data, you can't assume a priori that your values can be mapped to meaningful functions, so it doesn't make sense to try to determine the function to apply based on the values themselves.)
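
When a pandas reduction does exist for every label, one possible middle ground is to map each label to the name of a Series method and dispatch through Series.agg, so the pandas implementation is called instead of the Python builtin. The sketch below is an illustration under that assumption, not part of the original answer: method_map and reduce_by_label are hypothetical names, and no timing was done to confirm a speed-up. It also renames the columns to match the layout requested in the question.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 1, 1, 2, 1, 2, 2],
                   'min_max': ['max_val', 'max_val', 'min_val', 'min_val',
                               'max_val', 'max_val', 'min_val', 'min_val'],
                   'value': [1, 20, 20, 10, 12, 3, -10, -5]})

# Hypothetical mapping from each label to the name of a pandas reduction.
method_map = {'min_val': 'min', 'max_val': 'max'}


def reduce_by_label(group):
    """Hypothetical helper: reduce a group with the method its label maps to."""
    # group.name is the group key, a tuple such as (2, 'min_val').
    return group.agg(method_map[group.name[1]])


result = (
    df.groupby(['id', 'min_max'])['value']
      .apply(reduce_by_label)
      .unstack()
      .rename(columns={'max_val': 'max', 'min_val': 'min'})
      .rename_axis(columns=None)  # drop the leftover 'min_max' columns label
)
print(result)
```

Labels with no matching pandas method would still need a plain callable, as in the func_map version above.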
