How to apply multiple functions to a groupby object
Question
For example, I have two lambda functions to apply to a grouped data frame:
df.groupby(['A', 'B']).apply(lambda g: ...)
df.groupby(['A', 'B']).apply(lambda g: ...)
Each works on its own, but not when the two are combined:
df.groupby(['A', 'B']).apply([lambda g: ..., lambda g: ...])
Why is that? How can I apply different functions to a grouped object and get the results concatenated column-wise?
Is there a way to avoid specifying a column for each function? Everything suggested so far seems to work only with specific columns.
Recommended answer
This is a good opportunity to highlight one of the changes in pandas 0.20: passing a dict to agg on a grouped Series in order to name the results is deprecated.
What does this mean?
Consider the dataframe df
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(
    A=np.tile([1, 2], 2).repeat(2),
    B=np.repeat([1, 2], 2).repeat(2),
    C=np.arange(8)
))
df

   A  B  C
0  1  1  0
1  1  1  1
2  2  1  2
3  2  1  3
4  1  2  4
5  1  2  5
6  2  2  6
7  2  2  7
Before, we could do
df.groupby(['A', 'B']).C.agg(dict(f1=lambda x: x.size, f2=lambda x: x.max()))

     f1  f2
A B
1 1   2   1
  2   2   5
2 1   2   3
  2   2   7
Our names 'f1' and 'f2' were placed as column headers. However, with pandas 0.20 I get this
//anaconda/envs/3.6/lib/python3.6/site-packages/ipykernel/__main__.py:1: FutureWarning: using a dict on a Series for aggregation
is deprecated and will be removed in a future version
  if __name__ == '__main__':
So what does that mean? What if I pass two lambdas without the naming dictionary?
df.groupby(['A', 'B']).C.agg([lambda x: x.size, lambda x: x.max()])

---------------------------------------------------------------------------
SpecificationError                        Traceback (most recent call last)
<ipython-input-398-fc26cf466812> in <module>()
----> 1 print(df.groupby(['A', 'B']).C.agg([lambda x: x.size, lambda x: x.max()]))

//anaconda/envs/3.6/lib/python3.6/site-packages/pandas/core/groupby.py in aggregate(self, func_or_funcs, *args, **kwargs)
   2798         if hasattr(func_or_funcs, '__iter__'):
   2799             ret = self._aggregate_multiple_funcs(func_or_funcs,
-> 2800                                                  (_level or 0) + 1)
   2801         else:
   2802             cyfunc = self._is_cython_func(func_or_funcs)

//anaconda/envs/3.6/lib/python3.6/site-packages/pandas/core/groupby.py in _aggregate_multiple_funcs(self, arg, _level)
   2863             if name in results:
   2864                 raise SpecificationError('Function names must be unique, '
-> 2865                                          'found multiple named %s' % name)
   2866
   2867         # reset the cache so that we

SpecificationError: Function names must be unique, found multiple named <lambda>
pandas raises an error because both result columns would be named '<lambda>'.
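The name clash comes from Python itself rather than from anything specific to the data. A minimal sketch in plain Python, using illustrative names (size_fn, max_fn, group_size, group_max) that are not part of the original answer:

```python
# Every lambda gets the same __name__, regardless of what it computes.
size_fn = lambda x: x.size
max_fn = lambda x: x.max()

lambda_names = [size_fn.__name__, max_fn.__name__]
print(lambda_names)  # both entries are the string '<lambda>'

# Functions created with def carry the name they were defined with,
# which gives the aggregation distinct labels to use as column headers.
def group_size(x):
    return x.size

def group_max(x):
    return x.max()

def_names = [group_size.__name__, group_max.__name__]
print(def_names)  # ['group_size', 'group_max']
```

Since the traceback above shows pandas using the function name as the column label, two lambdas collide while two def functions do not.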
Solution: Name your functions
def f1(x):
    return x.size

def f2(x):
    return x.max()

df.groupby(['A', 'B']).C.agg([f1, f2])

     f1  f2
A B
1 1   2   1
  2   2   5
2 1   2   3
  2   2   7
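For convenience, here is the named-function solution as one self-contained script. The second call is an addition that is not in the original answer: it assumes pandas 0.25 or later, where agg on a grouped Series accepts keyword arguments and uses the keywords as column names. Treat it as a sketch to check against your installed version.

```python
import numpy as np
import pandas as pd

# Same example frame as in the answer.
df = pd.DataFrame(dict(
    A=np.tile([1, 2], 2).repeat(2),
    B=np.repeat([1, 2], 2).repeat(2),
    C=np.arange(8),
))

def f1(x):
    # Number of rows in the group
    return x.size

def f2(x):
    # Largest value in the group
    return x.max()

# Named functions: the function names become the column headers.
by_list = df.groupby(['A', 'B']).C.agg([f1, f2])

# Assumption: pandas >= 0.25. Keyword names label the output columns,
# so lambdas can be used without a name clash.
by_keyword = df.groupby(['A', 'B']).C.agg(
    f1=lambda x: x.size,
    f2=lambda x: x.max(),
)

print(by_list)
print(by_keyword)
```

Both calls should produce columns f1 and f2 indexed by (A, B); the keyword form avoids defining separate functions when the logic is short.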