How to use groupby to apply multiple functions to multiple columns in Pandas?

Question

I have a regular DataFrame:

import pandas as pd

A = pd.DataFrame([[1, 5, 2], [2, 4, 4], [3, 3, 1], [4, 2, 2], [5, 1, 4]],
                 columns=['A', 'B', 'C'], index=[1, 2, 3, 4, 5])

Following this recipe, I got the results I wanted:

In [62]: A.groupby((A['A'] > 2)).apply(lambda x: pd.Series(dict(
                   up_B=(x.B >= 0).sum(), down_B=(x.B < 0).sum(), mean_B=(x.B).mean(), std_B=(x.B).std(),
                   up_C=(x.C >= 0).sum(), down_C=(x.C < 0).sum(), mean_C=(x.C).mean(), std_C=(x.C).std())))

Out[62]:
       down_B  down_C  mean_B    mean_C     std_B     std_C  up_B  up_C
A                                                                      
False       0       0     4.5  3.000000  0.707107  1.414214     2     2
True        0       0     2.0  2.333333  1.000000  1.527525     3     3

This approach works, but imagine you had to do this for a large number of columns (15 to 100): you would have to type every expression into the formula, which is cumbersome.

Given that the same formulas are applied to all columns, is there an efficient way to do this for a large number of columns?

Thanks

Recommended answer

Since you are aggregating each grouped column into a single value, you can use agg instead of apply. The agg method accepts a list of functions as input, and every function in the list is applied to each column:

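import pandas as pd

A = pd.DataFrame([[1, 5, 2], [2, 4, 4], [3, 3, 1], [4, 2, 2], [5, 1, 4]],
                 columns=['A', 'B', 'C'], index=[1, 2, 3, 4, 5])

# count the non-negative / negative values in each group
def up(x):
    return (x >= 0).sum()

def down(x):
    return (x < 0).sum()

# group columns B through C by whether A > 2; each function is applied to each column
result = A.loc[:, 'B':'C'].groupby(A['A'] > 2).agg([up, down, 'mean', 'std'])
print(result)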

This prints:

       B                      C                         
      up down mean       std up down      mean       std
A                                                       
False  2    0  4.5  0.707107  2    0  3.000000  1.414214
True   3    0  2.0  1.000000  3    0  2.333333  1.527525
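Because the question is about 15 to 100 columns, a slice like 'B':'C' may not always be convenient. A minimal sketch, assuming every column other than the key column 'A' should be summarized: drop the key column instead of slicing, then optionally flatten the two-level column labels into names like up_B so the layout matches the question's original output. The f'{stat}_{col}' naming scheme is just an illustrative choice.

```python
import pandas as pd

A = pd.DataFrame([[1, 5, 2], [2, 4, 4], [3, 3, 1], [4, 2, 2], [5, 1, 4]],
                 columns=['A', 'B', 'C'], index=[1, 2, 3, 4, 5])

def up(x):
    # number of non-negative values in the group
    return (x >= 0).sum()

def down(x):
    # number of negative values in the group
    return (x < 0).sum()

# Summarize every column except the key column 'A'; this works for any number of columns
key = A['A'] > 2
result = A.drop(columns='A').groupby(key).agg([up, down, 'mean', 'std'])

# Flatten the (column, statistic) MultiIndex into names like 'up_B' (assumed naming scheme)
result.columns = [f'{stat}_{col}' for col, stat in result.columns]
print(result)
```

Whether to flatten is a matter of taste: the MultiIndex form is easier to slice by statistic, while flat names are easier to export.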

result has hierarchical (MultiIndex) columns. To select a particular column (or set of columns), you can use:

In [39]: result['B','mean']
Out[39]: 
A
False    4.5
True     2.0
Name: (B, mean), dtype: float64

In [46]: result[[('B', 'mean'), ('C', 'mean')]]
Out[46]: 
         B         C
      mean      mean
A                   
False  4.5  3.000000
True   2.0  2.333333
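With many columns, listing every ('col', 'mean') tuple by hand gets tedious again. One option, sketched here with the same result as above, is DataFrame.xs with level=1 (the inner, statistic level of the column MultiIndex), which picks one statistic across all original columns at once.

```python
import pandas as pd

A = pd.DataFrame([[1, 5, 2], [2, 4, 4], [3, 3, 1], [4, 2, 2], [5, 1, 4]],
                 columns=['A', 'B', 'C'], index=[1, 2, 3, 4, 5])

def up(x):
    return (x >= 0).sum()

def down(x):
    return (x < 0).sum()

result = A.loc[:, 'B':'C'].groupby(A['A'] > 2).agg([up, down, 'mean', 'std'])

# Cross-section on the column axis: keep only the 'mean' entries of level 1.
# The statistic level is dropped, so the columns are just 'B' and 'C'.
means = result.xs('mean', axis=1, level=1)
print(means)
```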

Alternatively, you can move one level of the column MultiIndex into the row index:

In [40]: result.stack()
Out[40]: 
                   B         C
A                             
False up    2.000000  2.000000
      down  0.000000  0.000000
      mean  4.500000  3.000000
      std   0.707107  1.414214
True  up    3.000000  3.000000
      down  0.000000  0.000000
      mean  2.000000  2.333333
      std   1.000000  1.527525
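If you would rather keep the question's original apply-based recipe, the repetitive typing can also be avoided by building the dictionary in a loop over the group's columns. This is a minimal sketch, not part of the original answer; summarize is a hypothetical helper name, and the key column 'A' is dropped first so only value columns are summarized.

```python
import pandas as pd

A = pd.DataFrame([[1, 5, 2], [2, 4, 4], [3, 3, 1], [4, 2, 2], [5, 1, 4]],
                 columns=['A', 'B', 'C'], index=[1, 2, 3, 4, 5])

def summarize(g):
    # Hypothetical helper: produce the same up/down/mean/std entries as the
    # question's lambda, but for every column of the group instead of typing each one
    stats = {}
    for col in g.columns:
        s = g[col]
        stats[f'up_{col}'] = (s >= 0).sum()
        stats[f'down_{col}'] = (s < 0).sum()
        stats[f'mean_{col}'] = s.mean()
        stats[f'std_{col}'] = s.std()
    return pd.Series(stats)

result = A.drop(columns='A').groupby(A['A'] > 2).apply(summarize)
print(result)
```

The agg version above is usually shorter; this variant mainly helps when the per-column statistics need custom logic that does not fit a list of single-column functions.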
