python pandas: applying different aggregate functions to different columns

Question

I am trying to understand what the equivalent of this simple SQL statement would be:

select mykey, sum(Field1) as sum_of_field1, avg(Field1) as avg_field1, min(field2) as min_field2
from df
group by mykey

I understand I can pass a dictionary to the agg() function:

f = {'Field1': 'sum',
     'Field2': ['max', 'mean'],
     'Field3': ['min', 'mean', 'count'],
     'Field4': 'count'}

grouped = df.groupby('mykey').agg(f)

However, the resulting column names seem to be chosen by pandas automatically: ('Field1','sum') etc.

Is there a way to pass strings for column names, so that the field is not ('Field1','sum') but something I can choose, like sum_of_field1?

Thanks. I looked at the docs here: http://pandas.pydata.org/pandas-docs/stable/groupby.html but couldn't quite find an answer.
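To reproduce the automatic naming described above, here is a minimal sketch on a small made-up DataFrame (the values are hypothetical; only mykey, Field1 and Field2 come from the question). It passes a list of functions for each column, which yields (column, function) tuple column names, and then shows one common workaround: joining each tuple into a single string.

```python
import pandas as pd

# Hypothetical data; only the column names matter here
df = pd.DataFrame({
    "mykey": ["a", "a", "b"],
    "Field1": [1, 2, 3],
    "Field2": [10, 20, 30],
})

# A dict of lists gives one MultiIndex column per (column, function) pair
grouped = df.groupby("mykey").agg({"Field1": ["sum", "mean"], "Field2": ["min"]})
print(list(grouped.columns))
# [('Field1', 'sum'), ('Field1', 'mean'), ('Field2', 'min')]

# Workaround: flatten each tuple into one string such as "Field1_sum"
grouped.columns = ["_".join(col) for col in grouped.columns]
print(list(grouped.columns))
# ['Field1_sum', 'Field1_mean', 'Field2_min']
```

The joined names are still generated from the column and function names, so this does not let you pick arbitrary names like sum_of_field1 directly; that is what the answer addresses.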

Answer

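As of pandas 0.25, this is possible with "named aggregation": each keyword argument passed to .agg() becomes an output column name, and its value is a (column, aggfunc) tuple.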

In [79]: animals = pd.DataFrame({'kind': ['cat', 'dog', 'cat', 'dog'],
   ....:                         'height': [9.1, 6.0, 9.5, 34.0],
   ....:                         'weight': [7.9, 7.5, 9.9, 198.0]})
   ....: 

In [80]: animals
Out[80]: 
  kind  height  weight
0  cat     9.1     7.9
1  dog     6.0     7.5
2  cat     9.5     9.9
3  dog    34.0   198.0

In [82]: animals.groupby("kind").agg(
   ....:     min_height=('height', 'min'),
   ....:     max_height=('height', 'max'),
   ....:     average_weight=('weight', np.mean),
   ....: )
   ....: 
Out[82]: 
      min_height  max_height  average_weight
kind                                        
cat          9.1         9.5            8.90
dog          6.0        34.0          102.75
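Applied to the SQL query from the question, a minimal sketch might look like the following. The DataFrame contents are hypothetical; mykey, Field1 and Field2 are the question's names. It also uses pd.NamedAgg, a namedtuple form of the same (column, aggfunc) pair, and ** dictionary unpacking for an output name that is not a valid Python identifier.

```python
import pandas as pd

# Hypothetical data standing in for the question's table
df = pd.DataFrame({
    "mykey": ["a", "a", "b"],
    "Field1": [1, 2, 3],
    "Field2": [10, 20, 30],
})

# select mykey, sum(Field1) as sum_of_field1, avg(Field1) as avg_field1,
#        min(Field2) as min_field2 from df group by mykey
result = df.groupby("mykey").agg(
    sum_of_field1=("Field1", "sum"),
    avg_field1=("Field1", "mean"),
    min_field2=pd.NamedAgg(column="Field2", aggfunc="min"),
)
print(result)

# Output names containing spaces can be passed via ** unpacking
spaced = df.groupby("mykey").agg(**{"sum of field1": ("Field1", "sum")})
print(list(spaced.columns))  # ['sum of field1']
```

Unlike the SQL result, mykey ends up as the index here; calling .reset_index() on the result turns it back into a regular column.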

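The older nested-dictionary approach below was deprecated in pandas 0.20 and removed in pandas 1.0, where it raises a SpecificationError ("nested renamer is not supported"):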

You could pass a dictionary of dictionaries to .agg, mapping {column: {name: aggfunc}}. For example:

In [46]: df.head()
Out[46]:
   Year  qtr  realgdp  realcons  realinvs  realgovt  realdpi  cpi_u      M1  \
0  1950    1   1610.5    1058.9     198.1     361.0   1186.1   70.6  110.20
1  1950    2   1658.8    1075.9     220.4     366.4   1178.1   71.4  111.75
2  1950    3   1723.0    1131.0     239.7     359.6   1196.5   73.2  112.95
3  1950    4   1753.9    1097.6     271.8     382.5   1210.0   74.9  113.93
4  1951    1   1773.5    1122.8     242.9     421.9   1207.9   77.3  115.08

   tbilrate  unemp      pop     infl  realint
0      1.12    6.4  149.461   0.0000   0.0000
1      1.17    5.6  150.260   4.5071  -3.3404
2      1.23    4.6  151.064   9.9590  -8.7290
3      1.35    4.2  151.871   9.1834  -7.8301
4      1.40    3.5  152.393  12.6160 -11.2160

In [47]: df.groupby('qtr').agg({"realgdp": {"mean_gdp": "mean", "std_gdp": "std"},
                                "unemp": {"mean_unemp": "mean"}})
Out[47]:
         realgdp                   unemp
        mean_gdp      std_gdp mean_unemp
qtr
1    4506.439216  2104.195963   5.694118
2    4546.043137  2121.824090   5.686275
3    4580.507843  2132.897955   5.662745
4    4617.592157  2158.132698   5.654902

The result has a MultiIndex in the columns. If you don't want that outer level, assign .columns.droplevel(0) back to the result's .columns; droplevel returns a new Index rather than modifying the columns in place.
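Since the nested-dictionary form no longer runs on pandas 1.0 and later, here is a sketch of the same aggregation as In [47] rewritten with named aggregation. It produces flat column names directly, so no droplevel is needed. The small DataFrame is made up (it is not the macro dataset shown above), so the numbers differ from Out[47].

```python
import pandas as pd

# Hypothetical stand-in for the macro dataset above (values are made up)
df = pd.DataFrame({
    "qtr": [1, 1, 2, 2],
    "realgdp": [100.0, 110.0, 200.0, 220.0],
    "unemp": [5.0, 6.0, 4.0, 4.5],
})

# Same output names as In [47], but as a single flat level of columns
result = df.groupby("qtr").agg(
    mean_gdp=("realgdp", "mean"),
    std_gdp=("realgdp", "std"),  # sample standard deviation (ddof=1)
    mean_unemp=("unemp", "mean"),
)
print(result)
```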
