Define aggfunc for each values column in pandas pivot table

Question

I was trying to generate a pivot table with multiple "values" columns. I know I can use aggfunc to aggregate values the way I want, but what if I don't want to sum or average both columns, and instead want the sum of one column and the mean of the other? Is it possible to do this with pandas?

import numpy as np
import pandas as pd

df = pd.DataFrame({
          'A' : ['one', 'one', 'two', 'three'] * 6,
          'B' : ['A', 'B', 'C'] * 8,
          'C' : ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 4,
          'D' : np.random.randn(24),
          'E' : np.random.randn(24)
})

Now this will get a pivot table with the sum:

pd.pivot_table(df, values=['D','E'], index=['B'], aggfunc=np.sum)

And this one gets the mean:

pd.pivot_table(df, values=['D','E'], index=['B'], aggfunc=np.mean)

How can I get the sum for D and the mean for E?

I hope my question is clear.

Recommended answer

You can concatenate two DataFrames:

>>> df1 = pd.pivot_table(df, values=['D'], index=['B'], aggfunc=np.sum)
>>> df2 = pd.pivot_table(df, values=['E'], index=['B'], aggfunc=np.mean)
>>> pd.concat((df1, df2), axis=1)
          D         E
B                    
A  1.810847 -0.524178
B  2.762190 -0.443031
C  0.867519  0.078460

Or you can pass a list of functions as the aggfunc parameter and then select the columns you need:

>>> df3 = pd.pivot_table(df, values=['D','E'], index=['B'], aggfunc=[np.sum, np.mean])
>>> df3
        sum                mean          
          D         E         D         E
B                                        
A  1.810847 -4.193425  0.226356 -0.524178
B  2.762190 -3.544245  0.345274 -0.443031
C  0.867519  0.627677  0.108440  0.078460
>>> df3 = df3.loc[:, [('sum', 'D'), ('mean','E')]]
>>> df3.columns = ['D', 'E']
>>> df3
          D         E
B                    
A  1.810847 -0.524178
B  2.762190 -0.443031
C  0.867519  0.078460

Although, it would be nice to have an option to define aggfunc for each column individually. I don't know how it could be done; maybe by passing a dict-like parameter to aggfunc, like {'D':np.mean, 'E':np.sum}.
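
For what it's worth, the pandas documentation for pivot_table lists a dict among the accepted aggfunc values in current releases, so the per-column idea can be written directly. A minimal sketch, assuming a recent pandas; the fixed seed and the string names 'sum' and 'mean' (used here in place of np.sum and np.mean) are my assumptions, not part of the original post:

```python
import numpy as np
import pandas as pd

# Same shape as the question's frame; the seed is an assumption added
# so that repeated runs produce the same numbers.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'A': ['one', 'one', 'two', 'three'] * 6,
    'B': ['A', 'B', 'C'] * 8,
    'C': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 4,
    'D': rng.standard_normal(24),
    'E': rng.standard_normal(24),
})

# One aggregation per value column: sum for D, mean for E.
result = pd.pivot_table(df, values=['D', 'E'], index=['B'],
                        aggfunc={'D': 'sum', 'E': 'mean'})
print(result)
```

The result has one row per value of B and a flat pair of columns, so no column selection or renaming step is needed afterwards.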

Update: actually, in your case you can pivot by hand:

>>> df.groupby('B').aggregate({'D':np.sum, 'E':np.mean})
          E         D
B                    
A -0.524178  1.810847
B -0.443031  2.762190
C  0.078460  0.867519
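
The same by-hand pivot can also be written with named aggregation, which fixes the output column order instead of leaving it to the dict. A sketch assuming pandas 0.25 or later, with a seeded frame standing in for the random data (the seed and the name by_hand are illustrative assumptions):

```python
import numpy as np
import pandas as pd

# Seeded stand-in for the question's random data (the seed is an assumption).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'B': ['A', 'B', 'C'] * 8,
    'D': rng.standard_normal(24),
    'E': rng.standard_normal(24),
})

# Named aggregation: output_column=(input_column, function).
# Columns come out in the order the keyword arguments are written.
by_hand = df.groupby('B').agg(D=('D', 'sum'), E=('E', 'mean'))
print(by_hand)
```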
