带有多级列的数据透视表 [英] Pivot table with multilevel columns

查看:373
本文介绍了带有多级列的数据透视表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

给出以下代码.

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'clients': pd.Series(['A', 'A', 'A', 'B', 'B']),
    'odd1': pd.Series([1, 1, 2, 1, 2]),
    'odd2': pd.Series([6, 7, 8, 9, 10])})

grpd = df.groupby(['clients', 'odd1']).agg({
    'odd2': [np.sum, np.average]
}).reset_index('clients').reset_index('odd1')

>> grpd
   odd1 clients  odd2         
                  sum  average
0     1       A    13      6.5
1     2       A     8      8.0
2     1       B     9      9.0
3     2       B    10     10.0

我想创建一个数据透视表,如下所示:

I would like to create a pivot table as below:

       | odd1    | odd1    | ...... | odd1    |
------------------------------------|---------|
clients| average | average | .....  | average |

所需的输出是:

clients |   1       2      
--------|------------------
A       |   6.5     8.0    
B       |   9.0     10.0

如果我们的列不是多层的,这将起作用:

This would work had we a column which is not multilevel:

grpd.pivot(index='clients', columns='odd1', values='odd2')

不确定我是否了解多层cols的工作原理.

Not sure I understand how multilevel cols work.

推荐答案

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'clients': pd.Series(['A', 'A', 'A', 'B', 'B']),
    'odd1': pd.Series([1, 1, 2, 1, 2]),
    'odd2': pd.Series([6, 7, 8, 9, 10])})

aggd = df.groupby(['clients', 'odd1']).agg({
    'odd2': [np.sum, np.average]})

print(aggd.unstack(['odd1']).loc[:, ('odd2','average')])

收益

odd1       1   2
clients         
A        6.5   8
B        9.0  10


说明:grpd中的中间步骤之一是


Explanation: One of the intermediate steps in grpd is

aggd = df.groupby(['clients', 'odd1']).agg({
    'odd2': [np.sum, np.average]})

看起来像这样:

In [52]: aggd
Out[52]: 
             odd2        
              sum average
clients odd1             
A       1      13     6.5
        2       8     8.0
B       1       9     9.0
        2      10    10.0

aggd与所需结果之间的视觉比较

Visual comparison between aggd and the desired result

odd1       1   2
clients         
A        6.5   8
B        9.0  10

显示odd1索引需要成为列索引.该操作-将索引标签移动到列标签-是 unstack方法.因此很自然地将aggd堆叠:

shows that the odd1 index needs to become a column index. That operation -- the moving of index labels to column labels -- is the job done by the unstack method. So it is natural to unstack aggd:

In [53]: aggd.unstack(['odd1'])
Out[53]: 
        odd2                
         sum     average    
odd1       1   2       1   2
clients                     
A         13   8     6.5   8
B          9  10     9.0  10

现在可以很容易地看到我们只想选择平均列.可以使用loc:

Now it is easy to see we just want to select the average columns. That can be done with loc:

In [54]: aggd.unstack(['odd1']).loc[:, ('odd2','average')]
Out[54]: 
odd1       1   2
clients         
A        6.5   8
B        9.0  10

这篇关于带有多级列的数据透视表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆