带有多级列的数据透视表 [英] Pivot table with multilevel columns
问题描述
给出以下代码.
import numpy as np
import pandas as pd
df = pd.DataFrame({
'clients': pd.Series(['A', 'A', 'A', 'B', 'B']),
'odd1': pd.Series([1, 1, 2, 1, 2]),
'odd2': pd.Series([6, 7, 8, 9, 10])})
grpd = df.groupby(['clients', 'odd1']).agg({
'odd2': [np.sum, np.average]
}).reset_index('clients').reset_index('odd1')
>> grpd
odd1 clients odd2
sum average
0 1 A 13 6.5
1 2 A 8 8.0
2 1 B 9 9.0
3 2 B 10 10.0
我想创建一个数据透视表,如下所示:
I would like to create a pivot table as below:
| odd1 | odd1 | ...... | odd1 |
------------------------------------|---------|
clients| average | average | ..... | average |
所需的输出是:
clients | 1 2
--------|------------------
A | 6.5 8.0
B | 9.0 10.0
如果我们的列不是多层的,这将起作用:
This would work had we a column which is not multilevel:
grpd.pivot(index='clients', columns='odd1', values='odd2')
不确定我是否了解多层cols的工作原理.
Not sure I understand how multilevel cols work.
推荐答案
import numpy as np
import pandas as pd
df = pd.DataFrame({
'clients': pd.Series(['A', 'A', 'A', 'B', 'B']),
'odd1': pd.Series([1, 1, 2, 1, 2]),
'odd2': pd.Series([6, 7, 8, 9, 10])})
aggd = df.groupby(['clients', 'odd1']).agg({
'odd2': [np.sum, np.average]})
print(aggd.unstack(['odd1']).loc[:, ('odd2','average')])
收益
odd1 1 2
clients
A 6.5 8
B 9.0 10
说明:grpd
中的中间步骤之一是
Explanation: One of the intermediate steps in grpd
is
aggd = df.groupby(['clients', 'odd1']).agg({
'odd2': [np.sum, np.average]})
看起来像这样:
In [52]: aggd
Out[52]:
odd2
sum average
clients odd1
A 1 13 6.5
2 8 8.0
B 1 9 9.0
2 10 10.0
aggd
与所需结果之间的视觉比较
Visual comparison between aggd
and the desired result
odd1 1 2
clients
A 6.5 8
B 9.0 10
显示odd1
索引需要成为列索引.该操作-将索引标签移动到列标签-是 unstack
方法.因此很自然地将aggd
堆叠:
shows that the odd1
index needs to become a column index. That operation -- the moving of index labels to column labels -- is the job done by the unstack
method. So it is natural to unstack aggd
:
In [53]: aggd.unstack(['odd1'])
Out[53]:
odd2
sum average
odd1 1 2 1 2
clients
A 13 8 6.5 8
B 9 10 9.0 10
现在可以很容易地看到我们只想选择平均列.可以使用loc
:
Now it is easy to see we just want to select the average columns. That can be done with loc
:
In [54]: aggd.unstack(['odd1']).loc[:, ('odd2','average')]
Out[54]:
odd1 1 2
clients
A 6.5 8
B 9.0 10
这篇关于带有多级列的数据透视表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!