Pandas 数据透视表索引和列的小计 [英] Subtotals for Pandas pivot table index and column
问题描述
我想为索引 #1(即水果和动物)添加小计行,为列添加小计列(即 2015 和 2016).
I'd like to add subtotal rows for index #1 (ie. Fruits and Animal) and subtotal columns for columns (ie. 2015 and 2016).
对于小计列,我可以做这样的事情,但每年(2015 年和 2016 年)运行这种类型的代码似乎效率低下.有没有更好的办法?我不认为边距"会起作用,因为有多个小计.
For the subtotal columns, I could do something like this, but it seems inefficient to run this type of code for each year (2015 & 2016). Is there a better way? I don't think 'margins' will work because there are multiple subtotals.
df[('2015','2015_Total')] = df[('2015','1st')]+df[('2015','2nd')]
对于小计行(例如,水果总数和动物总数),我不知道从哪里开始.
For the subtotal rows (e.g., fruits total and animal total), I'm not sure where to begin.
推荐答案
很复杂,因为需要在columns
和index中创建
Multiindex
代码>.
It is very complicated, because you need create Multiindex
in columns
and index
.
创建小计很容易 - 使用 groupby
与 sum
.然后创建 Multiindex 并最后 concat
原始 DataFrame
的新列.最后你必须sort_index
(为了正确排序,我在值前添加了 Total_
):
Create subtotals is easy - use groupby
with sum
. Then create Multiindex and last concat
new columns to original DataFrame
. Last you have to sort_index
(I add Total_
before value for correct sorting):
print df
2015_____ 2016_______
1st 2nd 1st 2nd
Fruits Apple 10 9 11 10
Banana 20 22 21 20
Animal Lion 5 3 2 1
Tiger 2 3 5 0
df1 = df.groupby(level=0, axis=1).sum()
print df1
2015_____ 2016_______
Fruits Apple 19 21
Banana 42 41
Animal Lion 8 3
Tiger 5 5
print df.columns.get_level_values(0).to_series().drop_duplicates().tolist()
['2015_____', '2016_______']
#change index to multiindex
new_columns = zip(df.columns.get_level_values(0).to_series().drop_duplicates().tolist(),
"Total_" + df1.columns.str[:4])
print new_columns
[('2015_____', 'Total_2015'), ('2016_______', 'Total_2016')]
df1.columns = pd.MultiIndex.from_tuples(new_columns)
print df1
2015_____ 2016_______
Total_2015 Total_2016
Fruits Apple 19 21
Banana 42 41
Animal Lion 8 3
Tiger 5 5
df = pd.concat([df,df1], axis=1)
df2 = df.groupby(level=0, sort=False).sum()
print df2
2015_____ 2016_______ 2015_____ 2016_______
1st 2nd 1st 2nd Total_2015 Total_2016
Animal 7 6 7 1 13 8
Fruits 30 31 32 30 61 62
print df.index.levels[0][df.columns.labels[0]].to_series().drop_duplicates().tolist()
['Animal', 'Fruits']
#change index to multiindex
new_idx=zip(df.index.levels[0][df.columns.labels[0]].to_series().drop_duplicates().tolist(),
"Total_" + df2.index )
print new_idx
[('Animal', 'Total_Animal'), ('Fruits', 'Total_Fruits')]
df2.index = pd.MultiIndex.from_tuples(new_idx)
print df2
2015_____ 2016_______ 2015_____ 2016_______
1st 2nd 1st 2nd Total_2015 Total_2016
Animal Total_Animal 7 6 7 1 13 8
Fruits Total_Fruits 30 31 32 30 61 62
df = pd.concat([df,df2])
df = df.sort_index(axis=1).sort_index()
print df
2015_____ 2016_______
1st 2nd Total_2015 1st 2nd Total_2016
Animal Lion 5 3 8 2 1 3
Tiger 2 3 5 5 0 5
Total_Animal 7 6 13 7 1 8
Fruits Apple 10 9 19 11 10 21
Banana 20 22 42 21 20 41
Total_Fruits 30 31 61 32 30 62
这篇关于Pandas 数据透视表索引和列的小计的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!