pandas 将级别的每个类别的所有但中间值替换为空白 [英] Pandas Replace All But Middle Values per Category of a Level with Blank
本文介绍了 pandas 将级别的每个类别的所有但中间值替换为空白的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
给出以下数据透视表:
df=pd.DataFrame({'A':['a','a','a','a','a','b','b','b','b'],
'B':['x','y','z','x','y','z','x','y','z'],
'C':['a','b','a','b','a','b','a','b','a'],
'D':[7,5,3,4,1,6,5,3,1]})
table = pd.pivot_table(df, index=['A', 'B','C'],aggfunc='sum')
table
D
A B C
a x a 7
b 4
y a 1
b 5
z a 3
b x a 5
y b 3
z a 1
b 6
I know that I can access the values of each level like so:
In [128]:
table.index.get_level_values('B')
Out[128]:
Index(['x', 'x', 'y', 'y', 'z', 'x', 'y', 'z', 'z'], dtype='object', name='B')
In [129]:
table.index.get_level_values('A')
Out[129]:
Index(['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'], dtype='object', name='A')
接下来,除了中间值或n/2 + 1值,我想用空白('')替换每个外部层中的所有值.
Next, I'd like to replace all values in each of the outer levels with blank ('') save for the middle or n/2+1 values.
因此:
Index(['x', 'x', 'y', 'y', 'z', 'x', 'y', 'z', 'z'], dtype='object', name='B')
成为:
Index(['x', '', 'y', '', 'z', 'x', 'y', 'z', ''], dtype='object', name='B')
和
Index(['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'], dtype='object', name='A')
成为:
Index(['', '', 'a', '', '', '', 'b', '', ''], dtype='object', name='A')
最终,我将尝试在Matplotlib单杠中将它们用作y轴的第二级和第三级标签,像这样的图表(尽管我的某些标签可能会向上移动):
Ultimately, I will attempt to use these as secondary and tertiary y-axis labels in a Matplotlib horizontal bar, something chart like this (though some of my labels may be shifted up):
推荐答案
最后花点时间解决了这个问题...
Finally took the time to figure this out...
#First, get the values of the index level.
A=table.index.get_level_values(0)
#Next, convert the values to a data frame.
ndf = pd.DataFrame({'A2':A.values})
#Next, get the count of rows per group.
ndf['A2Count']=ndf.groupby('A2')['A2'].transform(lambda x: x.count())
#Next, get the position based on the logic in the question.
ndf['A2Pos']=ndf['A2Count'].apply(lambda x: x/2 if x%2==0 else (x+1)/2)
#Next, order the rows per group.
ndf['A2GpOrdr']=ndf.groupby('A2').cumcount()+1
#And finally, create the column to use for plotting this level's axis label.
ndf['A2New']=ndf.apply(lambda x: x['A2'] if x['A2GpOrdr']==x['A2Pos'] else "",axis=1)
ndf
A2 A2Count A2Pos A2GpOrdr A2New
0 a 5 3.0 1
1 a 5 3.0 2
2 a 5 3.0 3 a
3 a 5 3.0 4
4 a 5 3.0 5
5 b 4 2.0 1
6 b 4 2.0 2 b
7 b 4 2.0 3
8 b 4 2.0 4
这篇关于 pandas 将级别的每个类别的所有但中间值替换为空白的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
查看全文