Pandas Replace All But Middle Values per Category of a Level with Blank


Question

Given the following pivot table:

import pandas as pd

df = pd.DataFrame({'A': ['a','a','a','a','a','b','b','b','b'],
                   'B': ['x','y','z','x','y','z','x','y','z'],
                   'C': ['a','b','a','b','a','b','a','b','a'],
                   'D': [7,5,3,4,1,6,5,3,1]})
table = pd.pivot_table(df, index=['A', 'B', 'C'], aggfunc='sum')
table

            D
A   B   C   
a   x   a   7
        b   4
    y   a   1
        b   5
    z   a   3
b   x   a   5
    y   b   3
    z   a   1
        b   6

I know that I can access the values of each level like so:

In [128]:    
table.index.get_level_values('B')

Out[128]:
Index(['x', 'x', 'y', 'y', 'z', 'x', 'y', 'z', 'z'], dtype='object', name='B')

In [129]:
table.index.get_level_values('A')

Out[129]:
Index(['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'], dtype='object', name='A')

Next, I'd like to replace all values in each of the outer levels with a blank (''), save for the middle (or n/2+1) value of each group.

So this:

Index(['x', 'x', 'y', 'y', 'z', 'x', 'y', 'z', 'z'], dtype='object', name='B')

becomes:

Index(['x', '', 'y', '', 'z', 'x', 'y', 'z', ''], dtype='object', name='B')

Index(['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'], dtype='object', name='A')

becomes:

Index(['', '', 'a', '', '', '', 'b', '', ''], dtype='object', name='A')
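To make the rule concrete: judging from the two expected outputs above, the label that survives in a run of n identical values sits at 1-based position n/2 when n is even and (n+1)/2 when n is odd. The following is a minimal plain-Python sketch of that reading, not part of the original question. The function name `keep_middle_of_runs` is hypothetical, and the sketch assumes that each group appears as one consecutive run in the list.

```python
from itertools import groupby


def keep_middle_of_runs(labels):
    """Blank every label except the middle one of each consecutive run.

    Hypothetical helper for illustration. The middle is taken as 1-based
    position n/2 for even n and (n+1)/2 for odd n, i.e. (n + 1) // 2.
    """
    result = []
    for label, run in groupby(labels):
        n = len(list(run))
        middle = (n + 1) // 2
        result.extend(label if pos == middle else '' for pos in range(1, n + 1))
    return result


level_b = ['x', 'x', 'y', 'y', 'z', 'x', 'y', 'z', 'z']
level_a = ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']

print(keep_middle_of_runs(level_b))
print(keep_middle_of_runs(level_a))
```

One caveat with this sketch: it only looks at consecutive equal labels, so two adjacent groups of an inner level that happen to share a label across an outer-level boundary would be merged into one run.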

Ultimately, I will attempt to use these as secondary and tertiary y-axis labels in a Matplotlib horizontal bar chart (though some of my labels may be shifted up).

Recommended answer

Finally took the time to figure this out...

# First, get the values of the outermost index level.
A = table.index.get_level_values(0)

# Next, convert the values to a data frame.
ndf = pd.DataFrame({'A2': A.values})

# Next, get the count of rows per group.
ndf['A2Count'] = ndf.groupby('A2')['A2'].transform('count')

# Next, get the 1-based middle position: n/2 for even counts, (n+1)/2 for odd counts.
ndf['A2Pos'] = ndf['A2Count'].apply(lambda x: x/2 if x % 2 == 0 else (x+1)/2)

# Next, number the rows within each group, starting at 1.
ndf['A2GpOrdr'] = ndf.groupby('A2').cumcount() + 1

# And finally, create the column to use for plotting this level's axis label.
ndf['A2New'] = ndf.apply(lambda x: x['A2'] if x['A2GpOrdr'] == x['A2Pos'] else "", axis=1)
ndf

    A2  A2Count  A2Pos  A2GpOrdr   A2New
0   a   5        3.0       1    
1   a   5        3.0       2    
2   a   5        3.0       3       a
3   a   5        3.0       4    
4   a   5        3.0       5    
5   b   4        2.0       1    
6   b   4        2.0       2       b
7   b   4        2.0       3    
8   b   4        2.0       4    
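The steps above handle the outermost level. For an inner level such as B, grouping on the label alone would lump together the 'z' rows under 'a' and under 'b', so one possible generalization is to group on all levels up to and including the one being relabeled. The sketch below is an assumption-laden extension rather than the original answer's code: the function name `blank_all_but_middle` and its integer `level` parameter are hypothetical, and it assumes the index is sorted so that each group is contiguous (as it is after `pivot_table`).

```python
import pandas as pd


def blank_all_but_middle(index, level):
    """Return the labels of `level` with all but each group's middle row blanked.

    Hypothetical helper. `level` is an integer position in the MultiIndex;
    groups are formed from levels 0..level so that inner labels are handled
    per outer category.
    """
    frame = index.to_frame(index=False)
    keys = list(frame.columns[: level + 1])
    grouped = frame.groupby(keys, sort=False)
    order = grouped.cumcount() + 1                  # 1-based row number in group
    size = grouped[keys[-1]].transform('count')     # rows per group
    middle = (size + 1) // 2                        # n/2 if even, (n+1)/2 if odd
    labels = frame[keys[-1]].where(order == middle, '')
    return pd.Index(labels, name=index.names[level])


df = pd.DataFrame({'A': ['a','a','a','a','a','b','b','b','b'],
                   'B': ['x','y','z','x','y','z','x','y','z'],
                   'C': ['a','b','a','b','a','b','a','b','a'],
                   'D': [7,5,3,4,1,6,5,3,1]})
table = pd.pivot_table(df, index=['A', 'B', 'C'], aggfunc='sum')

labels_a = blank_all_but_middle(table.index, 0)
labels_b = blank_all_but_middle(table.index, 1)
print(list(labels_a))
print(list(labels_b))
```

Using integer floor division for the middle position keeps the comparison between integers, which avoids relying on an int-to-float equality check.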
