Pandas MultiIndex custom sort levels by categorical order, not alphabetically


Question

I'm new to pandas (0.16.1) and want a custom sort order in a MultiIndex, so I use Categoricals. Part of my MultiIndex:

Part  Defect Own
Кузов 504    ИП
Кузов 504    Итого
Кузов 504    ПС
Кузов 505    ПС
Кузов 506    ПС
Кузов 507    ПС
Кузов 530    ИП
Кузов 530    Итого
Кузов 530    ПС

I create a pivot table with MultiIndex levels [Defect, Own]. Then I make "Own" Categorical (see the P.S. at the end of the question) so that it sorts as [ИП, ПС, Итого]. But when I prepend a "Part" level, which is also Categorical and derived from the "Defect" level, and sort the index with

pvt.sortlevel(0, inplace=True)

the "Own" level is sorted in alphabetical order: [ИП, Итого, ПС]. How can I custom-sort two levels in a MultiIndex?

P.S. I convert the "Own" level to Categorical with the following code: create a new column, then replace the index level with it. Is that OK?

import pandas as pd

def makeLevelCategorical(pdf, pname, cats):
    names = list(pdf.index.names)
    namei = names.index(pname)
    pdf = pdf.copy()  # work on a copy so the caller's frame is not modified
    pdf["tmp"] = pd.Categorical(pdf.index.get_level_values(pname), categories=cats)  # new temp column
    pdf = pdf.set_index("tmp", append=True)  # append the column to the index
    pdf = pdf.reset_index(pname, drop=True)  # remove the /pname/ level
    names2 = list(names)
    names2[namei] = "tmp"
    pdf = pdf.reorder_levels(names2)  # returns a new object, so keep the result
    pdf.index.names = names  # rename the "tmp" level back to /pname/
    return pdf

Recommended answer

Here is a small example that sorts a MultiIndex with sort_index:

import pandas as pd

df = pd.DataFrame(
    {"i1":[1,1,1,1,2,4,4,2,3,3,3,3],
     "i2":[1,3,2,2,1,1,2,2,1,1,3,2],
     "d1":['a','b','c','d','e','f','g','h','i','j','k','l']}
)
df.set_index(['i1', 'i2'], inplace=True)
df.sort_index()  # returns a sorted copy; df itself keeps its original row order

Output:

        d1
i1  i2  
1   1   a
    2   c
    2   d
    3   b
2   1   e
    2   h
3   1   i
    1   j
    2   l
    3   k
4   1   f
    2   g

If you want to change the sort direction level by level, DataFrame.sort_index takes an ascending= argument, which can be given a list of booleans such as [True, False], one per index level, in order.
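As a minimal sketch of the ascending= list, assuming a recent pandas version (the behavior in 0.16.1 may differ), the example below rebuilds the same frame and sorts i1 ascending and i2 descending. The name mixed is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"i1": [1, 1, 1, 1, 2, 4, 4, 2, 3, 3, 3, 3],
     "i2": [1, 3, 2, 2, 1, 1, 2, 2, 1, 1, 3, 2],
     "d1": list("abcdefghijkl")}
).set_index(["i1", "i2"])

# One boolean per level: i1 ascending, i2 descending within each i1 group.
mixed = df.sort_index(level=["i1", "i2"], ascending=[True, False])
print(mixed)
```

This only flips the direction of a level; it does not give an arbitrary custom order.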

Categorical is a relatively new dtype in pandas and is worth using, but it is not needed for this operation per se.
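For reference, here is a hedged sketch of the Categorical route on data shaped like the question's index. It assumes a recent pandas version, where sort_index follows the category order of a categorical index level; this is not confirmed for 0.16.1. The counts in column n are made up for illustration.

```python
import pandas as pd

# Illustrative data; the "n" values are invented, not from the question.
df = pd.DataFrame({
    "Defect": [504, 504, 504, 505, 530, 530, 530],
    "Own": ["ИП", "Итого", "ПС", "ПС", "ИП", "Итого", "ПС"],
    "n": [1, 3, 2, 4, 5, 12, 7],
})

# Declare the desired order once, as the categories of the column.
df["Own"] = pd.Categorical(df["Own"], categories=["ИП", "ПС", "Итого"], ordered=True)

# Moving the categorical column into the index keeps its category order,
# and sort_index then sorts that level by category position.
pvt = df.set_index(["Defect", "Own"]).sort_index()
print(pvt)
```

The categories are set on the column before it becomes an index level, which avoids rebuilding the level afterwards.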

Edit, in response to a comment:

Sorting plain values always uses their natural order (alphabetical for strings) or its reverse. For a custom order, create a new column whose values sort naturally but are derived from the column that should determine the order. Series.map does this; the following example sorts the rows with vowels first:

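mappings = {'a': 0, 'b':1, 'c':1, 'd':1,
            'e':0, 'f':1, 'g':1, 'h':1,
            'i':0, 'j':1, 'k': 1, 'l': 1}
df['sortby'] = df['d1'].map(mappings)
# df.sort('sortby') on pandas < 0.17; DataFrame.sort was later removed.
# A stable sort keeps rows with equal keys in their original order.
df.sort_values('sortby', kind='mergesort')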

        d1  sortby
i1  i2      
1   1   a   0
2   1   e   0
3   1   i   0
1   3   b   1
    2   c   1
    2   d   1
4   1   f   1
    2   g   1
2   2   h   1
3   1   j   1
    3   k   1
    2   l   1

如果您此后不希望使用sortby列,则可以将其删除,如下所示:

If you do not want the sortby column after that, you can simply delete it, like this:

del df['sortby']
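The same helper-column idea can be applied to an index level such as "Own" from the question. The sketch below assumes a recent pandas version; the frame pvt, its counts in column n, and the names own_rank, flat and result are hypothetical and only serve the illustration.

```python
import pandas as pd

# Illustrative frame shaped like the question's index; the counts are made up.
idx = pd.MultiIndex.from_tuples(
    [(504, "ИП"), (504, "Итого"), (504, "ПС"), (505, "ПС")],
    names=["Defect", "Own"],
)
pvt = pd.DataFrame({"n": [1, 3, 2, 4]}, index=idx)

# Desired custom order for the "Own" level, expressed as sortable ranks.
own_rank = {"ИП": 0, "ПС": 1, "Итого": 2}

flat = pvt.reset_index()                      # index levels become columns
flat["own_rank"] = flat["Own"].map(own_rank)  # helper column that sorts naturally
result = (
    flat.sort_values(["Defect", "own_rank"])  # sort by Defect, then custom rank
        .drop(columns="own_rank")             # the helper column is no longer needed
        .set_index(["Defect", "Own"])         # restore the MultiIndex
)
print(result)
```

Labels missing from own_rank would map to NaN and sort last by default, so the mapping should cover every label in the level.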
