Pandas hierarchical sort

Question

I have a dataframe of categories and amounts. Categories can be nested into subcategories to any depth using a colon-separated string. I want to sort it by descending amount, but hierarchically, as shown below.

How can I sort it into this order?

CATEGORY                            AMOUNT
Transport                           5000
Transport : Car                     4900
Transport : Train                   100
Household                           1100
Household : Utilities               600
Household : Utilities : Water       400
Household : Utilities : Electric    200
Household : Cleaning                100
Household : Cleaning : Bathroom     75
Household : Cleaning : Kitchen      25
Household : Rent                    400
Living                              250
Living : Other                      150
Living : Food                       100

Dataframe:

import pandas as pd

df = pd.DataFrame({
    "category": ["Transport", "Transport : Car", "Transport : Train", "Household", "Household : Utilities", "Household : Utilities : Water", "Household : Utilities : Electric", "Household : Cleaning", "Household : Cleaning : Bathroom", "Household : Cleaning : Kitchen", "Household : Rent", "Living", "Living : Other", "Living : Food"],
    "amount": [5000, 4900, 100, 1100, 600, 400, 200, 100, 75, 25, 400, 250, 150, 100]
})

Note: this is the order I want. Before sorting, the rows may be in any arbitrary order.
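To try a solution against arbitrarily ordered input, the rows can be shuffled first. This is a minimal sketch using a four-row subset of the data above; the name `shuffled` and the seed value are illustrative choices, not part of the question.

```python
import pandas as pd

# A small subset of the question's data, used only for illustration.
df = pd.DataFrame({
    "category": ["Transport", "Transport : Car", "Transport : Train", "Living"],
    "amount": [5000, 4900, 100, 250],
})

# frac=1 samples every row exactly once, i.e. a random permutation;
# random_state makes the permutation repeatable.
shuffled = df.sample(frac=1, random_state=42).reset_index(drop=True)
print(shuffled)
```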

Recommended answer

To answer my own question: I found a way. It is a bit long-winded, but here it is.
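The function below relies on `np.lexsort`, whose key order is easy to get backwards: the last key in the sequence is the primary sort key. The following toy example, with made-up arrays (`names`, `group_max` and `amount` are illustrative only), shows that behaviour and how reversing the result gives a descending order.

```python
import numpy as np

names = np.array(["b", "a", "c", "d"])
group_max = np.array([10, 10, 30, 30])   # primary key (listed last)
amount = np.array([5, 10, 30, 20])       # secondary key (listed first)

# Keys run from least significant to most significant.
order = np.lexsort([amount, group_max])
print(order)              # ascending by group_max, then by amount

# Reversing the index array turns the ascending order into a descending one.
descending = order[::-1]
print(names[descending])
```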

import numpy as np
import pandas as pd


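def sort_tree_df(df, tree_column, sort_column):
    # Work on a copy so the caller's dataframe is not modified.
    df = df.copy()
    sort_key = sort_column + '_abs'
    df[sort_key] = df[sort_column].abs()
    # One index level per category level; shorter paths are padded with NaN.
    df.index = pd.MultiIndex.from_frame(
        df[tree_column].str.split(":").apply(lambda x: [y.strip() for y in x]).apply(pd.Series))
    # np.lexsort treats the LAST key as the primary one, so the keys run from
    # least significant (category name) to most significant (top-level maximum).
    sort_columns = [df[tree_column].values, df[sort_key].values] + [
        df.groupby(level=list(range(0, x)))[sort_key].transform('max').values
        for x in range(df.index.nlevels - 1, 0, -1)
    ]
    sort_indexes = np.lexsort(sort_columns)
    # Reverse for descending order, then drop the helper index and column.
    df_sorted = df.iloc[sort_indexes[::-1]]
    return df_sorted.reset_index(drop=True).drop(columns=sort_key)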


sort_tree_df(df, 'category', 'amount')
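As an alternative to the approach above, here is a sketch that builds one sort key per row from the amounts of all its ancestors. It is not the answer's method. It assumes that every ancestor category has its own row (a missing parent raises `KeyError`) and that category paths are unique. The function name `sort_by_ancestor_amounts` is hypothetical.

```python
import pandas as pd


def sort_by_ancestor_amounts(df, tree_column="category", sort_column="amount"):
    # Normalise "A : B : C" into the tuple ("A", "B", "C").
    paths = df[tree_column].map(lambda s: tuple(p.strip() for p in s.split(":")))
    # Look up the absolute amount of any category by its path.
    amount_of = dict(zip(paths, df[sort_column].abs()))

    def key(path):
        # One (negated amount, name) pair per ancestor, root first. A parent's
        # key is a prefix of its children's keys, so the parent sorts first.
        return tuple((-amount_of[path[:i]], path[i - 1])
                     for i in range(1, len(path) + 1))

    order = sorted(range(len(df)), key=lambda i: key(paths.iloc[i]))
    return df.iloc[order].reset_index(drop=True)


df = pd.DataFrame({
    "category": ["Transport", "Transport : Car", "Transport : Train",
                 "Household", "Household : Utilities",
                 "Household : Utilities : Water",
                 "Household : Utilities : Electric",
                 "Household : Cleaning", "Household : Cleaning : Bathroom",
                 "Household : Cleaning : Kitchen", "Household : Rent",
                 "Living", "Living : Other", "Living : Food"],
    "amount": [5000, 4900, 100, 1100, 600, 400, 200, 100, 75, 25, 400,
               250, 150, 100],
})

# Shuffle first to show that the input order does not matter.
result = sort_by_ancestor_amounts(df.sample(frac=1, random_state=0))
print(result)
```

Because siblings are ordered strictly by amount, this sketch places `Household : Rent` (400) ahead of `Household : Cleaning` (100) and its children. Ties between siblings are broken alphabetically by name.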
