Pandas DataFrame sort by categorical column but by specific class ordering

Question

I would like to select the top entries in a Pandas DataFrame based on the entries of a specific column, using df_selected = df_targets.head(N).

Each entry has a target value (in order of importance):

Likely Supporter, GOTV, Persuasion, Persuasion+GOTV  

Unfortunately, when I do

df_targets = df_targets.sort("target")

the ordering will be alphabetical (GOTV, Likely Supporter, ...).

I was hoping for a keyword like list_ordering, as in:

my_list = ["Likely Supporter", "GOTV", "Persuasion", "Persuasion+GOTV"] 
df_targets = df_targets.sort("target", list_ordering=my_list)

To deal with this issue I created a dictionary:

from collections import OrderedDict

dict_targets = OrderedDict()
dict_targets["Likely Supporter"] = "0 Likely Supporter"
dict_targets["GOTV"] = "1 GOTV"
dict_targets["Persuasion"] = "2 Persuasion"
dict_targets["Persuasion+GOTV"] = "3 Persuasion+GOTV"

but it seems like a non-Pythonic approach.
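
For comparison, a mapping-based workaround of this kind is usually applied through a temporary helper column. The following is only a sketch under stated assumptions: the integer ranks, the `_rank` column name, and the sample rows are illustrative and do not come from the original post.

```python
import pandas as pd

# Hypothetical rank mapping: lower number = more important.
rank = {"Likely Supporter": 0, "GOTV": 1, "Persuasion": 2, "Persuasion+GOTV": 3}

# Illustrative sample data, not from the original question.
df_targets = pd.DataFrame(
    {"target": ["GOTV", "Persuasion+GOTV", "Likely Supporter", "Persuasion"]}
)

# Map each target to its rank, sort on the helper column, then drop it.
df_sorted = (
    df_targets.assign(_rank=df_targets["target"].map(rank))
    .sort_values("_rank")
    .drop(columns="_rank")
)
print(df_sorted)
```

This works, but the ordering lives outside the column itself, which is part of why it feels clumsy.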

Suggestions would be much appreciated!

Recommended answer

I think you need Categorical with the parameter ordered=True; sorting with sort_values then works nicely:

From the documentation of Categorical:

Ordered Categoricals can be sorted according to the custom order of the categories and can have a min and max value.
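
To illustrate the "min and max value" part of that quote, here is a small sketch. The variable names and the three sample values are assumptions made for the example; only the category order comes from the question.

```python
import pandas as pd

# Custom priority order from the question, most important first.
priority = ["Likely Supporter", "GOTV", "Persuasion", "Persuasion+GOTV"]

# Illustrative values; note that "Persuasion+GOTV" does not occur here.
cat = pd.Categorical(
    ["GOTV", "Persuasion", "Likely Supporter"],
    categories=priority,
    ordered=True,
)

# min/max follow the category order, not alphabetical order.
lowest = cat.min()
highest = cat.max()
print(lowest, "|", highest)
```

Alphabetically the minimum would be "GOTV"; with the custom order it is "Likely Supporter".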

import pandas as pd

df = pd.DataFrame({'a': ['GOTV', 'Persuasion', 'Likely Supporter', 
                         'GOTV', 'Persuasion', 'Persuasion+GOTV']})

df.a = pd.Categorical(df.a, 
                      categories=["Likely Supporter","GOTV","Persuasion","Persuasion+GOTV"],
                      ordered=True)

print (df)
                  a
0              GOTV
1        Persuasion
2  Likely Supporter
3              GOTV
4        Persuasion
5   Persuasion+GOTV

print (df.a)
0                GOTV
1          Persuasion
2    Likely Supporter
3                GOTV
4          Persuasion
5     Persuasion+GOTV
Name: a, dtype: category
Categories (4, object): [Likely Supporter < GOTV < Persuasion < Persuasion+GOTV]

df.sort_values('a', inplace=True)
print (df)
                  a
2  Likely Supporter
0              GOTV
3              GOTV
1        Persuasion
4        Persuasion
5   Persuasion+GOTV
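
Applied to the original goal of taking the top N rows, the same idea might look like the sketch below. The `voter_id` column, the sample rows, and N = 3 are assumptions for illustration; `df_targets`, `target`, and `df_selected` are the names used in the question.

```python
import pandas as pd

# Priority order from the question, most important first.
priority = ["Likely Supporter", "GOTV", "Persuasion", "Persuasion+GOTV"]

# Hypothetical sample data; "voter_id" is an illustrative column name.
df_targets = pd.DataFrame({
    "voter_id": [101, 102, 103, 104, 105, 106],
    "target": ["GOTV", "Persuasion", "Likely Supporter",
               "GOTV", "Persuasion", "Persuasion+GOTV"],
})

# Convert the column to an ordered categorical with the custom order.
df_targets["target"] = pd.Categorical(
    df_targets["target"], categories=priority, ordered=True
)

N = 3
# mergesort is stable, so rows within one category keep their original order.
df_selected = df_targets.sort_values("target", kind="mergesort").head(N)
print(df_selected)
```

One caveat: any value that is not listed in `categories` becomes NaN during the conversion, and sort_values places NaN rows last by default, so it is worth checking that the category list is complete.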
