Pandas DataFrame sort by categorical column but by specific class ordering
Question
I would like to select the top entries in a Pandas DataFrame based on the entries of a specific column, using df_selected = df_targets.head(N).
Each entry has a target value (in order of importance):
Likely Supporter, GOTV, Persuasion, Persuasion+GOTV
Unfortunately, when I do
df_targets = df_targets.sort("target")
the ordering will be alphabetical (GOTV, Likely Supporter, ...).
I was hoping for a keyword like list_ordering, as in:
my_list = ["Likely Supporter", "GOTV", "Persuasion", "Persuasion+GOTV"]
df_targets = df_targets.sort("target", list_ordering=my_list)
To deal with this issue I created a dictionary:
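from collections import OrderedDict

# Prefix each label with its rank so that alphabetical sorting gives the desired order
dict_targets = OrderedDict()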
dict_targets["Likely Supporter"] = "0 Likely Supporter"
dict_targets["GOTV"] = "1 GOTV"
dict_targets["Persuasion"] = "2 Persuasion"
dict_targets["Persuasion+GOTV"] = "3 Persuasion+GOTV"
but it seems like a non-Pythonic approach.
Suggestions would be much appreciated!
Recommended answer
I think you need Categorical with the parameter ordered=True. Sorting with sort_values then follows the order of the categories instead of the alphabet.
The documentation of Categorical says:
Ordered Categoricals can be sorted according to the custom order of the categories and can have a min and max value.
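To illustrate the "min and max" part of that statement, here is a minimal sketch. The sample values and the variable names (order, cat, lowest, highest, is_before) are assumptions made up for this illustration; only the category labels come from the question.

```python
import pandas as pd

# Category order taken from the question (most to least important)
order = ["Likely Supporter", "GOTV", "Persuasion", "Persuasion+GOTV"]

# Hypothetical sample values, for illustration only
cat = pd.Categorical(["GOTV", "Persuasion+GOTV", "Likely Supporter"],
                     categories=order, ordered=True)

# min/max follow the custom category order, not the alphabet
lowest = cat.min()
highest = cat.max()

# Comparisons against a category label use the same order
is_before = pd.Series(cat) < "Persuasion"
```

With an unordered Categorical (ordered=False) min, max and the < comparison raise a TypeError, which is why ordered=True matters here.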
import pandas as pd

df = pd.DataFrame({'a': ['GOTV', 'Persuasion', 'Likely Supporter',
                         'GOTV', 'Persuasion', 'Persuasion+GOTV']})
# Convert the column to an ordered categorical with an explicit category order
df.a = pd.Categorical(df.a,
                      categories=["Likely Supporter", "GOTV", "Persuasion", "Persuasion+GOTV"],
                      ordered=True)
print(df)
                  a
0              GOTV
1        Persuasion
2  Likely Supporter
3              GOTV
4        Persuasion
5   Persuasion+GOTV
print(df.a)
0                GOTV
1          Persuasion
2    Likely Supporter
3                GOTV
4          Persuasion
5     Persuasion+GOTV
Name: a, dtype: category
Categories (4, object): [Likely Supporter < GOTV < Persuasion < Persuasion+GOTV]
df.sort_values('a', inplace=True)
print(df)
                  a
2  Likely Supporter
0              GOTV
3              GOTV
1        Persuasion
4        Persuasion
5   Persuasion+GOTV
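Applied to the original question, the same idea might look like the sketch below. The contents of df_raw, the name column and N = 3 are assumptions invented for this example; df_targets, my_list and df_selected are the names used in the question. The second part shows an alternative that leaves the column as plain strings and uses the key argument of sort_values, which requires pandas 1.1 or newer.

```python
import pandas as pd

# Hypothetical data standing in for the asker's DataFrame
df_raw = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "target": ["GOTV", "Persuasion+GOTV", "Likely Supporter", "Persuasion", "GOTV"],
})
my_list = ["Likely Supporter", "GOTV", "Persuasion", "Persuasion+GOTV"]

# Option 1: ordered categorical column, then sort and take the top N rows
df_targets = df_raw.assign(
    target=pd.Categorical(df_raw["target"], categories=my_list, ordered=True)
)
N = 3
# mergesort is stable, so rows with the same target keep their original order
df_selected = df_targets.sort_values("target", kind="mergesort").head(N)

# Option 2 (pandas >= 1.1): keep strings, sort by each label's position in my_list
rank = {label: i for i, label in enumerate(my_list)}
df_alt = df_raw.sort_values("target", key=lambda s: s.map(rank), kind="mergesort")
```

One caveat with the categorical route: any value that is not listed in categories becomes NaN when the column is converted, so the list should cover every label that occurs in the data.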