How to sort a DataFrame by two columns, using a custom order?
Problem description
I have a pandas DataFrame that I need to sort in a particular order in one column, and just ascending in another. Both columns have repeated values. It looks more or less like this:
import pandas as pd
df = pd.DataFrame()
df[0] = pd.Series( [ 'a', 'aa', 'c' ] * 2 )
df[1] = pd.Series( [ 1, 2 ] * 3 )
df[2] = pd.Series( range(6) )
print( df )
    0  1  2
0   a  1  0
1  aa  2  1
2   c  1  2
3   a  2  3
4  aa  1  4
5   c  2  5
Now, say that I need to sort by columns 0 and 1, but not alphabetically: column 0 should follow a custom order:
order = [ 'a', 'c', 'aa' ]
How do I do that?
I would like to have it sorted like this:
print( sorted_df )
    0  1  2
0   a  1  0
1   a  2  3
2   c  1  2
3   c  2  5
4  aa  1  4
5  aa  2  1
Using Python 3.5.2, pandas 0.18.1.
Solution
You can convert column 0 to an ordered pandas categorical Series, which lets you define a custom sort order for its values:
df[0] = df[0].astype("category").cat.reorder_categories(order, ordered=True)
print(df.sort_values([0, 1]))
    0  1  2
0   a  1  0
3   a  2  3
2   c  1  2
5   c  2  5
4  aa  1  4
1  aa  2  1
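The answer above changes the dtype of column 0 to category. If the original column should stay untouched, one possible variant is to sort on a temporary ordered categorical and drop it afterwards. This is only a sketch: the helper name `sort_by_custom_order` and the temporary column name `_sort_key` are made up for illustration and are not part of pandas, and the code assumes a pandas version newer than the 0.18.1 mentioned in the question (it relies on `drop(columns=...)`).

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    0: ["a", "aa", "c"] * 2,
    1: [1, 2] * 3,
    2: range(6),
})
order = ["a", "c", "aa"]


def sort_by_custom_order(frame, column, order, then_by):
    """Hypothetical helper: sort by `column` following `order`, then by `then_by` ascending."""
    # Build an ordered categorical from the column; the column itself is not modified.
    key = pd.Categorical(frame[column], categories=order, ordered=True)
    # Attach the key as a temporary column (assumed name), sort, then remove it again.
    tmp = frame.assign(_sort_key=key)
    return tmp.sort_values(["_sort_key", then_by]).drop(columns="_sort_key")


sorted_df = sort_by_custom_order(df, 0, order, 1)
print(sorted_df)
```

One difference worth knowing: `reorder_categories` raises a ValueError when `order` does not list exactly the values present in the column, whereas `pd.Categorical(..., categories=order)` turns unlisted values into NaN, which `sort_values` places last by default. Append `.reset_index(drop=True)` if the result should be renumbered 0 to 5 as in the question's desired output.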